clustering
Unable to read server queue performance data
Has anyone started seeing this on Windows Server 2016? "Unable to read Server Queue performance data from the Server service. The first four bytes (DWORD) of the Data section contains the status code, the second four bytes contains the IOSB.Status and the next four bytes contains the IOSB.Information." We have this on two of our clusters at the moment. The same two nodes also end up having issues draining and will lock resources. The other two nodes are fine as far as I can see.
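Not specific to this event, but a commonly suggested first step when Perflib / performance counter errors start appearing on a node is to rebuild the counter configuration. A minimal sketch, run from an elevated prompt on the affected node:

    lodctr /R             # rebuild the performance counter registry settings from the backup store
    winmgmt /resyncperf   # re-register the system performance libraries with WMI afterwards

Whether this clears the Server Queue error specifically is not guaranteed; it is a low-risk check before digging deeper into the draining and resource-locking symptoms.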
BLOG: Windows Server / Azure Local keeps setting Live Migration to 1 - here is why

Affected products: Windows Server 2022, Windows Server 2025, Azure Local 21H2, Azure Local 22H2, Azure Local 23H2, Network ATC

Dear Community, I have seen numerous reports from customers running Windows Server 2022 or Azure Local (Azure Stack HCI) that the Live Migration setting is constantly changed back to 1 on each Hyper-V host, as shown both in PowerShell and in the Hyper-V host settings. One customer had previously set the value to 4 via PowerShell, so he could prove it had been a different value at an earlier point in time.

At first I didn't research intensively why the configuration kept changing over time, but then I stumbled across the cause, quite accidentally, when fetching all parameters of Get-Cluster. According to an article, an LCU back in September 2022 changed the default behaviour and now allows the number of parallel live migrations to be specified at cluster level. The new live migration default appears to be 1 at cluster level, and this forces the values on the Hyper-V nodes to be changed to 1 accordingly. Contrary to the cmdlet documentation, the value is not 2, which would make more sense. The change is little known, as it is not documented in the LCU KB5017381 itself, but only referenced in the documentation for the PowerShell cmdlet Get-Cluster. Frankly, these aren't areas customers or partners check regularly to spot such relevant feature improvements or changes.

"Beginning with the 2022-09 Cumulative Update, you can now configure the number of parallel live migrations within a cluster. For more information, see KB5017381 for Windows Server 2022 and KB5017382 for Azure Stack HCI (Azure Local), version 21H2. (Get-Cluster).MaximumParallelMigrations = 2 The example above sets the cluster property MaximumParallelMigrations to a value of 2, limiting the number of live migrations that a cluster node can participate in. Both existing and new cluster nodes inherit this value of 2 because it's a cluster property. Setting the cluster property overrides any values configured using the Set-VMHost command."

Network ATC in Azure Local 22H2+ and Windows Server 2025+: When using Network ATC in Windows Server 2025 and Azure Local, it sets the live migration value to 1 by default and enforces this across all cluster nodes, disregarding the cluster setting above or the local Hyper-V settings. To change the number of live migrations, you can specify a cluster-wide override in Network ATC.

Conclusion: The default values for live migration have been changed. The global cluster setting or Network ATC forces these down to the Hyper-V hosts on Windows Server 2022+ / Azure Local nodes and ensures consistency. Previously we thought this would happen after opening the cluster settings in Windows Admin Center (WAC), but that was not the initial cause.

Finding references: Later that day, as my interest in this change grew, I found an official announcement. In agreement with another article on optimizing live migrations, the default value should be 2, but for some reason at most customers, even on fresh installations and clusters, it is set to 1.
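A minimal PowerShell sketch of checking and aligning the values described above. It assumes the FailoverClusters and Hyper-V modules are available and that PowerShell remoting to the cluster nodes is enabled:

    # Check the cluster-wide default introduced with the 2022-09 LCU
    (Get-Cluster).MaximumParallelMigrations

    # Set the recommended value of 2; the cluster pushes it down to all nodes
    (Get-Cluster).MaximumParallelMigrations = 2

    # Verify what each Hyper-V node has picked up
    Get-ClusterNode | ForEach-Object {
        Invoke-Command -ComputerName $_.Name -ScriptBlock {
            Get-VMHost | Select-Object ComputerName, MaximumVirtualMachineMigrations
        }
    }

If Network ATC manages the cluster, the cluster-wide override in Network ATC is the place to change this instead; the sketch above only covers the plain cluster property.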
TLDR:
1. Stop bothering with changing the live migration setting manually, via PowerShell, or via DSC / policy on the individual hosts.
2. Now and in future, train your muscle memory to change live migration at cluster level with Get-Cluster, or via Network ATC overrides. These are forced down to all nodes almost immediately and are automatically corrected if there is any configuration drift on a node.
3. Check and set the live migration value to 2 as the default and follow these recommendations: Optimizing Hyper-V Live Migrations on an Hyperconverged Infrastructure | Microsoft Community Hub, Optimizing your Hyper-V hosts | Microsoft Community Hub
4. You can stop blaming WAC or overeager colleagues for changing the LM settings to undesirable values over and over. Starting with Windows Admin Center (WAC) 2306, you can set the Live Migration settings at cluster level under Cluster > Settings.

Happy Clustering! 😀
Failover Cluster Manager error when not running as administrator (on a PAW)

I've finally been trying (hard) to use a PAW, where the user I'm signed into the PAW as does NOT have local admin privileges on that machine, but DOES have admin privileges on the servers I'm trying to manage. The most recent hiccup is that Failover Cluster Manager, aka cluadmin.msc, doesn't seem to work properly if you don't have admin privileges on the machine you're running it from. Obviously on a PAW your server admin account is NOT supposed to be an admin on the PAW itself; you're just a standard user. The error I get when opening Failover Cluster Manager is as follows:

Error
The operation has failed.
An unexpected error has occurred.
Error Code: 0x800702e4
The requested operation requires elevation.
[OK]

Which is nice. I've never tried to run cluadmin as a non-admin, because historically everyone always just ran everything as a domain admin (right?), so you were an admin on everything. But this is not so in the land of PAW. I've run cluadmin on a different machine where I am a local admin, and it works fine. I do not need to run it elevated to make it work properly, it just works, e.g. open PowerShell, cluadmin <enter>. PowerShell has NOT been opened via "Run as administrator" (aka UAC). I've tried looking for some kind of access denied message via procmon but can't see anything obvious (to my eyes anyway). A different person on a different PAW sees the same thing. Is anyone successfully able to run Failover Cluster Manager on a machine where you're just a standard user?
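Not an answer to the MMC elevation error itself, but it may be worth testing whether the FailoverClusters PowerShell module behaves differently from the same non-elevated session; whether that avoids the elevation requirement on a PAW is untested here. A sketch, with the cluster name as a placeholder:

    # From the non-elevated PowerShell session on the PAW (RSAT FailoverClusters module installed)
    Get-Cluster -Name CLUSTER01
    Get-ClusterNode -Cluster CLUSTER01
    Get-ClusterGroup -Cluster CLUSTER01 | Sort-Object State | Format-Table Name, OwnerNode, State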
BLOG: "Only 16 nodes per cluster?! - but VMware..." limitations and rightsizing of failover clusters

Greetings, Windows Server Community members! Today I am sharing insights with you on an often discussed matter.

Intro
This is an exercise on technical limitations and rightsizing of Hyper-V based clusters. The article applies to general Hyper-V based failover clusters using Windows Server with shared storage (SAN), dHCI and (Azure Stack) HCI, with the underlying S2D considerations in particular. Seriously, I've stopped counting the number of customers telling me that Hyper-V / Storage Spaces Direct / Azure Stack is not scalable. Especially when thinking about Azure Stack HCI, this gives me chuckles. Inspired by a simple question from a Microsoft Tech Community member, I thought it was about time to share my experience on this "limitation". Granted, it is themed around S2D and Azure Stack HCI, and I do see differences for many use cases out there using shared storage (SAN) or Scale-Out File Server. If you have comments and suggestions, I am all ears. I appreciate your comments and thoughts. This article is something I am writing from the top of my mind, so bear with me if I missed aspects or got things wrong; I will certainly investigate your comments and corrections.

Thinking about cluster size - I am putting all my eggs in one basket
A great classic song; we will look into this further from an IT perspective. As always in IT: it depends™. The cluster size possible with S2D scales from 1 to 16 physical (or, in a lab, virtual) nodes forming one logical cluster. This applies especially with S2D (Storage Spaces Direct) using Windows Server and Hyper-V, or the more adaptive-cloud (hybrid cloud) focused Azure Stack HCI, which uses the same technology as the base product on Windows Server with some notable extras in terms of deployment and management. One should consider, though, that the number of nodes in a failover cluster (and with that the number of disks) does not necessarily help to defend against physical disk errors. It depends on how the storage pool deals with fault domains. This is automatic, and it is just sometimes good to revise and adjust (if you know what you are doing).

Considering fault domains
Independent of the storage point of view, running a large cluster also means it is one large fault domain in case of issues with the failover cluster, and there are numerous: networking, physical, up to "it is always DNS™", storage issues, configuration issues, changes or drift.

Performance impacts
Running one large cluster also causes higher performance and bandwidth impacts. Not so much when using a shared SAN, one might think, but certainly when using SDS, dHCI or HCI, like Microsoft Storage Spaces Direct. This is especially true for rebuild times of S2D in case of disk failures, replacements or hardware expansions, especially disk capacity expansions.

Costs
When considering cost, S2D requires equal disks in a pool and mostly identical hardware within a cluster. A larger cluster can be less efficient and not as tightly targeted and hardware-optimized to the use case, especially for general VM or VDI workloads.

Lifespan and oopsies with physical disks
Granted, NVMe drives, when choosing models with appropriate TBW / DWPD ratings, offer a very long lifespan, excellent response times and performance galore, for sequential but especially for random operations and IOPS. Today they are more cost efficient than SSDs. However, when one does not follow the advice to patch the OS, firmware and drivers regularly, you might hit sudden outages on NVMe, SSDs and HDDs due to code issues in the firmware.
This has happened in the past and recently also affected Samsung NVMe drives, but such issues have been spotted before they caused disasters at scale.

Understanding Storage Spaces (Direct) / Storage Pools
In Windows Server S2D (always equally including Azure Stack HCI), all physical disks are pooled. In general, there is just one storage pool available for all servers within a cluster. An exception are stretched clusters, something I do not want to go into detail on here. If you want to learn more about these, I can recommend this epic YT series. If you face a problem with your pool, you are facing a problem for all nodes. This is common and likely what happens with other third-party SAN / RAID / SDS systems as well. No change here, we are all cooking with water. Here is a general overview of this amazing and "free" technology: it requires Windows Server Datacenter licensing, and that's all for all the bells and whistles of a highly performant and reliable software-defined storage. It runs best on NVMe-only setups but allows flexibility based on the use case. A high-level overview for now, to explain the relation to the original topic: Storage Spaces Direct (S2D) was introduced with Windows Server 2016 and uses ReFS 3.1. Currently we are approaching Windows Server 2025 and ReFS 3.12, which comes with a ton of improvements. Next to S2D there is Storage Spaces, a similar technology that does not form a shared storage across different servers; it is designed for standalone servers, as opposed to server clusters. Something you should consider with ReFS for unclustered Hyper-V, Scale-out File Server and backup servers, instead of RAID or a SAN.

When larger doesn't mean more secure - storage resiliency also affects cluster resilience
On both you define your storage policies per volume / Cluster Shared Volume, comparable to LUNs. So you can dictate how much resiliency, deduplication and performance is required based on the workload that is going to be stored on that volume. Some basic and common policies are Mirror and Nested Mirror. Others exist depending on the number of disks / hosts, but not all of them are recommended for VM workloads. When using these resiliency methods, especially Mirror, adding more disks (or hosts) raises the risk of a full data loss on this volume / CSV in case of unfortunate events. So choose and plan wisely. I can only recommend doing the RTFM job beforehand, as later changes are possible but require juggling the data and having space left in the pool (physical disks) for such storage (migration) operations. Sure, there are other methods that scale better, like dual parity. Be warned that the diagrams in the docs are simplified, and the data is not distributed equally "per disk" as you would expect in traditional RAID, but in 256 MB data blocks (slabs), placed by an algorithm that cares for balanced placement. It is important to understand this small difference to better predict the outcome of disk or host failures within the cluster. I'm not saying the docs are wrong, just that the visualization is simplified. Read more here: S2D - Nested Resiliency, S2D - Understanding Storage fault tolerance.

Speaking of clusters, the best approach is starting with 2 or 4 nodes. I would avoid an unequal number of nodes, like three nodes or a multiple of it, as they are not very efficient (33%) and expanding on or from these requires changing the storage policy (e.g. three-way mirror). S2D also supports single-node clusters with nested mirror. You have heard right. Still satisfactory performance for many use cases, when you do not need full hardware-stack resiliency, at a very low footprint and cost.
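A minimal sketch of how the resiliency choice shows up in practice when creating volumes on an S2D pool. Pool, volume and tier names below are placeholders, and the nested-mirror line assumes the corresponding storage tier has already been defined on a 2-node or single-node cluster:

    # Two-way mirror CSV volume on the S2D pool
    New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "Volume01" `
        -FileSystem CSVFS_ReFS -ResiliencySettingName Mirror -Size 2TB

    # Nested mirror volume (2-node / single-node scenarios), using a pre-created storage tier
    New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "Volume02" `
        -StorageTierFriendlyNames NestedMirror -StorageTierSizes 1TB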
Notable upcoming improvements to clustering, Storage Spaces (Direct) and beyond
I trust that Azure Stack HCI will receive some of the recently announced improvements of Windows Server 2025. Be curious about what's coming up next from Microsoft in regards to storage and storage options. Have a look at this later on: S2D - New Storage Options for Windows Server 2025 (release planned later this year).

One large cluster vs. one or more smaller, use-case-designed clusters
Again, it is a common question why the limit is 16 nodes while e.g. VMware supports larger clusters. With no further ado, let's talk about how to do it right (imho). You might seek to create smaller clusters, and Azure Stack HCI by design makes it easier to do management, RBAC / security and LCM operations across several clusters. Having more than one cluster also enables you to leverage different Azure subscriptions (RBAC, cost management / billing > controlling).

Sizing considerations - why you do not need the same amount of hardware you had > smaller clusters

Proper (physical) CPU sizing
Often, when sizing is done, rightsizing the workloads and rightsizing the CPU used in a node are not considered. Modern CPUs, compared to e.g. Sandy Bridge, can help to consolidate physical servers at roughly an 8:1 ratio. This way, you can easily save quite some complexity and costs for hardware, licensing, cooling etc. To understand the efficiency gains, and why you should not expect current pCPU counts to be needed on new systems, the calculators from Intel and AMD help you find a right-sized CPU for your Windows Server and show what to expect when reducing hardware, TCO and environmental impact. That is climate action up to par. You can find the calculator from Intel here; the same exists for AMD. The vCPU to pCPU ratio in today's environments appears much higher than we were used to in previous clusters. Yes, that's true. I often hear VMware / Hyper-V customers being happy with a 2:1 vCPU:pCPU ratio across their workloads. It depends on the use case, but often CPU resources are wasted and pCPUs are idling for their money, even at the end of life of a usual hardware cycle.

Plan for rightsizing existing workloads before sizing hardware > save costs
Please consider:
- Storage deduplication, included in a more efficient way with in-line deduplication and compression in Windows Server 2025. Extra points (savings) for keeping all OS on a similar release.
- Storage and RAM savings by using Windows Server Core for infrastructure VMs.
- Savings through Dynamic Memory.
- vCPU / RAM reduction per VM.
- Disk space (too large or fixed disks, or thin-provisioned disks that had a relevant amount of data deleted and have not been compacted), etc.
All this can be based on your monitoring metrics with your existing solution, or Azure Monitoring through Azure Arc. There is an enormous potential for savings and reduction of cluster nodes, RAM and storage when you interpret your metrics before the migration to a more efficient platform. A quick way to get a feel for your current vCPU:pCPU ratio is sketched below.
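A rough sketch, assuming it is run locally on each Hyper-V node (or wrapped in Invoke-Command); it simply counts assigned vCPUs against logical processors and ignores SMT weighting or how busy the VMs actually are:

    # Sum of vCPUs assigned to all VMs on this host
    $vCPUs = (Get-VM | Measure-Object -Property ProcessorCount -Sum).Sum
    # Logical processors available on the host
    $pCPUs = (Get-CimInstance Win32_Processor | Measure-Object -Property NumberOfLogicalProcessors -Sum).Sum
    "{0} vCPUs on {1} logical processors = {2}:1" -f $vCPUs, $pCPUs, [math]::Round($vCPUs / $pCPUs, 1)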
VM assessment
As outlined, you can rely on your own methods and monitoring for the assessment of your workloads for rightsizing of hardware and VMs. In addition, you can leverage Azure Migrate to do that for you. It does not matter if you finally decide to migrate to Azure or Azure Stack HCI; it can help you with the general assessment using live data during your operation, which gives you good conclusions on right sizing, no matter the target.

Consider growth and migrations
There is always growth of the business or increased requirements to consider. The Azure Stack HCI sizing tool helps you here, but watch out: sometimes there is a huge gap of free resources. The tool is not logically perfect; it is math. Also, OS migrations cause temporary growth that can surpass 50% of resources. The good news is that this is getting better with in-place upgrades starting with Windows Server 2022 and later. Additionally, services on older VMs are often not well designed, like file server + DHCP + RDS + certificate authority on a domain controller. These scenarios still exist and scream to be resolved, at the cost of more VMs / resources.

Have you heard Hyper-V isn't scalable? Get your own idea; here are the facts for Windows Server / Azure Stack HCI, often growing with every release. Source: Azure Stack HCI System Requirements, Azure Stack HCI and S2D. These limitations shall not be confused with the capabilities of Hyper-V as a general hypervisor, e.g. not using S2D but an attached SAN: General Hyper-V Scalability and Limitations.

Conclusion
You see, there is some complexity in the game when deciding about the "limitation" of 16 nodes per cluster. I personally do not see this as a limitation in Windows Server Hyper-V or Azure Stack HCI, given all of those aspects. Smaller, use-case-targeted clusters can also ensure flexibility and bring the motivation for (cost) reductions and right sizing in the first place. No doubt lift and shift is easy, but it is often more expensive than investing some time into assessments, on-premises as well as in the cloud. So why a 16+ node cluster? I hope this helped you to make a decision. Allow me to cite Kenny Lowe, Regional Director and expert for Azure Stack HCI and other topics: "As ever, just because you can, doesn't mean you should."

Looking for more? Here is an amazing 1h video you should consider watching if this article just fueled your interest in alternatives to classic clusters: VMware to Hyper-V migration options. Full agenda, on demand, of Windows Server Summit 2024: https://techcommunity.microsoft.com/t5/tech-community-live/windows-server-summit-2024/ev-p/4068971

Thank you for reading, learning, sharing. Do you find this post helpful? I appreciate your thumbs up, feedback and questions. Let me know in the comments below.
Failover Cluster DNS Record For vIP Is Scavenged

I have a physical, two-node Windows 2016 failover cluster that is periodically unable to refresh the DNS entry for its virtual IP. There is no rhyme or reason to it, and the only way to remedy the issue is to delete the record and reboot the cluster nodes one at a time. I have been through all of the troubleshooting articles re: AD permissions and the issue persists. At this point I am done troubleshooting and would like to just create the DNS record statically; is there any reason I should not do this? Thanks
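Not a recommendation either way, but if you do go the static route, this is roughly what it looks like; zone, record name and IP below are placeholders for the cluster network name and virtual IP. Static records carry no aging timestamp, so scavenging leaves them alone:

    # Create the A (and PTR) record for the cluster network name as a static entry on the DNS server
    Add-DnsServerResourceRecordA -ZoneName "contoso.com" -Name "CLUSTER01" `
        -IPv4Address "10.0.0.50" -CreatePtr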
windows 2022 hyper-v issue connection problem network slow performance TeamsNIC cluster

Hi, the problem is the network. Windows Server 2022 Hyper-V host with two Broadcom 10 GB NIC interfaces, connected at 1 GB. Configuration: NIC team with a VM switch created on top; the network is 1 Gbit. When the server receives a file transfer, the speed is 100 MB/s, but when sending from the server to another machine it is only 20 MB/s. Thanks for any reply.
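A first-pass diagnostic sketch for this kind of asymmetric throughput; it only surfaces link speeds, the teaming mode of the vSwitch and the VMQ/RSS state of the adapters, which are common suspects when 10 GbE NICs are linked at 1 Gbit behind a teamed vSwitch:

    Get-NetAdapter | Format-Table Name, InterfaceDescription, LinkSpeed, Status
    Get-VMSwitch | Format-List Name, SwitchType, EmbeddedTeamingEnabled
    Get-NetAdapterVmq | Format-Table Name, Enabled, BaseProcessorNumber, MaxProcessors
    Get-NetAdapterRss | Format-Table Name, Enabled, Profile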
Hyper-V Cluster live migrating Windows 11 Guest

Hi all, I have a Hyper-V 2019 cluster using multiple nodes with SAN storage. I have several virtual PCs running Windows 10 Pro which I can live migrate without problems. I'm testing a Windows 11 Pro VM to use in the future. The settings are identical to the Windows 10 Pro version except for enabling "TPM enabled", which I need to start Windows 11. The new virtual PC is running fine, except that I am not able to perform a live migration. Windows 11 should be supported: https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/supported-windows-guest-operating-systems-for-hyper-v-on-windows Is anyone else experiencing this? Best regards
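One thing often worth checking with vTPM-enabled guests is the key protector: the virtual TPM is protected by certificates of the local "UntrustedGuardian" on the host where it was enabled, and the other cluster nodes need matching guardian certificates (or an HGS setup) before live migration can succeed. A small sketch to inspect this; the VM name is a placeholder and the HgsClient module is assumed to be available:

    # On the node currently owning the VM: confirm the vTPM / security settings
    Get-VMSecurity -VMName "Win11-Test" | Format-List TpmEnabled, Shielded, EncryptStateAndVmMigrationTraffic

    # On every cluster node: compare the local guardians and their certificates
    Get-HgsGuardian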
Windows Server 2019: The component store has been corrupted. Error 0x80073712

Hi all, I'm trying to install the Windows Server Backup feature on our 2019 server, but it results in the error in the subject line: "The component store has been corrupted. Error 0x80073712". Any ideas on what may have caused this, and how to solve it? The server was installed about 5 months ago, and the installation is basically stock; not many changes have been made. Thanks in advance.
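A common first step for 0x80073712 is the standard component store repair sequence; a sketch, run from an elevated command prompt or PowerShell on the affected server (RestoreHealth needs access to Windows Update, or a /Source pointing at matching installation media):

    DISM /Online /Cleanup-Image /ScanHealth
    DISM /Online /Cleanup-Image /RestoreHealth
    sfc /scannow

After these complete, retrying the Add Roles and Features wizard (or Install-WindowsFeature Windows-Server-Backup) shows whether the store repair was enough.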
S2D 2016 4 node cluster network connection status failed (Solved)

Hello, I wondered if anyone has come across this: after updating an S2D cluster node with Windows Update, one of the cluster network connections shows as "failed", and only on the node that was updated. However, the node functions OK; I don't see any issue with connectivity. The cluster hosts are configured with one connection for host connectivity, one for VMs, and one for storage. There was a problem after the update where one of the virtual disks went offline (an odd issue), but after some time the cluster restarted the service on that host and the virtual disk came back up. But the status of the network connection on the host still shows as failed, even though everything seems to work. Maybe a restart is all that's needed and this is a false status...? I will restart this host at a later time and will update. But if anyone knows how I can reset the status without a reboot, I will try it. Thank you
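A few non-disruptive checks that may narrow it down before the reboot; they only read the cluster network state per node and confirm that any storage repair jobs triggered by the update have finished:

    Get-ClusterNetwork | Format-Table Name, State, Role
    Get-ClusterNetworkInterface | Sort-Object Node | Format-Table Node, Name, Network, State

    # Confirm the pool finished resyncing after the virtual disk came back online
    Get-StorageJob
    Get-VirtualDisk | Format-Table FriendlyName, HealthStatus, OperationalStatus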