Category: Cluster



Hello there! As the title hints, this post is about resolving false alerts generated in SCOM for clustered VMs / resources that no longer exist.

I recently came across a situation where SCOM generated a lot of false alerts for Hyper-V 2012 R2 clustered resources, reporting that VM resource groups were in a critical state. However, the VMs had been deleted from the cluster, and the cluster resource monitoring MP should only monitor resources that actually exist on the cluster. Alerts raised by the alert monitor for deleted cluster resources have to be closed manually in SCOM, because the monitor keeps checking the non-existent resource for a state change; and even after doing this, the alerts for the deleted VMs keep coming back in the console.

The whole problem started when a couple of cluster nodes in the Hyper-V cluster were placed in maintenance mode for some activity and the hosts were shut down as part of the process. During this period, a couple of VMs were deleted using the VMM management server, and those VMs were removed from the cluster as expected. However, SCOM picked up data from the online cluster nodes and could not get data about the deleted VMs from the offline cluster nodes. When the shut-down Hyper-V hosts were brought back online, SCOM started behaving oddly: it still believed the deleted VMs belonged to those hosts and generated a lot of false alerts in SCOM reporting that the deleted VMs were in a critical state.

At this point, the data SCOM holds in its database is inconsistent. There is no way to remove a clustered resource from the cluster management pack view dashboard; we can only place the resource group in maintenance mode.

To resolve this bug / data inconsistency in SCOM, the cluster monitoring management packs must be deleted and then imported again. This needs to be done once all cluster nodes are back online and active in the cluster, so the MPs can pick up data from all Hyper-V nodes.

Any custom management packs that depend on the cluster MPs need to be exported from the Administration view of SCOM first. After deleting all the cluster MPs, re-importing them into SCOM along with the custom MPs fixes the issue. It takes about an hour or more for SCOM to pick up and update the status of the clusters.
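If you prefer scripting this over clicking through the console, the export / delete / re-import flow can be sketched with the OperationsManager PowerShell module on a management server. This is only a rough outline; the display-name filter, backup folder, and MP file path below are assumptions you would adjust to your environment.

# Rough sketch, assuming the OperationsManager module is installed on a management server.
# The display-name filter, backup folder, and MP file paths are assumptions; adjust them.
Import-Module OperationsManager

# 1. Back up the unsealed custom MPs that reference the cluster MPs.
Get-SCOMManagementPack | Where-Object { -not $_.Sealed } |
    Export-SCOMManagementPack -Path "C:\MPBackup"

# 2. Remove the cluster monitoring MPs (dependent custom MPs have to be removed first).
Get-SCOMManagementPack | Where-Object { $_.DisplayName -like "*Cluster*" } |
    Remove-SCOMManagementPack

# 3. Once all cluster nodes are back online, re-import the original cluster MP files
#    and then the backed-up custom MPs.
Import-SCOMManagementPack -FullName "C:\MPs\Microsoft.Windows.Cluster.Management.Library.mp"
Get-ChildItem "C:\MPBackup\*.xml" | ForEach-Object { Import-SCOMManagementPack -FullName $_.FullName }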


Howdy! Today’s blog post is all about Microsoft’s Windows Server Failover Clustering. I’ve noticed that there are a couple of limitations in Windows Server Failover Clustering (WSFC). I’ll keep adding limitations as I identify them, so keep checking back.

 

First up: a Shared VHDX issue. Shared VHDX is a clustering storage feature introduced in Windows Server 2012 R2 for nodes participating in a Windows Server cluster. If you’re wondering what Shared VHDX is and how it works, please see here.

 

So, say you have a two-node cluster using Shared VHDX and you attach 4 disks to a cluster resource, SQL being the example here, with both cluster nodes having these 4 disks attached in Shared VHDX mode. This presents the storage as shared storage, so both nodes in the cluster can see the disks. Now, if I want to move SQL from Node A to Node B, all the Shared VHDXs on the node that owns SQL go into a reserved state and come online on Node B, since we have moved SQL there; eventually the SQL-associated disks and components move along with it.
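For context, this is roughly how the same VHDX gets attached to both node VMs in shared mode from the Hyper-V side. It is a sketch only; the VM names, host name, and VHDX path are placeholders.

# Sketch: attach one VHDX to both cluster node VMs in shared mode.
# On 2012 R2 this is the -SupportPersistentReservations switch.
# VM names, host name, and the VHDX path are placeholders.
foreach ($vm in "CLUSTERNODE1", "CLUSTERNODE2") {
    Add-VMHardDiskDrive -VMName $vm -ComputerName "HYPERVHOST1" `
        -Path "C:\ClusterStorage\Volume1\SQLData1.vhdx" -SupportPersistentReservations
}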

Cluster Resource Move failure

Now, if for some reason one of the shared disks is not presented to Node B via the Hyper-V Manager settings, failing SQL over to Node B will fail. The only error you get is “Cluster disk not connected”. Generating the cluster logs via PowerShell using “Get-ClusterLog -UseLocalTime -TimeSpan 5 -Destination D:\logs” results in nothing more than the entry below.

 

“ERR   [RCM] rcm::RcmApi::MoveGroup: ERROR_CLUSTER_DISK_NOT_CONNECTED(5963)’ because of ‘Move of group SQL Server (MSSQLSERVER) to node CLUSTERNODE2 is not approved’”

 

Now, the limitation I’m talking about here is that the cluster does not help you identify exactly which Shared VHDX is not visible to Node B. If Disk 2 is not presented to Node B, the cluster knows in the background that it is failing to bring Cluster Disk 2 online on Node B, so it should log something like “Bringing Disk 1 online on Node X — Pass, Bringing Disk 2 Online on Node X…” and so on, which would help you identify the missing Shared VHDX on the nodes.

In the command above I’ve used a time span of 5 minutes to pull the cluster logs. This avoids generating a big file and having to read through all the unwanted entries, since I had just tried to move SQL off of Node A within the last 5 minutes.
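For reference, here is that log collection written out as a couple of lines, plus a quick filter so you don’t have to scroll the whole file. The destination folder is an assumption.

# Sketch: collect the last 5 minutes of cluster log entries from every node in local time.
# The destination path is an assumption; point it anywhere with enough free space.
Import-Module FailoverClusters
Get-ClusterLog -UseLocalTime -TimeSpan 5 -Destination "D:\logs"

# Surface just the [RCM] error lines (such as ERROR_CLUSTER_DISK_NOT_CONNECTED).
Select-String -Path "D:\logs\*.log" -Pattern "ERR\s+\[RCM\]"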

Now, you might think you can use Disk Management to spot the differences, but that only works if you have a few disks and they are all different sizes. If you have 15 or so storage disks presented via Hyper-V and almost all of them are the same size, say 500 GB each, it is a waste of time to go through all those disk numbers comparing the disks on each node side by side.
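One way to shortcut that comparison, assuming the cluster nodes are VMs and the shared disks are attached as VHDX files through Hyper-V, is to dump the disk paths attached to each node VM and diff the two lists. The VM and host names below are placeholders for illustration.

# Sketch: list the VHDX paths attached to each cluster node VM and compare them.
# VM names and Hyper-V host names are placeholders; substitute your own.
$nodeA = Get-VMHardDiskDrive -VMName "CLUSTERNODE1" -ComputerName "HYPERVHOST1" |
    Select-Object -ExpandProperty Path
$nodeB = Get-VMHardDiskDrive -VMName "CLUSTERNODE2" -ComputerName "HYPERVHOST2" |
    Select-Object -ExpandProperty Path

# Any VHDX attached to Node A's VM but missing from Node B's VM shows up here.
Compare-Object -ReferenceObject $nodeA -DifferenceObject $nodeB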

 

Now, although I describe this limitation from a Shared VHDX perspective, it also applies to SAN storage presented to the cluster nodes via EMC PowerPath or something similar. With SAN storage presented directly to a cluster node, we can at least use the PowerPath console to identify the missing disks by the naming convention used to label the disks during zoning. But I still feel this is a limitation in Windows clustering that needs to be addressed as soon as possible.

 

Also, with Shared VHDX there is a big issue with redirected I/O that can kill your critical applications through poor disk performance. For this reason, a cluster resource with heavy disk utilisation should not use Shared VHDX as its storage. I will write more about the redirected I/O issue in a separate post.


SQL cluster resources may fail to start in the cluster with no specific error thrown when you try to start the SQL service from the cluster window. If you generate the cluster logs, or look at the cluster logs in Event Viewer, you may see this annoying entry: “[RES] SQL Server <SQL Server (DTA)>: [sqsrvres] Failed to start service with error 1062. Please try again”

 

This error doesn’t really give you a clue as to what’s wrong with the SQL service. You may have to go to the Application/System event logs to find the real cause. The following error will be displayed there: “Unable to allocate enough memory to start ‘SQL OS Boot’. Reduce non-essential memory load or increase system memory.”

This means there is not enough memory available on the cluster node to start the SQL services. You can either fail the SQL service (or whichever service is affected) over to another participating node, or increase the memory of the cluster node if its memory is fully utilised.
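A quick way to check how much memory the node has left and, if it is starved, fail the SQL role over to another node is sketched below; the group name and target node name are placeholders.

# Sketch: check free memory on the current node, then move the SQL role if needed.
# The group name and node name are placeholders; use your own.
$freeMB = (Get-Counter '\Memory\Available MBytes').CounterSamples[0].CookedValue
Write-Output "Available memory: $freeMB MB"

# If the node is short on memory, fail the SQL group over to another node.
Move-ClusterGroup -Name "SQL Server (MSSQLSERVER)" -Node "CLUSTERNODE2"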

Cluster Logs reads as below:

000011cc.00000568::2015/12/28-04:59:49.866 INFO  [RES] SQL Server <SQL Server (DTA)>: [sqsrvres] Dependency expression for resource ‘SQL Network Name (XYZ_NAME)’ is ‘([9876bf5f-f99d-4de9-84dd-1c286559d994])’
000011cc.00000568::2015/12/28-04:59:49.871 INFO  [RES] SQL Server <SQL Server (DTA)>: [sqsrvres] Starting service MSSQL$DTA…
00000a9c.00001be4::2015/12/28-04:59:50.164 INFO  [NM] Received request from client address CLUSTERNODE_1.
000011cc.00000568::2015/12/28-04:59:51.150 ERR   [RES] SQL Server <SQL Server (DTA)>: [sqsrvres] Failed to start service with error 1062. Please try again
000011cc.00000568::2015/12/28-04:59:51.150 INFO  [RES] SQL Server <SQL Server (DTA)>: [sqsrvres] SQL Server resource state is changed from ‘ClusterResourceOnlinePending’ to ClusterResourceFailed’
000011cc.00000568::2015/12/28-04:59:51.150 ERR   [RHS] Online for resource SQL Server (DTA) failed.
00000a9c.00001778::2015/12/28-04:59:51.150 WARN  [RCM] HandleMonitorReply: ONLINERESOURCE for ‘SQL Server (DTA)’, gen(1) result 5018/0.
000011cc.00000568::2015/12/28-04:59:51.150 INFO  [RES] SQL Server <SQL Server (DTA)>: [sqsrvres] Extended Event logging is stopped
00000a9c.00001778::2015/12/28-04:59:51.150 INFO  [RCM] Res SQL Server (DTA): OnlinePending -> ProcessingFailure( StateUnknown )
00000a9c.00001778::2015/12/28-04:59:51.150 INFO  [RCM] TransitionToState(SQL Server (DTA)) OnlinePending–>ProcessingFailure.

 

If the cluster nodes are VMs and Dynamic Memory is configured on them, live migrate the VM to a more capable host to fix the problem of Dynamic Memory not being allocated to the VM.
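A minimal sketch of that live migration from PowerShell is below, assuming live migration is already enabled between the hosts; the VM and host names are placeholders.

# Sketch: live migrate the memory-starved cluster node VM to a host with more free memory.
# Assumes live migration is already configured; VM and host names are placeholders.
Move-VM -Name "CLUSTERNODE1" -ComputerName "HYPERVHOST1" -DestinationHost "HYPERVHOST2"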

 

Any questions, please feel free to hit the comment section.

 


Hello All,

It’s been a long time since I last wrote on WordPress. The short article below helps fix the 14202 error in a Windows failover cluster, titled “How-To Fix Windows Failover Cluster 14202 Event ID error”. The issue is that one of the cluster resources fails to start with this error in FCM (Failover Cluster Manager). It can be an NFS or some other file-share/disk-based resource hosted in FCM.

Cluster Resource Failure

Looking at the resources of this failed cluster role, you can see that the NFS (or related) resource is in a failed state.

Resource Failed -1

To see what’s causing the NFS/related resource failure, go to Cluster Events to learn more about the cause of the error.

Cluster Error Logs_Events

Now you can see that NFS-HyperV-FS is not dependent on the disk resource hosting G:\shares\NFS-FS, so the cluster is failing to bring the resource online because no dependencies are set for the NFS share we’ve configured for this NFS cluster resource. An NFS file share only works when it has its dependencies/resources allocated to it.

Go to the NFS resource properties as shown below and create a dependency on the drive/share:

Cluster Dependency missing

 

Now, click the empty Resource field and use the drop-down menu to select the cluster disk presented to the NFS/cluster resource in FCM (if there is only one field under Resource, add another as required; click Insert to add empty fields). In my case, Cluster Disk 3 is the one hosting the NFS shares. Choose AND so the resource depends on the disk explicitly; this means both the disk and the NFS network name object must be online for the resource to work (of course, if only the CNO (Cluster Name Object) is online, that’s no use because the share drive is offline 🙂 )

Cluster dependency set

So now we’ve set Cluster Disk 3 (the NFS share disk) and HyperV-FS, which is the computer object name used to access the NFS share, as dependencies. This should solve the dependency problem and help get the cluster resource online. Now try bringing the NFS resource online again from FCM; it should work 🙂
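If you’d rather script the same dependency change, a sketch using the FailoverClusters module is below; the resource and disk names come from this example and will likely differ in your cluster.

# Sketch: make the NFS share resource depend on both the disk and the network name,
# mirroring the AND dependency set in the GUI. Resource names are from this example; adjust them.
Import-Module FailoverClusters
Set-ClusterResourceDependency -Resource "NFS-HyperV-FS" `
    -Dependency "([Cluster Disk 3]) and ([HyperV-FS])"

# Then try bringing the resource online again.
Start-ClusterResource -Name "NFS-HyperV-FS"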

Fixed NFS cluster

Please let me know if you have any questions or trouble fixing these kinds of errors. You can always comment if you have issues with failover clusters in Windows or VMware.
