Category: Azure Pack



Hello there! As the title hints, this post is about resolving false alerts generated in SCOM for non-existent clustered VMs / resources.

I recently came across a situation where SCOM was generating a lot of false alerts for Hyper-V 2012 R2 clustered resources, reporting that VM resource groups were in a critical state. However, the VMs had already been deleted from the cluster, and the cluster resource monitoring MP should only monitor resources that actually exist on the cluster. Alerts raised by the alert monitor for deleted cluster resources must be closed manually in SCOM, because the monitor keeps polling the non-existent resource for a state change; even after doing this, the alerts for the deleted VMs keep coming back in the console.

The whole problem started when a couple of nodes in the Hyper-V cluster were placed in maintenance mode for an activity and shut down as part of the process. During this window a couple of VMs were deleted through the VMM management server, and those VMs were removed from the cluster as expected. SCOM, however, collected data only from the online cluster nodes and could not learn about the deleted VMs from the offline nodes. When the shut-down Hyper-V hosts were brought back online, SCOM started behaving strangely: it still assumed the deleted VMs belonged to those hosts and generated a large number of false alerts reporting that the deleted VMs were in a critical state.

At this point the data SCOM holds in its database is inconsistent. There is no way to remove a clustered resource from the cluster management pack dashboard view; we can only place the resource group in maintenance mode.

To fix this bug / data inconsistency in SCOM, the cluster monitoring management packs must be deleted and then imported again. This should be done only after all cluster nodes are back online and active in the cluster, so the MPs can collect data from every Hyper-V node.

Any custom management packs that depend on the cluster MPs need to be exported from the Administration view of SCOM first. After deleting all the cluster MPs, re-importing them together with the custom MPs fixes the issue. It takes about an hour or more for the cluster status to be picked up and updated.
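For reference, the export / remove / re-import sequence can also be scripted with the OperationsManager PowerShell module. This is only a rough sketch: the management server name, the "*Cluster*" display-name filter, and the folder paths are assumptions, so verify which MPs match in your environment before removing anything.

```powershell
Import-Module OperationsManager
New-SCOMManagementGroupConnection -ComputerName "scom-ms01"    # hypothetical management server

# 1. Export (back up) the unsealed custom MPs that depend on the cluster MPs
Get-SCOMManagementPack | Where-Object { -not $_.Sealed } |
    Export-SCOMManagementPack -Path "C:\MPBackup"

# 2. Remove the cluster monitoring MPs (dependent custom MPs have to be removed first)
Get-SCOMManagementPack | Where-Object { $_.DisplayName -like "*Cluster*" } |
    Remove-SCOMManagementPack

# 3. With all cluster nodes back online, re-import the sealed cluster MP files
#    and then the exported custom MPs
Get-ChildItem "C:\ClusterMPs\*.mp" | ForEach-Object { Import-SCOMManagementPack -FullName $_.FullName }
Get-ChildItem "C:\MPBackup\*.xml"  | ForEach-Object { Import-SCOMManagementPack -FullName $_.FullName }
```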


The test failover function in the Azure portal isn't working as expected; this is a known issue and Microsoft is actively working on fixing the bug.

Until then, you can use the following link as a workaround: https://aka.ms/e2e-ie-tempfix

Using the workaround link:


As the title says, this post lists the common installation errors you might encounter when deploying Azure Stack for POC purposes.

Time zone issue: When deploying Azure Stack, the first run of the PowerShell script asks you to enter the computer name, IP, DNS, and time zone configuration. If the time zone setting is left as-is or configured incorrectly, meaning the Azure Stack host operating system's time zone differs from what was specified, the script fails after the DC VM installation: time synchronization breaks, and authenticating the Azure Stack host against the newly deployed DC in order to join it to the domain fails, throwing an authentication-related error.
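Before kicking off the deployment it is worth confirming that the host's time zone and time synchronization are correct. The commands below are a generic Windows sketch rather than part of the Azure Stack tooling, and the time zone ID shown is only an example.

```powershell
tzutil /g                             # show the host's current time zone ID
# tzutil /s "Pacific Standard Time"   # set it explicitly if it is wrong (example zone)

w32tm /query /status                  # check the time source and offset
w32tm /resync                         # force a resync if the offset looks large
```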

 

Access denied on Azure Stack VMs from the host PowerShell session: When the VMs are deployed using PowerShell and rebooted as part of their configuration, there is a chance that the required Ops-Administrators group does not get added to the VMs. In that case we have to log in to (or remote into) the affected VM and validate its local Administrators group.

 

 
How to add Ops-Admins to the Local Admins group is listed below:
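A minimal sketch of the group change, run from inside the affected VM; the domain and group names below are placeholders, so substitute the Ops-Administrators account from your own POC domain.

```powershell
# "AZURESTACK\Ops-Administrators" is a placeholder - use your deployment's domain\group.
net localgroup Administrators "AZURESTACK\Ops-Administrators" /add

# Verify the membership afterwards
net localgroup Administrators
```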

 

The reason the VMs cannot be accessed remotely from the PowerShell session in which we run the -rerun command is that we are logged in to the Azure Stack host as FabricAdministrator, and that account has no access to the VMs because the group membership was skipped or misconfigured.

VM installation failed with an unexpected restart error: There can be multiple reasons why a VM fails at this step. It primarily points to the underlying hardware specs and to the integrity of the mounted ISO image. Since the Azure Stack PowerShell deployment builds each VM from a pre-built VHD, it is mostly underlying storage IOPS problems that cause a VM to fail to boot in time or to crash during installation. If a VM fails to boot and the script stops because it needs the VM up and remotely accessible via PowerShell, you can delete the VM from Failover Cluster Manager / Hyper-V and re-run the Stack deployment with -rerun, as sketched below.
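As a rough sketch of that recovery path (the VM name and the deployment script name are placeholders, not the exact names from a given build):

```powershell
# "FailedRoleVM" and DeployAzureStack.ps1 are illustrative names - check the failed
# role's VM name in Hyper-V Manager and use the script shipped with your POC build.
Stop-VM -Name "FailedRoleVM" -TurnOff -ErrorAction SilentlyContinue
Remove-VM -Name "FailedRoleVM" -Force

# Resume the Azure Stack deployment from where it stopped
.\DeployAzureStack.ps1 -rerun
```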

 
 
Other issues we commonly encounter are caused by resource availability and by timeouts when VMs fail to respond or boot.

The main limitation when deploying an Azure Stack POC is storage. Since all the VMs run off the same disk, the available throughput and IOPS per VM drop considerably. Spreading the disks across multiple HDDs for the S2D pool increases performance. Another feasible option is to use SSHDs, if the budget allows. :)

In the publicly released Azure Stack POC, the NuGet packages are slightly modified. The CloudDeployment NuGet package now sections the configuration of each VM individually, which means we have to define the memory, CPU, and dynamic memory settings in the OneNodeRole.xml configuration file of each individual VM role.

As the config file shows, memory, CPU, and dynamic memory settings must be defined for each Azure Stack VM individually inside the "CloudDeployment.1.0.597.18.nupkg" package. The exact path to the individual VM role configurations, as seen in the archive screenshot above, is "\Content\Configuration\Roles\Fabric\*". If this needs a detailed post, please send your requests and I will write a new blog post covering it.
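If you want to reach those role configuration files yourself, one way (a sketch with example paths, using the package name from above) is to copy the nupkg to a .zip, expand it, and browse to the Fabric roles folder:

```powershell
# Adjust the paths to your environment; Expand-Archive needs the .zip extension.
Copy-Item ".\CloudDeployment.1.0.597.18.nupkg" ".\CloudDeployment.zip"
Expand-Archive ".\CloudDeployment.zip" -DestinationPath ".\CloudDeployment"

# Each fabric role has its own OneNodeRole.xml where memory, CPU and
# dynamic memory settings are defined
Get-ChildItem ".\CloudDeployment\Content\Configuration\Roles\Fabric" -Recurse -Filter "OneNodeRole.xml"
```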
