Azure Stack HCI: unable to reinstall the AKS service (after a previously corrupted AKS installation)

Christopher Low Kin Siong (Microsoft Vendor)
2021-08-11T10:09:38.07+00:00

Due to some corruption in the AKS service or nodes (Get-AksHciBillingStatus cannot be run), I intend to cleanly redo the AKS installation.
a) All servers are patched to 10.0.20348 with the latest August updates.
Uninstall-AksHci was run on all nodes with no errors.

b) Using the GUI (Windows Admin Center), my first attempts fail after 10 minutes 18 seconds in Install-AksHci with: "The operation has timed out. Clean up your host environment and re-start the setup process."

c) I reran Uninstall-AksHci.

Then I tried PowerShell:
Update-Module -Name Az.Accounts -RequiredVersion 2.5.1
Install-Module -Name AksHci -Repository PSGallery -Force -AcceptLicense
Import-Module Az.Accounts
Import-Module Az.Resources
Import-Module AzureAD
Import-Module AksHci
Connect-AzAccount -DeviceCode
WARNING: To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code xxx to authenticate.

WARNING: TenantId '295be6d3-mytenantid' contains more than one active subscription. First one will be selected for further use. To select another subscription, use Set-AzContext.

Account                        SubscriptionName TenantId            Environment
-------                        ---------------- --------            -----------
admin@mydomain.onmicrosoft.com mysub            295be6d3-mytenantid AzureCloud

Set-AzContext -Subscription "50fb2758-mysubscription"

Name                Account             SubscriptionName Environment TenantId
----                -------             ---------------- ----------- --------
mysub (50fb2758-... admin@mydomain.o... mysub            AzureCloud  295be6d3-5142-4...

Register-AzResourceProvider -ProviderNamespace Microsoft.Kubernetes

ProviderNamespace : Microsoft.Kubernetes
RegistrationState : Registered
ResourceTypes : {connectedClusters, locations, locations/operationStatuses, registeredSubscriptions...}
Locations : {West Europe, East US, West Central US, South Central US...}

Register-AzResourceProvider -ProviderNamespace Microsoft.KubernetesConfiguration

ProviderNamespace : Microsoft.KubernetesConfiguration
RegistrationState : Registered
ResourceTypes : {sourceControlConfigurations, extensions, operations}
Locations : {East US, West Europe, West Central US, West US 2...}

PS C:\Users\mylogon> Set-AksHciRegistration -subscriptionid "50fb2758-mysubscription" -tenantid "295be6d3-mytenantid" -resourcegroupname DellAzureHCISEA -Region SouthEastAsia
Set-AksHciRegistration : Cannot bind argument to parameter 'version' because it is an empty string.
At line:1 char:1
+ Set-AksHciRegistration -subscriptionid "50fb2758-mysubscriptionid ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidData: (:) [Set-AksHciRegistration], ParameterBindingValidationException
    + FullyQualifiedErrorId : ParameterArgumentValidationErrorEmptyStringNotAllowed,Set-AksHciRegistration

d) I tried the GUI again (without running Uninstall-AksHci after the PowerShell attempt).

Failed with errors
Install-AksHci - Importing Configuration Completed
Duration: 0 minutes 3 seconds
[Install-AksHci]:The operation has timed out.


Accepted answer
    MattMcSpirit-MSFT
    2021-08-11T16:10:03.967+00:00

    You can collect the logs using Get-AksHciLogs. In the current state I'm not sure what it will return, but it's worth a try.

    You shouldn't typically need to run Uninstall-AksHci on all physical nodes; running it on one node is usually sufficient. In this situation, though, running it on every node just to be sure is probably a good idea. Here's what I've provided to others in the past:

    First, run these:
    Uninstall-AksHci
    Uninstall-Moc
    Uninstall-AksHci

    Then:

    In the registry, delete:
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\AksHciPS
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MocPS
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\KvaPS

    Delete the AKS-HCI PowerShell modules (such as AksHci, Kva and Moc) from the WindowsPowerShell\Modules folder on each node.

    Delete the following folders and subfolders on the host machine:
    C:\AKSHCI
    C:\Program Files\AksHci

    In the profile folder of the user who installed AKS-HCI, there are a few more folders to delete:
    .AksHci
    .kube
    .Kva
    .Moc
    .ssh
    .wssd

    Delete all VMs created by AKS-HCI if any are running.
    Delete the cluster object if it hasn’t been cleaned up already.
    Make sure you do that on all physical nodes in the cluster.
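The file and registry cleanup above can be scripted per node. What follows is a minimal sketch, not an official tool: it assumes an elevated Windows PowerShell session on each physical node, run as the user who installed AKS-HCI (so that `$env:USERPROFILE` resolves to that user's profile), after Uninstall-AksHci and Uninstall-Moc have been run. `Remove-AksHciLeftover` and `$aksHciLeftovers` are hypothetical names. The sketch does not touch the PowerShell modules, VMs, or cluster objects, which still need the manual steps above.

```powershell
# Hypothetical helper (not part of the AksHci module): deletes each listed
# folder or registry key that still exists and outputs the paths it removed.
function Remove-AksHciLeftover {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [Parameter(Mandatory)]
        [string[]] $Path
    )
    foreach ($item in $Path) {
        if (-not (Test-Path -LiteralPath $item)) {
            Write-Verbose "Not found, skipping: $item"
            continue
        }
        if ($PSCmdlet.ShouldProcess($item, 'Remove recursively')) {
            # Remove-Item handles both file-system and registry (HKLM:) paths.
            Remove-Item -LiteralPath $item -Recurse -Force
            $item
        }
    }
}

# Leftovers named in the answer above. Assumes this session runs as the user
# who installed AKS-HCI. Note that .ssh may also hold keys unrelated to AKS-HCI.
$aksHciLeftovers = @(
    'HKLM:\SOFTWARE\Microsoft\AksHciPS'
    'HKLM:\SOFTWARE\Microsoft\MocPS'
    'HKLM:\SOFTWARE\Microsoft\KvaPS'
    'C:\AKSHCI'
    'C:\Program Files\AksHci'
    "$env:USERPROFILE\.AksHci"
    "$env:USERPROFILE\.kube"
    "$env:USERPROFILE\.Kva"
    "$env:USERPROFILE\.Moc"
    "$env:USERPROFILE\.ssh"
    "$env:USERPROFILE\.wssd"
)
```

Preview with `Remove-AksHciLeftover -Path $aksHciLeftovers -WhatIf -Verbose`, then run it again without `-WhatIf`. Checking each path with `Test-Path` first means a node that was already partly cleaned up does not produce errors for missing items.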


1 additional answer

    Trent Helms - MSFT (Microsoft Employee)
    2021-08-11T12:16:49.053+00:00

    Hi @Anonymous,

    As I understand it, the Uninstall-AksHci cmdlet should be run directly on each of the cluster nodes and should clean up the environment correctly. However, if this fails, you can clean up manually by doing the following:

    On each node of the cluster:

    • Remove wssdcloudagent service
    • Remove wssdagent service
    • Remove folder C:\Program Files\AksHci
    • Remove all VMs that are created from this process
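The service removal on each node can be sketched in PowerShell as below, assuming an elevated Windows PowerShell 5.1 session on the node. `Remove-WssdService` is a hypothetical helper; it calls `sc.exe delete` because the Remove-Service cmdlet only exists in PowerShell 6 and later. Folder and VM removal are left as the manual steps listed above.

```powershell
# Hypothetical helper: stops and deletes the AKS-HCI agent services on the
# local node, then outputs the names of the services it deleted.
function Remove-WssdService {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [string[]] $Name = @('wssdcloudagent', 'wssdagent')
    )
    foreach ($serviceName in $Name) {
        $service = Get-Service -Name $serviceName -ErrorAction SilentlyContinue
        if (-not $service) {
            Write-Verbose "Service not present: $serviceName"
            continue
        }
        if ($PSCmdlet.ShouldProcess($serviceName, 'Stop and delete service')) {
            Stop-Service -Name $serviceName -Force -ErrorAction SilentlyContinue
            # Call sc.exe explicitly: plain 'sc' is an alias for Set-Content
            # in Windows PowerShell.
            & sc.exe delete $serviceName | Out-Null
            if ($LASTEXITCODE -ne 0) {
                Write-Warning "sc.exe delete failed for $serviceName (exit code $LASTEXITCODE)"
                continue
            }
            $serviceName
        }
    }
}
```

Running `Remove-WssdService -WhatIf` first shows which of the two services are actually present on that node before anything is deleted.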

    On the cluster:

    • Run Get-ClusterGroup. If there is a cluster group whose name has the format ‘ca-<guid>’, or any group whose name includes ‘management cluster’, run Remove-ClusterGroup on it.

    After this, restart Windows Admin Center (WAC) and attempt the setup again.
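The cluster-level check can also be scripted. This is a minimal sketch, assuming it runs on one cluster node where the FailoverClusters module is available. The hypothetical `Test-AksHciClusterGroupName` function encodes the naming rule from the answer (`ca-` followed by a GUID, or a name containing "management cluster"), and the hypothetical `Remove-AksHciClusterGroup` removes the matching groups together with their resources.

```powershell
# Hypothetical helper: returns $true for cluster group names that the answer
# attributes to AKS-HCI ('ca-<guid>' or containing 'management cluster').
function Test-AksHciClusterGroupName {
    param(
        [Parameter(Mandatory)]
        [string] $Name
    )
    $guid = '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
    # -match and -like are case-insensitive by default.
    ($Name -match ('^ca-' + $guid + '$')) -or ($Name -like '*management cluster*')
}

# Hypothetical helper: finds the matching cluster groups and removes them,
# including the resources they contain. Requires the FailoverClusters module.
function Remove-AksHciClusterGroup {
    [CmdletBinding(SupportsShouldProcess)]
    param()
    $groups = Get-ClusterGroup | Where-Object { Test-AksHciClusterGroupName -Name $_.Name }
    foreach ($group in $groups) {
        if ($PSCmdlet.ShouldProcess($group.Name, 'Remove cluster group and its resources')) {
            Remove-ClusterGroup -InputObject $group -RemoveResources -Force
        }
    }
}
```

Run `Remove-AksHciClusterGroup -WhatIf` to list the groups that would be removed, and only rerun it without `-WhatIf` once the list contains nothing but AKS-HCI groups. Keeping the name check in its own function makes the matching rule easy to review separately from the destructive call.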

    Thanks so much,
    Trent

