Hello Nam Vu,
Welcome to the Microsoft Q&A and thank you for posting your questions here.
I understand that your Azure Managed Instance for Apache Cassandra deployment is failing with InternalServerError.
You are on the right path with what you've done. Follow the steps below as a remediation plan:
STEP 1:
Confirm the subnet is dedicated and correctly delegated.
Managed Instance for Apache Cassandra datacenters must deploy into dedicated subnets via VNet injection, so keep this subnet exclusive to Cassandra MI.
References: https://learn.microsoft.com/en-us/Azure/managed-instance-apache-cassandra/create-multi-region-cluster, https://learn.microsoft.com/en-us/azure/managed-instance-apache-cassandra/create-cluster-cli
az network vnet subnet show -g <rg> --vnet-name <vnet> -n <subnet> --query delegations
If the delegation isn’t present, apply it (the Azure CLI supports --delegations):
az network vnet subnet update -g <rg> --vnet-name <vnet> -n <subnet> --delegations Microsoft.DocumentDB/cassandraClusters
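The delegation check above can be wrapped in a small shell function so it is easy to rerun after each change. This is only a sketch: the name has_cassandra_delegation is made up for illustration, and it assumes each delegation entry in the CLI output carries a serviceName field.

```shell
# Hypothetical helper (not part of the Azure CLI).
# Succeeds only if the subnet is delegated to Cassandra MI.
# Usage: has_cassandra_delegation <rg> <vnet> <subnet>
has_cassandra_delegation() {
  # Assumption: delegations[].serviceName holds the delegated service name.
  az network vnet subnet show -g "$1" --vnet-name "$2" -n "$3" \
    --query "delegations[].serviceName" -o tsv |
    grep -qix 'Microsoft.DocumentDB/cassandraClusters'
}
```

For example, `has_cassandra_delegation <rg> <vnet> <subnet> && echo delegated` prints "delegated" only when the delegation is already in place.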
STEP 2:
Remove forced tunneling / “0.0.0.0/0” UDRs that break required outbound access.
Cassandra MI deployment requires internet/outbound connectivity; builds often fail when outbound is restricted or hair-pinned through custom egress paths.
References: https://learn.microsoft.com/en-us/azure/managed-instance-apache-cassandra/create-cluster-portal, https://learn.microsoft.com/en-us/azure/managed-instance-apache-cassandra/create-cluster-cli
az network route-table route list -g <rg> --route-table-name <rt> --query "[?addressPrefix=='0.0.0.0/0']"
If you must restrict egress, Microsoft recommends using service tags or routing Microsoft prefixes appropriately rather than blocking platform dependencies.
Reference: https://learn.microsoft.com/en-us/azure/managed-instance-apache-cassandra/network-rules
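To narrow the route listing in this step down to the routes that actually redirect traffic, a sketch like the following may help. The function name forced_tunnel_routes is hypothetical, and the sketch assumes that any default route whose next hop type is not Internet (for example VirtualAppliance or VirtualNetworkGateway) is a forced-tunneling candidate worth reviewing.

```shell
# Hypothetical helper (not part of the Azure CLI).
# Prints default routes (0.0.0.0/0) whose next hop is not "Internet".
# Usage: forced_tunnel_routes <rg> <route-table>
forced_tunnel_routes() {
  az network route-table route list -g "$1" --route-table-name "$2" \
    --query "[?addressPrefix=='0.0.0.0/0'].[name, nextHopType]" -o tsv |
    awk -F '\t' '$2 != "Internet" { print $1 " -> " $2 }'
}
```

An empty result suggests the route table is not redirecting the default route; any line printed names a route to inspect or remove.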
STEP 3:
Validate NSG outbound: allow the required HTTPS (443) destinations first.
Instead of “allow all,” align with the Required outbound network rules (service tags on TCP/443) so provisioning, updates, logging, and identity can succeed.
References: https://learn.microsoft.com/en-us/azure/managed-instance-apache-cassandra/network-rules, https://learn.microsoft.com/en-us/azure/managed-instance-apache-cassandra/create-cluster-portal
az network nsg rule create -g <rg> --nsg-name <nsg> -n AllowCassandraMIOutbound \
  --priority 100 --direction Outbound --access Allow --protocol Tcp --destination-port-ranges 443 \
  --destination-address-prefixes Storage AzureKeyVault EventHub AzureMonitor AzureActiveDirectory AzureResourceManager AzureFrontDoor.Firstparty GuestAndHybridManagement ApiManagement
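An allow rule only helps if no outbound deny rule is evaluated before it, since NSG rules are processed in priority order with lower numbers first. The sketch below lists custom outbound Deny rules that would win over an allow rule at a given priority. The function name denies_before_allow is hypothetical, and the priority you pass in is whatever you chose for the allow rule (100 in the command above).

```shell
# Hypothetical helper (not part of the Azure CLI).
# Prints outbound Deny rules with a priority number lower than <allow-priority>.
# Usage: denies_before_allow <rg> <nsg> <allow-priority>
denies_before_allow() {
  az network nsg rule list -g "$1" --nsg-name "$2" \
    --query "[?direction=='Outbound' && access=='Deny'].[priority, name]" -o tsv |
    awk -F '\t' -v p="$3" '$1 + 0 < p + 0 { print $1 "\t" $2 }' |
    sort -n
}
```

If this prints anything, either lower the priority number of the allow rule or adjust the listed deny rules.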
STEP 4:
Verify DNS can resolve Azure control-plane endpoints from the VNet.
Cassandra MI relies on DNS names for Azure services (load balancers and platform endpoints), so custom DNS must resolve the required FQDNs.
Reference: https://learn.microsoft.com/en-us/azure/managed-instance-apache-cassandra/network-rules
nslookup management.azure.com
nslookup login.microsoftonline.com
nslookup packages.microsoft.com
If lookups fail, temporarily switch to Azure-provided DNS or fix forwarding, then retry deployment.
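The lookups in this step can be run in one pass from a VM in the same VNet. This is a minimal sketch: the function name check_dns is made up, and it assumes your nslookup exits with a non-zero status when a name does not resolve (true for the common Linux builds, not guaranteed everywhere).

```shell
# Hypothetical helper. Reports OK/FAIL per hostname and
# returns non-zero if any lookup failed.
# Usage: check_dns <host> [<host> ...]
check_dns() {
  local failed=0 host
  for host in "$@"; do
    # Assumption: nslookup returns non-zero on resolution failure.
    if nslookup "$host" >/dev/null 2>&1; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  return "$failed"
}
```

For example: `check_dns management.azure.com login.microsoftonline.com packages.microsoft.com`.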
STEP 5:
Retry with a smaller supported SKU to rule out capacity/SKU constraints.
When testing, start with a smaller SKU from the supported list (for example Standard_E8s_v5) and scale after a successful provision.
Reference: https://learn.microsoft.com/en-us/azure/managed-instance-apache-cassandra/create-cluster-cli
az managed-cassandra datacenter create -g <rg> --cluster-name <cluster> -n dc1 \
  --data-center-location <region> --delegated-subnet-id <subnetId> --node-count 3 --sku Standard_E8s_v5 --disk-capacity 4
STEP 6:
Pull deployment evidence: Activity Log + ARM Deployment operations.
Use Activity Log to pinpoint the failing control-plane operation (VMSS, networking validation, DNS, or dependency timeouts), then correlate with Resource Group > Deployments > Failed > Operations.
References: https://docs.azure.cn/en-us/azure-monitor/platform/activity-log, https://learn.microsoft.com/en-us/cli/azure/monitor/activity-log?view=azure-cli-latest
az monitor activity-log list -g <rg> --status Failed --max-events 20 --output table
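When the table output is long, it can help to see which operations fail most often. The sketch below counts failed events per operation name. The function name failed_operations is hypothetical, and it assumes each Activity Log event exposes the operation under operationName.value in the CLI output.

```shell
# Hypothetical helper (not part of the Azure CLI).
# Prints "<count> <operation>" for failed events, most frequent first.
# Usage: failed_operations <rg>
failed_operations() {
  # Assumption: operationName.value holds the operation identifier.
  az monitor activity-log list -g "$1" --status Failed --max-events 50 \
    --query "[].operationName.value" -o tsv |
    sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

The operation at the top of the list is usually the best starting point when you open the failed deployment's Operations view.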
STEP 7:
Check regional health/capacity signals before repeating the build.
Confirm there’s no active incident, maintenance, or broad capacity constraint affecting your region using Azure Service Health and the public Azure Status view.
Helpful references: Required outbound network rules, Create cluster (CLI), Create cluster (Portal), Azure Service Health.
I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.
Please don't forget to close out the thread by upvoting and accepting this as an answer if it is helpful.