Hello @Taylor Russell
Great! That was a deep one, but I'm glad you got it sorted out!
I will restate the problem and the solution below. If you are happy with it, please accept the answer so that anyone with a similar issue can find it marked as resolved.
PROBLEM:
We encountered an issue provisioning a new Azure Databricks workspace. We are using Terraform to deploy the workspace, and the workspace is configured with private networking.
The Terraform configuration is valid and the workspace attempts to deploy. However, even though terraform apply succeeds, reviewing the workspace in the Azure portal after deployment shows the following error:
The workspace 'AZUT-XXXXXXX-DATABRICKS-WS' is in a failed state and cannot be launched. Please review error details in the activity log tab and retry your operation.
Reviewing the activity log, we found the operation "Create Databricks Workspace" in a failed state with the following error:
Failed to prepare subnet 'AZUT-XXXXXXXX-SUB-DATABRICKS-PRIV'. Please try again later. Error details: 'Gateway authentication failed for 'Microsoft.Network'. Diagnostic information: timestamp '20240808T184051Z', tracking id '2164dd04-897e-4951-a464-0b8b5c1bfe03', request correlation id '2164dd04-897e-4951-a464-0b8b5c1bfe03'
SOLUTION:
There was an Azure Policy applied at the subscription scope which checked that 5 specific tags were present on every resource and resource group within the subscription. This policy was evaluating the deployment and returning a failure stating that one of the mandatory tags did not contain a valid value.
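As an illustration of the kind of check that policy performs, here is a minimal Python sketch that validates a set of tags locally before running terraform apply. The tag names and allowed values are hypothetical (the actual five mandatory tags are not listed in this thread), and this is not a substitute for the Azure Policy evaluation itself; it only shows the idea of catching a missing or invalid tag value early.

```python
from typing import Dict, List, Optional, Set

# Hypothetical policy: five mandatory tags. A value of None means
# "any non-empty value is accepted"; a set lists the allowed values.
# Replace these with the tags your own policy actually enforces.
REQUIRED_TAGS: Dict[str, Optional[Set[str]]] = {
    "environment": {"dev", "test", "prod"},
    "owner": None,
    "cost_center": None,
    "application": None,
    "data_classification": {"public", "internal", "confidential"},
}


def find_tag_violations(tags: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems found in `tags`."""
    violations: List[str] = []
    for name, allowed in REQUIRED_TAGS.items():
        value = (tags.get(name) or "").strip()
        if not value:
            violations.append(f"missing or empty tag: {name}")
        elif allowed is not None and value not in allowed:
            violations.append(f"invalid value for tag {name}: {value!r}")
    return violations


if __name__ == "__main__":
    # Example: every tag is present, but one holds a value the policy rejects.
    workspace_tags = {
        "environment": "production",
        "owner": "data-platform",
        "cost_center": "1234",
        "application": "databricks",
        "data_classification": "internal",
    }
    for problem in find_tag_violations(workspace_tags):
        print(problem)
```

Running a check like this against the tags defined in the Terraform configuration would have flagged the bad value without needing to delete and recreate anything.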
Unfortunately, the only way we could ascertain this was to delete the problem Databricks workspace, delete the preconfigured NSG and subnets created within our VNET for use by Databricks, and then attempt to create the workspace manually via the Azure portal. The deletion was necessary because, when you select the option to create the workspace in your own VNET, the Azure portal deployment requires you to define new subnets and a new NSG rather than allowing you to reference pre-existing ones.
It was at this stage, during the portal's pre-deployment validation, that the Azure Policy check failed, which identified the policy that was causing the issue. Once this policy issue was resolved, we recreated the Databricks subnets within our VNET, created the NSG with the rules required for Databricks, and lastly recreated the NSG association with the subnets. All of this was performed using the same Terraform configuration we used initially.
With the VNET restored to the state it was in before the deletions, we applied our Terraform configuration again to recreate the workspace, and this time everything succeeded.
@Taylor Russell, you can accept this answer now, and if you have anything to add, please do!
Regards