Azure Databricks ADLS gen2 mount fails

ROVERE Lorenzo 21 Reputation points
2022-03-18T10:29:35.317+00:00

Hi all,

We are having trouble mounting ADLS Gen2 storage. The error when we run "dbutils.fs.mount" is:

Operation failed: "This request is not authorized to perform this operation.", 403, HEAD, https://<storage-account-name>.dfs.core.windows.net/<container-name>/?upn=false&action=getAccessControl&timeout=90

The service principal has the recommended rights on the container, but we suspect the firewall is blocking the traffic, so the Security Team is asking us for the IPs AND ports Databricks uses to reach the storage account. We cannot find any information about the ports (the IPs are easy to find). In the Driver Logs, just before the error, we see:
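For context, our mount call follows the standard OAuth service-principal pattern from the Databricks docs (the config keys below are the documented ones; the angle-bracket values are placeholders for our real IDs and secret scope, which I can't share here):

```python
# Standard ADLS Gen2 OAuth mount via a service principal.
# <application-id>, <tenant-id>, <scope>, <key> are placeholders.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="<scope>", key="<key>"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<container-name>@<storage-account-name>.dfs.core.windows.net",
    mount_point="/mnt/hdfsGen2Mount",
    extra_configs=configs,
)
```

(This only runs inside a Databricks notebook, since `dbutils` is provided by the runtime.)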

Thu Mar 17 16:42:05 2022 Initialized gateway on port 37791
Thu Mar 17 16:42:07 2022 Python shell executor start
ExecutionError Traceback (most recent call last)
<command-1975187649771274> in <module>
15
16
---> 17 dbutils.fs.mount(
18 source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net",
19 mount_point = "/mnt/hdfsGen2Mount/",
/databricks/python_shell/dbruntime/dbutils.py in f_with_exception_handling(*args, **kwargs)
379 exc.context = None
380 exc.cause = None
--> 381 raise exc
382
383 return f_with_exception_handling
ExecutionError: An error occurred while calling o305.mount.
: Operation failed: "This request is not authorized to perform this operation.", 403, HEAD, https://<storage-account-name>.dfs.core.windows.net/<container-name>/?upn=false&action=getAccessControl&timeout=90
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:241)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsClient.getAclStatus(AbfsClient.java:768)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsClient.getAclStatus(AbfsClient.java:750)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getIsNamespaceEnabled(AzureBlobFileSystemStore.java:313)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFileStatus(AzureBlobFileSystemStore.java:821)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:628)
at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureFileSystem(DBUtilsCore.scala:772)
at com.databricks.backend.daemon.dbutils.DBUtilsCore.mount(DBUtilsCore.scala:720)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)
Thu Mar 17 16:52:54 2022 Python shell started with PID 1045 and guid eac79ab5168148fabe12f3f627017bbb
Thu Mar 17 16:52:54 2022 Initialized gateway on port 42045
Thu Mar 17 16:52:55 2022 Python shell executor start
Thu Mar 17 16:53:26 2022 Python shell started with PID 1076 and guid 87da3fe8ff02445c842d7cb60d976c48
Thu Mar 17 16:53:26 2022 Initialized gateway on port 41209
Thu Mar 17 16:53:26 2022 Python shell executor start
KeyboardInterrupt Traceback (most recent call last)
<command-3523407472135170> in <module>
16 if returncode != 0 and False:
17 raise subprocess.CalledProcessError(returncode, cmd)
---> 18 ____databricks_percent_sh()
19 finally:
20 del ____databricks_percent_sh
<command-3523407472135170> in ____databricks_percent_sh()
10 stdout = subprocess.PIPE,
11 universal_newlines = True)
---> 12 for line in iter(p.stdout.readline, ''):
13 sys.stdout.write(line)
14 sys.stdout.flush()
KeyboardInterrupt: Dropped logging in PythonShell:

In this example the ports used seem to be 37791, 42045 and 41209, so we suppose the ports fall within some range, but we cannot identify that range. Any ideas? Thanks
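For what it's worth, we pulled those port numbers out of the driver log with a quick script (the log lines below are copied from the paste above); they look like arbitrary ephemeral ports rather than a fixed set:

```python
import re

# Gateway lines copied from the driver log above.
log_lines = [
    "Thu Mar 17 16:42:05 2022 Initialized gateway on port 37791",
    "Thu Mar 17 16:52:54 2022 Initialized gateway on port 42045",
    "Thu Mar 17 16:53:26 2022 Initialized gateway on port 41209",
]

# Extract the port number from each "Initialized gateway on port N" line.
ports = [
    int(m.group(1))
    for line in log_lines
    if (m := re.search(r"Initialized gateway on port (\d+)", line))
]

print(ports)  # [37791, 42045, 41209]
```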

Azure Databricks