Databricks and abfss - Tables not picking up data

John Aherne 156 Reputation points
2022-09-18T19:36:39.677+00:00

We are changing our processes to use direct abfss connections to ADLS through a service principal instead of mount points, as Databricks recommends. We have run into a strange problem, though.

We can read and write just fine with Python. However, any partitioned table created to point at the location of the data does not pick up anything: a query on the table returns zero rows.
We tried REFRESH TABLE, which has no effect.
We tried MSCK REPAIR TABLE, which returns an error about the key config for ADLS.

The tables do work with mount points, where MSCK REPAIR TABLE runs without error.
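
For context, a session-scoped service principal setup of the kind described usually looks something like the sketch below. This is an assumption about the setup, not the exact code in use: the helper name `session_oauth_conf`, the storage account name, the secret scope, and the secret key are all hypothetical placeholders. The `fs.azure.account.*` keys are the standard ABFS OAuth settings.

```python
def session_oauth_conf(storage_account, client_id, tenant_id, client_secret):
    """Build session-scoped ABFS OAuth settings for one storage account.

    Hypothetical helper for illustration; only the config keys are standard.
    """
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# In a Databricks notebook, where `spark` and `dbutils` are predefined
# (scope and key names below are placeholders):
# secret = dbutils.secrets.get(scope="my-scope", key="sp-secret")
# conf = session_oauth_conf("mystorageacct", "<application-id>", "<directory-id>", secret)
# for key, value in conf.items():
#     spark.conf.set(key, value)
```

Settings applied this way live only in the notebook's Spark session, which may be why DataFrame reads and writes succeed while a metastore command such as MSCK REPAIR TABLE complains about missing storage configuration.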

Any thoughts on what I might be missing?

Azure Databricks

1 answer

  1. John Aherne 156 Reputation points
    2023-10-03T18:04:20.9566667+00:00

    Just an update on this:

    There appears to be a bug here, but there is no ETA on a fix.

    However, as a workaround, you can set the storage configuration in the cluster configuration on the linked service. You can even reference a Databricks secret scope and its secrets, just as you would when setting them directly on a cluster.

    Because the storage configs are then set on the cluster at startup, MSCK REPAIR TABLE and other commands work.
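
A minimal sketch of what that cluster-level configuration might contain, assuming OAuth with a service principal. The helper names, storage account, scope, and key names are hypothetical placeholders. The `spark.hadoop.` prefix and the `{{secrets/<scope>/<key>}}` reference syntax are the standard Databricks conventions for cluster Spark config.

```python
def cluster_oauth_conf(storage_account, client_id, tenant_id, scope, secret_key):
    """Build cluster-level Spark config entries for ABFS OAuth access.

    Hypothetical helper for illustration. The client secret is not embedded;
    it is written as a secret reference that Databricks resolves at startup.
    """
    suffix = f"{storage_account}.dfs.core.windows.net"
    secret_ref = "{{secrets/" + scope + "/" + secret_key + "}}"
    return {
        f"spark.hadoop.fs.azure.account.auth.type.{suffix}": "OAuth",
        f"spark.hadoop.fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"spark.hadoop.fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"spark.hadoop.fs.azure.account.oauth2.client.secret.{suffix}": secret_ref,
        f"spark.hadoop.fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def to_spark_config_lines(conf):
    """Render the entries as 'key value' lines, the form used in the cluster Spark config box."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


# Placeholder values; substitute your own account, IDs, scope, and key.
conf = cluster_oauth_conf(
    "mystorageacct", "<application-id>", "<directory-id>", "my-scope", "sp-secret"
)
print(to_spark_config_lines(conf))
```

The same key/value pairs can be pasted into a cluster's Spark config or supplied as the cluster Spark configuration on the linked service, so every command on the cluster sees them from startup.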
