Connect to OneLake Storage

Important

This feature is currently in preview. The Supplemental Terms of Use for Microsoft Azure Previews include additional legal terms that apply to Azure features that are in beta, in preview, or otherwise not yet released into general availability. For information about this specific preview, see Azure HDInsight on AKS preview information. For questions or feature suggestions, submit a request on AskHDInsight with the details, and follow us for more updates on the Azure HDInsight Community.

This tutorial shows how to connect to OneLake with a Jupyter notebook from an Azure HDInsight on AKS cluster.

  1. Create an HDInsight on AKS cluster with Apache Spark™. Follow these instructions: Set up clusters in HDInsight on AKS.

  2. While providing cluster information, remember your Cluster login Username and Password, as you need them later to access the cluster.

  3. Create a user-assigned managed identity (UAMI): see Create for Azure HDInsight on AKS - UAMI, and choose it as the identity on the Storage screen.

    Screenshot showing cluster basic tab.
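
    If you prefer to script this step, the following is a minimal sketch using the Azure SDK for Python; it assumes the azure-identity and azure-mgmt-msi packages are installed and that you run it from an environment where you are signed in to Azure. The subscription ID, resource group, region, and identity name are placeholders to replace with your own.

    # Sketch only: create a user-assigned managed identity with the Azure SDK for Python.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.msi import ManagedServiceIdentityClient
    from azure.mgmt.msi.models import Identity

    credential = DefaultAzureCredential()
    client = ManagedServiceIdentityClient(credential, "<subscription-id>")

    # "hdi-onelake-uami" is a hypothetical name; use any name you like.
    uami = client.user_assigned_identities.create_or_update(
        resource_group_name="<resource-group>",
        resource_name="hdi-onelake-uami",
        parameters=Identity(location="<region>"),
    )
    print(uami.id)  # resource ID of the identity you select on the cluster's Storage screen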

  4. Give this UAMI access to the Fabric workspace that contains your items. To decide which role is suitable, see Workspace roles, which describes Fabric role-based access control (RBAC).

    Screenshot showing manage access box.

  5. Navigate to your Lakehouse and find the names of your workspace and Lakehouse. You can find them in the Lakehouse URL or in the Properties pane for a file.

  6. In the Azure portal, look for your cluster and select the notebook.

    Screenshot showing cluster overview page.

  7. Create a new notebook and select PySpark as the type.

  8. Copy the workspace and Lakehouse names into your notebook and build the OneLake URL for your Lakehouse. You can then read any file from this path.

    fp = 'abfss://' + 'Workspace Name' + '@onelake.dfs.fabric.microsoft.com/' + 'Lakehouse Name' + '/Files/'
    df = spark.read.format("csv").option("header", "true").load(fp + "test1.csv")
    df.show()
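
    The same read can also be written with the names held in variables, which makes the path easier to reuse in later cells. This is a minimal sketch: the workspace and Lakehouse names below are placeholders for your own values, test1.csv is the same hypothetical file as above, and printSchema is just a quick sanity check.

    # Sketch: build the OneLake path from variables (placeholder names) and repeat the read.
    workspace_name = "Workspace Name"   # replace with your Fabric workspace name
    lakehouse_name = "Lakehouse Name"   # replace with your Lakehouse name

    fp = f"abfss://{workspace_name}@onelake.dfs.fabric.microsoft.com/{lakehouse_name}/Files/"

    df = spark.read.format("csv").option("header", "true").load(fp + "test1.csv")
    df.printSchema()  # confirm the columns were inferred as expected
    df.show(5)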
    
  9. Try to write some data into the Lakehouse.

    df.write.format("csv").save(fp + "out.csv")
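
    Spark writes out.csv as a folder of part files rather than a single file. If you want a header row and the ability to re-run the cell, you can add standard Spark writer options, as in this sketch (out.csv is the same hypothetical output name as above):

    # Sketch: write with a header row and overwrite any previous output.
    df.write.format("csv") \
        .option("header", "true") \
        .mode("overwrite") \
        .save(fp + "out.csv")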

  10. Verify that your data was written successfully by checking your Lakehouse or by reading the newly written file back.
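
    For example, this minimal sketch reads back the out.csv folder from the previous step; match whatever options you used when writing (header "true" here matches the write sketch above).

    # Read back the output folder and display a few rows as a quick verification.
    check_df = spark.read.format("csv").option("header", "true").load(fp + "out.csv")
    check_df.show(5)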

Reference