Referencing data in a lakehouse for data science projects

This quickstart explains how to reference data stored in an external Azure Data Lake Storage (ADLS) Gen2 account and use it in your data science projects. After completing this quickstart, you'll have a shortcut to ADLS storage in your lakehouse and a notebook with Spark code that accesses your external data.

Prepare data for the shortcut

  1. In Azure, create an ADLS Gen2 storage account

  2. Enable the hierarchical namespace on the storage account

    Screenshot of hierarchical namespaces in storage account.

  3. Create folders for your data

  4. Upload data

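  5. Assign your user identity the Storage Blob Data Contributor role on the storage account

  6. Copy the storage account's Data Lake Storage endpoint, which has the form https://<storage-account-name>.dfs.core.windows.net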
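If you prefer to script steps 3, 4, and 6 instead of using the Azure portal, the following is a minimal sketch using the Azure SDK for Python (`azure-identity` and `azure-storage-file-datalake`). It assumes the account already exists with the hierarchical namespace enabled, and that the identity picked up by `DefaultAzureCredential` holds the Storage Blob Data Contributor role. The helper names `dfs_endpoint` and `upload_file`, and the container, folder, and file names in the usage note, are hypothetical.

```python
import os
import re

# Storage account names are 3-24 characters: lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"^[a-z0-9]{3,24}$")


def dfs_endpoint(account_name: str) -> str:
    """Return the Data Lake Storage (DFS) endpoint used by the shortcut dialog."""
    if not _ACCOUNT_NAME.match(account_name):
        raise ValueError(f"invalid storage account name: {account_name!r}")
    return f"https://{account_name}.dfs.core.windows.net"


def upload_file(account_name: str, container: str, folder: str, local_path: str) -> str:
    """Create the container and folder if they are missing, then upload one file.

    Assumes hierarchical namespace is enabled and the signed-in identity has
    the Storage Blob Data Contributor role on the account.
    """
    # Imported here so dfs_endpoint() works even without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url=dfs_endpoint(account_name),
        credential=DefaultAzureCredential(),
    )

    # A "file system" in the Data Lake SDK is a blob container.
    file_system = service.get_file_system_client(container)
    if not file_system.exists():
        file_system.create_file_system()

    # Step 3: create the folder that the shortcut will point at.
    directory = file_system.get_directory_client(folder)
    if not directory.exists():
        directory.create_directory()

    # Step 4: upload the local file into that folder, replacing any old copy.
    file_name = os.path.basename(local_path)
    with open(local_path, "rb") as data:
        directory.get_file_client(file_name).upload_data(data, overwrite=True)

    # Location of the uploaded file relative to the account endpoint.
    return f"/{container}/{folder}/{file_name}"
```

For example, `upload_file("contosoadls", "data", "products", "products.csv")` would upload a local `products.csv`, and `dfs_endpoint("contosoadls")` gives the endpoint value to paste into the shortcut dialog.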

Create a shortcut

  1. Open your lakehouse in Lakehouse explorer

  2. Under Files, create a folder to hold the referenced data

  3. Select the ellipsis (...) next to the folder name, and then select New shortcut

    Screenshot of new shortcut link.

  4. Select External Sources > ADLS Gen2

  5. Provide the shortcut name, the storage account endpoint, and the location of your data folder in the storage account

    Screenshot of new shortcut dialog.

  6. Select Create
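The dialog steps above can also be expressed as a single REST request. The sketch below only assembles the URL and JSON body; it assumes the Microsoft Fabric REST API "Create Shortcut" operation and its `adlsGen2` target shape (`location`, `subpath`, `connectionId`) as we understand them, so verify the field names against the current API reference before relying on it. The function name and every ID in the example are hypothetical, and `connection_id` is assumed to refer to an existing Fabric connection that stores the ADLS credentials.

```python
import re
from typing import Any, Dict, Tuple

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_adls_shortcut_request(
    workspace_id: str,
    lakehouse_id: str,
    parent_folder: str,
    shortcut_name: str,
    dfs_location: str,
    subpath: str,
    connection_id: str,
) -> Tuple[str, Dict[str, Any]]:
    """Return (url, json_body) for creating an ADLS Gen2 shortcut under Files."""
    # The shortcut must target the DFS endpoint, not the blob endpoint.
    if not re.fullmatch(r"https://[a-z0-9]{3,24}\.dfs\.core\.windows\.net/?", dfs_location):
        raise ValueError("location must be https://<account>.dfs.core.windows.net")
    # Subpath is the container plus folder inside the storage account.
    if not subpath.startswith("/"):
        raise ValueError("subpath must start with '/', for example /container/folder")

    url = f"{FABRIC_API}/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts"
    body = {
        # Lakehouse folder that will contain the shortcut (step 2).
        "path": f"Files/{parent_folder.strip('/')}",
        # Shortcut name shown in Lakehouse explorer (step 5).
        "name": shortcut_name,
        "target": {
            "adlsGen2": {
                "location": dfs_location.rstrip("/"),
                "subpath": subpath,
                "connectionId": connection_id,
            }
        },
    }
    return url, body
```

Sending the request would additionally need a Microsoft Entra access token for the Fabric API in an `Authorization: Bearer` header; that part is intentionally left out of this sketch.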

Access referenced data in a notebook

  1. Open an existing notebook or create a new one

  2. Pin your lakehouse to the notebook

  3. Browse to your data in the shortcut folder

  4. Select a file with structured data and drag it onto the notebook to generate code that loads it

  5. Run the generated code to read the file's contents

  6. Add your own code to analyze the data
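As a rough illustration of steps 4 to 6, the sketch below reads a CSV file through the shortcut with Spark and takes a first look at it. It assumes the lakehouse is pinned as the notebook's default, so a relative `Files/...` path resolves against it, and that the file has a header row. The folder, shortcut, and file names (`external`, `PartnerData`, `products.csv`) and the helper functions are hypothetical; the code you get from dragging a file may differ.

```python
from typing import Any, Dict


def shortcut_file_path(folder: str, shortcut: str, file_name: str) -> str:
    """Relative path to a file inside a shortcut in the pinned lakehouse."""
    return "Files/" + "/".join(p.strip("/") for p in (folder, shortcut, file_name))


def read_shortcut_csv(spark: Any, path: str) -> Any:
    """Load a CSV file with a header row into a Spark DataFrame."""
    return (
        spark.read.format("csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .load(path)
    )


def profile(df: Any) -> Dict[str, Any]:
    """A first look at the data: number of rows and column names."""
    return {"rows": df.count(), "columns": df.columns}


# Fabric notebooks predefine a `spark` session; outside a notebook this is skipped.
if "spark" in globals():
    session = globals()["spark"]
    path = shortcut_file_path("external", "PartnerData", "products.csv")
    df = read_shortcut_csv(session, path)
    df.printSchema()
    print(profile(df))
    # Count, mean, stddev, min, and max for each column.
    df.describe().show()
```

Because the shortcut only references the external data, the read goes to the ADLS account each time; nothing is copied into the lakehouse.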