Referencing data in a lakehouse for Data Science projects
This quickstart explains how to reference data stored in an external Azure Data Lake Storage (ADLS) Gen2 account and use it in your Data Science projects. After you complete this quickstart, you'll have a shortcut to ADLS storage in your lakehouse and a notebook with Spark code that accesses your external data.
Prepare data for shortcut
- In Azure, create an ADLS Gen2 storage account
- Enable the hierarchical namespace
- Create folders for your data
- Upload data
- Add your user identity to the Storage Blob Data Contributor role
- Get the storage account endpoint
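The endpoint collected in the last step is what the shortcut dialog asks for later. As a minimal sketch, assuming the public Azure cloud (where the ADLS Gen2 endpoint takes the form `https://<account>.dfs.core.windows.net`), the helpers below build that endpoint and the URL of a data folder. The function names and the `contosodata`, `raw`, and `sales/2024` values are hypothetical placeholders, not names from this quickstart.

```python
import re

# Storage account names are 3-24 characters, lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def adls_endpoint(account_name: str) -> str:
    """Return the ADLS Gen2 (DFS) endpoint for an account in the public Azure cloud."""
    if not _ACCOUNT_NAME.fullmatch(account_name):
        raise ValueError(f"invalid storage account name: {account_name!r}")
    return f"https://{account_name}.dfs.core.windows.net"


def data_folder_url(account_name: str, container: str, folder: str) -> str:
    """Return the full URL of a folder inside a container of the account."""
    return f"{adls_endpoint(account_name)}/{container}/{folder.strip('/')}"


# Hypothetical account, container, and folder names; replace with your own.
endpoint = adls_endpoint("contosodata")
folder_url = data_folder_url("contosodata", "raw", "/sales/2024/")
print(endpoint)
print(folder_url)
```

Sovereign or private clouds use a different domain suffix, so copy the endpoint from the storage account's Endpoints page in the Azure portal if your account is not in the public cloud.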
Create a shortcut
- Open your lakehouse to get to Lakehouse Explorer
- Under Files, create a folder where you reference data
- Select the ellipsis (...) next to the folder name, and then select New shortcut
- Select External sources > ADLS Gen2
- Provide the shortcut name, the storage account endpoint, and your data folder location in the storage account
- Select Create
Access referenced data in a notebook
- Open an existing notebook or create a new one
- Pin your lakehouse to the notebook
- Browse your data in the shortcut folder
- Select a file with structured data and drag it into the notebook to generate code
- Run the code to read the file content
- Add code for data analysis
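The code generated by the drag-and-drop step typically resembles the sketch below. It assumes the file is a CSV with a header row and that the lakehouse is pinned as the notebook's default, so Spark can resolve paths relative to `Files/`. The helper names and the `external`, `adls_shortcut`, and `sales.csv` values are hypothetical; substitute your own folder, shortcut, and file names.

```python
from posixpath import join


def shortcut_file_path(folder: str, shortcut: str, file_name: str) -> str:
    """Build the lakehouse-relative path of a file exposed through a shortcut."""
    return join("Files", folder.strip("/"), shortcut.strip("/"), file_name.strip("/"))


def load_csv(spark, path: str):
    """Read a CSV file with a header row into a Spark DataFrame."""
    return (
        spark.read.format("csv")
        .option("header", "true")       # first row holds column names
        .option("inferSchema", "true")  # let Spark guess column types
        .load(path)
    )


# Hypothetical names; replace them with your own folder, shortcut, and file.
path = shortcut_file_path("external", "adls_shortcut", "sales.csv")

# A notebook session provides a ready-made `spark` object; skip the read elsewhere.
if "spark" in globals():
    df = load_csv(spark, path)
    df.printSchema()  # inspect inferred column types
    df.show(5)        # preview the first rows
```

From here, the analysis step can work on `df` with ordinary DataFrame operations such as `df.groupBy(...).count()` or `df.describe().show()`.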