Options to get data into the Fabric Lakehouse

Статия
09/04/2024

The get data experience covers all user scenarios for bringing data into the lakehouse, like:

Connecting to existing SQL Server and copying data into Delta table on the lakehouse.
Uploading files from your computer.
Copying and merging multiple tables from other lakehouses into a new Delta table.
Connecting to a streaming source to land data in a lakehouse.
Referencing data without copying it from other internal lakehouses or external sources.

Different ways to load data into a lakehouse

In Microsoft Fabric, there are a few ways you can get data into a lakehouse:

File upload from local computer
Run a copy tool in pipelines
Set up a dataflow
Apache Spark libraries in notebook code
Stream real-time events with Eventstream
Get data from Eventhouse

Local file upload

You can also upload data stored on your local machine. You can do it directly in the Lakehouse explorer.

Copy tool in pipelines

The Copy tool is a highly scalable Data Integration solution that allows you to connect to different data sources and load the data either in original format or convert it to a Delta table. Copy tool is a part of pipelines activities that you can modify in multiple ways, such as scheduling or triggering based on an event. For more information, see How to copy data using copy activity.

Dataflows

For users that are familiar with Power BI dataflows, the same tool is available to load data into your lakehouse. You can quickly access it from the Lakehouse explorer "Get data" option, and load data from over 200 connectors. For more information, see Quickstart: Create your first dataflow to get and transform data.

Notebook code

You can use available Spark libraries to connect to a data source directly, load data to a data frame, and then save it in a lakehouse. This method is the most open way to load data in the lakehouse that user code is fully managing.

Note

External Delta tables created with Spark code won't be visible to a SQL analytics endpoint. Use shortcuts in Table space to make external Delta tables visible for a SQL analytics endpoint.

Stream real-time events with Eventstream

With Eventstream, you can get, process, and route high volumes real-time events from a wide variety of sources.

Screenshot of getting data into a lakehouse from Eventstream.

To see how to add lakehouse as a destination for Eventstream, see Get data from Eventstream in a lakehouse.

For optimal streaming performance, you can stream data from Eventstream into an Eventhouse and then enable OneLake availability.

Get data from Eventhouse

When you enable OneLake availability on data in an Eventhouse, a Delta table is created in OneLake. This Delta table can be accessed by a lakehouse using a shortcut. For more information, see OneLake shortcuts. For more information, see Eventhouse OneLake Availability.

Considerations when choosing approach to load data

Use case	Recommendation
Small file upload from local machine	Use Local file upload
Small data or specific connector	Use Dataflows
Large data source	Use Copy tool in pipelines
Complex data transformations	Use Notebook code
Streaming data	Use Eventstream to stream data into Eventhouse; enable OneLake availability and create a shortcut from Lakehouse
Time-series data	Get data from Eventhouse

Споделяне чрез

Options to get data into the Fabric Lakehouse

Different ways to load data into a lakehouse

Local file upload

Copy tool in pipelines

Dataflows

Notebook code

Stream real-time events with Eventstream

Get data from Eventhouse

Considerations when choosing approach to load data

Обратна връзка

Допълнителни ресурси

Споделяне чрез

Options to get data into the Fabric Lakehouse

Different ways to load data into a lakehouse

Local file upload

Copy tool in pipelines

Dataflows

Notebook code

Stream real-time events with Eventstream

Get data from Eventhouse

Considerations when choosing approach to load data

Related content

Обратна връзка

Допълнителни ресурси