Options to get data into the Fabric Lakehouse

The get data experience covers all user scenarios for bringing data into the lakehouse, such as:

  • Connecting to an existing SQL Server and copying data into a Delta table in the lakehouse.
  • Uploading files from your computer.
  • Copying and merging multiple tables from other lakehouses into a new Delta table.
  • Connecting to a streaming source to land data in a lakehouse.
  • Referencing data without copying it from other internal lakehouses or external sources.

Different ways to load data into a lakehouse

In Microsoft Fabric, there are a few ways you can get data into a lakehouse:

  • File upload from local computer
  • Run a copy tool in pipelines
  • Set up a dataflow
  • Apache Spark libraries in notebook code
  • Stream real-time events with Eventstream
  • Get data from Eventhouse

Local file upload

You can upload data stored on your local machine directly in the Lakehouse explorer.

Screenshot of file upload dialog in the Lakehouse explorer.

Copy tool in pipelines

The Copy tool is a highly scalable data integration solution that lets you connect to different data sources and load the data either in its original format or converted to a Delta table. The Copy tool is one of the pipeline activities, which you can configure in multiple ways, such as running it on a schedule or triggering it based on an event. For more information, see How to copy data using copy activity.

Dataflows

For users who are familiar with Power BI dataflows, the same tool is available to load data into your lakehouse. You can open it from the "Get data" option in the Lakehouse explorer and load data from over 200 connectors. For more information, see Quickstart: Create your first dataflow to get and transform data.

Notebook code

You can use the available Spark libraries to connect to a data source directly, load the data into a DataFrame, and then save it in a lakehouse. This method is the most open way to load data into the lakehouse, because your own code fully manages the load.
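As a minimal sketch of this approach, the function below reads a CSV file into a DataFrame and saves it as a Delta table. It assumes it runs in a notebook where a Spark session is available (passed in here as `spark`) and a default lakehouse is attached. The function name, the `Files/raw/sales.csv` path, and the `sales` table name are hypothetical placeholders, not names taken from this article.

```python
def load_csv_to_delta_table(spark, source_path, table_name, mode="overwrite"):
    """Read a CSV file into a DataFrame and save it as a Delta table.

    `spark` is an existing SparkSession. `source_path` and `table_name`
    are placeholders supplied by the caller (assumptions for illustration).
    """
    # Load the source file into a DataFrame, using the first row as the header.
    df = (
        spark.read.format("csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .load(source_path)
    )
    # Save the DataFrame as a managed Delta table.
    df.write.format("delta").mode(mode).saveAsTable(table_name)
    return df


# Example call from a notebook cell (hypothetical path and table name):
# load_csv_to_delta_table(spark, "Files/raw/sales.csv", "sales")
```

Passing `mode="append"` instead of the default would add rows to an existing table rather than replace its contents.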

Note

External Delta tables created with Spark code won't be visible to a SQL analytics endpoint. Use shortcuts in Table space to make external Delta tables visible for a SQL analytics endpoint.

Stream real-time events with Eventstream

With Eventstream, you can get, process, and route high volumes of real-time events from a wide variety of sources.

Screenshot of getting data into a lakehouse from Eventstream.

To learn how to add a lakehouse as a destination for Eventstream, see Get data from Eventstream in a lakehouse.

For optimal streaming performance, you can stream data from Eventstream into an Eventhouse and then enable OneLake availability.

Get data from Eventhouse

When you enable OneLake availability on data in an Eventhouse, a Delta table is created in OneLake. A lakehouse can access this Delta table by using a shortcut. For more information, see OneLake shortcuts and Eventhouse OneLake Availability.
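Once such a shortcut exists, notebook code can read it like any other Delta table. The sketch below assumes the shortcut was created in the Tables section of the notebook's default lakehouse, so that the relative path `Tables/<shortcut name>` resolves to it. The function name and the `device_events` shortcut name are hypothetical.

```python
def preview_shortcut_table(spark, shortcut_name, rows=10):
    """Return the first `rows` rows of a Delta table exposed through a shortcut.

    Assumes the shortcut sits in the Tables section of the default lakehouse;
    `shortcut_name` is a placeholder supplied by the caller.
    """
    # Read the Delta table that the shortcut points to.
    df = spark.read.format("delta").load(f"Tables/{shortcut_name}")
    # Limit the result so that a preview stays cheap on large tables.
    return df.limit(rows)


# Example call from a notebook cell (hypothetical shortcut name):
# preview_shortcut_table(spark, "device_events").show()
```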

Considerations when choosing an approach to load data

| Use case | Recommendation |
| --- | --- |
| Small file upload from local machine | Use Local file upload |
| Small data or specific connector | Use Dataflows |
| Large data source | Use Copy tool in pipelines |
| Complex data transformations | Use Notebook code |
| Streaming data | Use Eventstream to stream data into Eventhouse; enable OneLake availability and create a shortcut from Lakehouse |
| Time-series data | Get data from Eventhouse |