Lakeflow Connect offers simple and efficient connectors to ingest data from local files, popular enterprise applications, databases, cloud storage, message buses, and more. This page outlines some of the ways that Lakeflow Connect can improve ETL performance. It also covers common use cases and the range of supported ingestion tools, from fully managed connectors to fully customizable frameworks.
Flexible service models
Lakeflow Connect offers a broad range of connectors for enterprise applications, cloud storage, databases, message buses, and more. It also gives you the flexibility to choose between the following:
Service model | Description |
---|---|
A fully managed service | Out-of-the-box connectors that democratize data access with simple UIs and powerful APIs. This allows you to quickly create robust ingestion pipelines while minimizing long-term maintenance costs. |
A custom pipeline | If you need more customization, you can use Lakeflow Declarative Pipelines or Structured Streaming. This versatility enables Lakeflow Connect to meet your organization's specific needs. |
Unification with core Databricks tools
Lakeflow Connect uses core Databricks features to provide comprehensive data management. For example, it offers governance using Unity Catalog, orchestration using Lakeflow Jobs, and holistic monitoring across your pipelines. This helps your organization manage data security, quality, and cost while unifying your ingestion processes with your other data engineering tools. Lakeflow Connect is built on an open Data Intelligence Platform, with full flexibility to incorporate your preferred third-party tools. This ensures a tailored solution that aligns with your existing infrastructure and future data strategies.
Fast, scalable ingestion
Lakeflow Connect uses incremental reads and writes to enable efficient ingestion. When combined with incremental transformations downstream, this can significantly improve ETL performance.
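For example, incremental file ingestion with Auto Loader might look like the following sketch; the volume paths, checkpoint locations, and table name are placeholders, and `spark` is assumed to be the active session in a Databricks notebook or job.

```python
# Minimal sketch: incremental ingestion with Auto Loader (Structured Streaming).
# Paths, checkpoint locations, and the target table name are placeholders.
df = (
    spark.readStream.format("cloudFiles")          # Auto Loader source
    .option("cloudFiles.format", "json")           # format of the incoming files
    .option("cloudFiles.schemaLocation", "/Volumes/main/default/checkpoints/events_schema")
    .load("/Volumes/main/default/raw/events/")     # only new files are processed on each run
)

(
    df.writeStream
    .option("checkpointLocation", "/Volumes/main/default/checkpoints/events")
    .trigger(availableNow=True)                    # process all available data, then stop
    .toTable("main.default.events_bronze")         # incremental append to a Delta table
)
```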
Common use cases
Customers ingest data to solve their organizations' most challenging problems. Sample use cases include the following:
Use case | Description |
---|---|
Customer 360 | Measuring campaign performance and customer lead scoring |
Portfolio management | Maximizing ROI with historical and forecasting models |
Consumer analytics | Personalizing your customers' purchasing experiences |
Centralized human resources | Supporting your organization's workforce |
Digital twins | Increasing manufacturing efficiency |
RAG chatbots | Building chatbots to help users understand policies, products, and more |
Layers of the ETL stack
Some connectors operate at only one layer of the ETL stack. For example, Databricks offers fully managed connectors for enterprise applications like Salesforce and databases like SQL Server. Other connectors operate at multiple layers of the ETL stack: you can use standard connectors in either Structured Streaming for full customization or Lakeflow Declarative Pipelines for a more managed experience. You can similarly choose your level of customization for streaming data from Apache Kafka, Amazon Kinesis, Google Pub/Sub, and Apache Pulsar.
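For instance, at the most customizable layer, a Structured Streaming read from Apache Kafka might look like the following sketch; the broker address, topic, checkpoint path, and target table are placeholders.

```python
# Minimal sketch: reading a Kafka topic with Structured Streaming (the most customizable layer).
# The broker address, topic name, checkpoint path, and target table are placeholders.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
    .option("subscribe", "orders")                        # placeholder topic
    .option("startingOffsets", "earliest")
    .load()
)

# Kafka delivers binary keys and values; cast them to strings for downstream parsing.
parsed = raw.selectExpr(
    "CAST(key AS STRING) AS key",
    "CAST(value AS STRING) AS value",
    "timestamp",
)

(
    parsed.writeStream
    .option("checkpointLocation", "/Volumes/main/default/checkpoints/orders")
    .toTable("main.default.orders_raw")                   # incremental append to a Delta table
)
```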
Databricks recommends starting with the most managed layer. If it doesn't satisfy your requirements (for example, if it doesn't support your data source), drop down to the next layer. Databricks plans to expand support for more connectors across all three layers.
The following table describes the three layers of ingestion products, ordered from most customizable to most managed:
Layer | Description |
---|---|
Structured Streaming | Structured Streaming is an API for incremental stream processing in near real-time. It provides strong performance, scalability, and fault tolerance. |
Lakeflow Declarative Pipelines | Lakeflow Declarative Pipelines builds on Structured Streaming, offering a more declarative framework for creating data pipelines. You can define the transformations to perform on your data, and Lakeflow Declarative Pipelines manages orchestration, monitoring, data quality, errors, and more. Therefore, it offers more automation and less overhead than Structured Streaming. |
Fully managed connectors | Fully managed connectors build on Lakeflow Declarative Pipelines, adding even more automation for the most popular data sources. They extend Lakeflow Declarative Pipelines functionality to include source-specific authentication, CDC, edge-case handling, long-term API maintenance, automated retries, automated schema evolution, and more. |
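To make the contrast between the first two layers concrete, the following is a minimal sketch of the same Auto Loader ingestion expressed with Lakeflow Declarative Pipelines using the Python `dlt` module; the source path and table name are placeholders, and the pipeline manages checkpoints, retries, and monitoring for you.

```python
# Minimal sketch: declarative ingestion with Lakeflow Declarative Pipelines (Python `dlt` module).
# The source path is a placeholder; checkpointing and orchestration are handled by the pipeline.
import dlt


@dlt.table(comment="Raw events ingested incrementally with Auto Loader")
def events_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/default/raw/events/")   # placeholder source path
    )
```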
Managed connectors
You can use fully managed connectors to ingest from enterprise applications and databases.
Supported connectors include enterprise applications such as Salesforce and databases such as SQL Server.
Supported interfaces include:
- Databricks UI
- Databricks Asset Bundles
- Databricks APIs
- Databricks SDKs
- Databricks CLI
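As an illustration of the API and SDK route, the following hedged sketch uses the Databricks SDK for Python to create a managed ingestion pipeline; the connection, catalog, schema, and table names are placeholders, and the exact `ingestion_definition` classes and fields shown are assumptions to verify against the SDK reference for your connector.

```python
# Hedged sketch: creating a managed ingestion pipeline with the Databricks SDK for Python.
# Connection, catalog, schema, and table names are placeholders; the ingestion_definition
# classes and fields used here are assumptions to verify against the SDK reference.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import pipelines

w = WorkspaceClient()  # authenticates from the environment or a Databricks config profile

created = w.pipelines.create(
    name="salesforce-accounts-ingest",  # placeholder pipeline name
    ingestion_definition=pipelines.IngestionPipelineDefinition(
        connection_name="my_salesforce_connection",  # Unity Catalog connection (placeholder)
        objects=[
            pipelines.IngestionConfig(
                table=pipelines.TableSpec(
                    source_table="Account",           # placeholder source object
                    destination_catalog="main",
                    destination_schema="sales",
                )
            )
        ],
    ),
)
print(created.pipeline_id)
```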
Standard connectors
In addition to the managed connectors, Databricks offers customizable connectors for cloud object storage and message buses. See Standard connectors in Lakeflow Connect.
File upload and download
You can ingest files that reside on your local network, files that have been uploaded to a volume, or files that are downloaded from an internet location. See Files.
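For example, a batch read of an uploaded file from a Unity Catalog volume might look like the following sketch; the catalog, schema, volume, and file names are placeholders.

```python
# Minimal sketch: batch-reading an uploaded file from a Unity Catalog volume.
# The catalog, schema, volume, and file names are placeholders.
df = (
    spark.read.format("csv")
    .option("header", "true")
    .load("/Volumes/main/default/uploads/customers.csv")
)
df.write.mode("overwrite").saveAsTable("main.default.customers_bronze")
```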
Ingestion partners
Many third-party tools support batch or streaming ingestion into Databricks. Databricks validates various third-party integrations, although the steps to configure access to source systems and ingest data vary by tool. See Ingestion partners for a list of validated tools. Some technology partners are also featured in Databricks Partner Connect, which has a UI that simplifies connecting third-party tools to Lakehouse data.
DIY ingestion
Databricks provides a general compute platform. As a result, you can create your own ingestion connectors using any programming language supported by Databricks, like Python or Java. You can also import and use popular open source connector libraries like data load tool, Airbyte, and Debezium.
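For instance, a do-it-yourself connector can be as simple as calling a REST API and appending the results to a Delta table, as in the following sketch; the endpoint, response shape, and table name are hypothetical.

```python
# Minimal DIY sketch: pull records from a hypothetical REST endpoint and append them
# to a Delta table. The URL, response shape, and table name are placeholders.
import requests

resp = requests.get("https://api.example.com/v1/orders", timeout=30)  # hypothetical endpoint
resp.raise_for_status()
records = resp.json()  # assumes a JSON array of flat records

if records:
    df = spark.createDataFrame(records)
    df.write.mode("append").saveAsTable("main.default.orders_raw")  # placeholder table
```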
Ingestion alternatives
Databricks recommends ingestion for most use cases because it scales to accommodate high data volumes, low-latency querying, and third-party API limits. Ingestion copies data from your source systems to Azure Databricks, which results in duplicate data that might become stale over time. If you don't want to copy data, you can use the following tools:
Tool | Description |
---|---|
Lakehouse Federation | Allows you to query external data sources without moving your data. |
Delta Sharing | Allows you to securely share data across platforms, clouds, and regions. |
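For example, after you configure a foreign catalog with Lakehouse Federation, you can query the external source in place like any other Unity Catalog table; the catalog, schema, and table names in the following sketch are placeholders.

```python
# Minimal sketch: querying an external source in place through a Lakehouse Federation
# foreign catalog. The catalog, schema, and table names are placeholders.
df = spark.table("postgres_foreign_catalog.sales.orders")
df.groupBy("region").count().show()
```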