Build pipelines with Lakeflow Spark Declarative Pipelines by loading and transforming data, applying data quality checks, and writing results to your target tables. The following topics cover the tasks involved in building and running pipelines.
To learn the declarative concepts behind pipelines—datasets, flows, and the pipeline graph—see What is Lakeflow Spark Declarative Pipelines. For a step-by-step walkthrough, see Tutorial: Build an ETL pipeline using change data capture.
| Topic | Description |
|---|---|
| Develop in the Lakeflow Pipelines Editor | Author, run, and debug pipelines in the editor, with a pipeline graph, data previews, and selective execution. |
| Use Genie Code for pipeline development | Generate, edit, and debug pipeline code from a single prompt with Genie Code Agent mode in the editor. |
| Load data | Ingest data into your pipeline from cloud object storage and streaming message buses. |
| Transform data | Apply transformations, joins, and aggregations to build derived datasets. |
| Full refresh for streaming tables | Reprocess all source data to rebuild a streaming table. |
| Data quality | Validate records with expectations and control what happens when a record fails. |
| Write datasets | Write pipeline results to sinks such as Apache Kafka and Azure Event Hubs, and use flows to write to streaming targets. |
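The load, transform, and data quality steps in the table above can be pictured with a small plain-Python sketch. It does not use the Lakeflow Spark Declarative Pipelines API and does not run on Spark. Every name in it (`Expectation`, `apply_expectations`, `load`, `transform`, and the sample records) is hypothetical and exists only for illustration. It assumes three possible outcomes for a record that fails an expectation: keep it and count the violation, drop it, or stop the run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


class ExpectationFailed(Exception):
    """Raised when a record violates an expectation whose action is 'fail'."""


@dataclass
class Expectation:
    # Hypothetical stand-in for a pipeline expectation, not the real API.
    name: str
    predicate: Callable[[Record], bool]
    action: str = "warn"  # "warn" keeps the record, "drop" discards it, "fail" stops the run
    violations: int = 0


def apply_expectations(records: List[Record], expectations: List[Expectation]) -> List[Record]:
    """Check every record against every expectation and return the records that survive."""
    kept = []
    for record in records:
        keep = True
        for exp in expectations:
            if exp.predicate(record):
                continue
            exp.violations += 1
            if exp.action == "fail":
                raise ExpectationFailed(f"{exp.name} violated by {record!r}")
            if exp.action == "drop":
                keep = False
        if keep:
            kept.append(record)
    return kept


def load() -> List[Record]:
    # Stand-in for ingesting from cloud object storage or a message bus.
    return [
        {"id": 1, "amount": 10.0},
        {"id": None, "amount": 5.0},
        {"id": 3, "amount": -2.0},
    ]


def transform(records: List[Record]) -> List[Record]:
    # Derive a new column from an existing one.
    return [{**r, "amount_cents": int(round(r["amount"] * 100))} for r in records]


rules = [
    Expectation("id_not_null", lambda r: r["id"] is not None, action="drop"),
    Expectation("amount_non_negative", lambda r: r["amount"] >= 0, action="warn"),
]

clean = apply_expectations(transform(load()), rules)
```

In this sketch, the record with a missing `id` is dropped, while the record with a negative `amount` is kept and only counted as a violation. For the actual decorators and SQL syntax, see the Data quality topic.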
Additional resources
- Optimize stateful processing with watermarks
- Incremental refresh for materialized views
- Access materialized views and streaming tables using external systems
- Develop and debug pipelines with a notebook (legacy)
- Develop pipeline code in your local development environment
- Use parameters with pipelines
- Convert a pipeline into a bundle project
- Prepare your data for GDPR compliance