Data lineage

Data lineage plays an important role in cloud-scale analytics. Lineage shows the dependencies between raw data and finished data products, and describes the transformations and manipulations that turn the raw data into those products. It helps organizations understand data quality and validate compliance. It also adds context to datasets and products, which makes data products discoverable and self-serviceable.

A primary feature of any data catalog is its ability to show the lineage between data products. Azure Purview supports data lineage capture from three Azure Data Factory activities:

  • copy data
  • data flow
  • execute SSIS package

In addition to this native lineage reporting, Azure Purview also allows custom lineage reporting through Apache Atlas hooks or the REST API.

Diagram of data lineage.

Important

Azure Data Factory and Azure Synapse pipelines are recommended for ingestion solutions because they enable data lineage in Azure Purview. Alternate ingestion patterns should use the Apache Atlas API to update data lineage as part of their data processing.
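As a rough illustration of custom lineage reporting, the following Python sketch builds an Apache Atlas v2 entity payload for a process that links input datasets to output datasets. It only constructs the JSON body and does not call any service. The helper names, type names, and qualified names are hypothetical and are not taken from this article. In Apache Atlas, a payload of this shape is posted to the /api/atlas/v2/entity endpoint. The exact endpoint and authentication for an Azure Purview account are not covered here, so confirm them against the Purview REST documentation.

```python
import json


def entity_ref(type_name, qualified_name):
    """Reference an existing catalog entity by its unique qualifiedName."""
    return {
        "typeName": type_name,
        "uniqueAttributes": {"qualifiedName": qualified_name},
    }


def build_lineage_payload(process_name, process_qualified_name, inputs, outputs):
    """Build an Atlas v2 entity body for a Process that connects inputs to outputs.

    inputs and outputs are lists of (type_name, qualified_name) tuples that
    identify datasets already registered in the catalog.
    """
    return {
        "entity": {
            "typeName": "Process",
            "attributes": {
                "name": process_name,
                "qualifiedName": process_qualified_name,
                "inputs": [entity_ref(t, q) for t, q in inputs],
                "outputs": [entity_ref(t, q) for t, q in outputs],
            },
        }
    }


# Hypothetical example values: replace them with the types and qualified
# names that actually exist in your catalog.
payload = build_lineage_payload(
    process_name="nightly_sales_ingest",
    process_qualified_name="custom://ingest/nightly_sales_ingest",
    inputs=[("DataSet", "custom://raw/sales.csv")],
    outputs=[("DataSet", "custom://curated/sales")],
)

print(json.dumps(payload, indent=2))
```

An ingestion job that runs outside Data Factory or Synapse pipelines could emit a payload like this at the end of each run, so the catalog shows which raw inputs produced which curated outputs.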

Azure Purview data lineage

One of Azure Purview's platform features is its ability to show the lineage between datasets created by data processes. Systems like Data Factory, Data Share, and Power BI capture the lineage of data as it moves. You can also report custom lineage through Apache Atlas hooks and the REST API.

Tip

For more information on supported systems and best practices, see the Microsoft Purview Data Catalog lineage user guide.

Next steps

Learn how to manage master data in Azure.