Clarification on “Description” field in OpenLineage JSON metadata

Mohammed Aamer 160 Reputation points
2026-02-25T09:53:19.73+00:00

Hi Team,

We have enabled OpenLineage for our Azure Databricks environment to capture column-level lineage. In the transformation section of the OpenLineage output, we see the following structure:

"transformations": [

We would like to understand what the “description” field is intended to capture. Is this field supposed to store the actual column-level transformation logic (for example, SQL expressions or notebook transformation code)?

Currently, this field is always empty in our lineage output. Please let us know:

  • What information should ideally be populated in the description field.
  • Whether OpenLineage supports capturing full column-level transformation expressions from Databricks notebooks.
  • Whether any additional configuration or customization is required to populate it.

If the actual transformation code cannot be captured, is that a limitation of OpenLineage itself?

Regards,
Aamer

Azure Databricks

An Apache Spark-based analytics platform optimized for Azure.

Answer accepted by question author
  1. Smaran Thoomu 33,840 Reputation points Microsoft External Staff Moderator
    2026-02-25T17:03:23.2666667+00:00

    Hi @Mohammed Aamer
    Thanks for your question.

    The “description” field in the OpenLineage JSON is optional metadata. It is not automatically populated with SQL expressions or notebook transformation code.

    In general:

    • The description field is meant to store a human-readable explanation of the transformation.
    • It is not designed to automatically capture full SQL logic or notebook code from Databricks.
    • If it is empty in your output, that is expected behavior.

    Regarding column-level transformation expressions:

    OpenLineage captures lineage relationships (which output column was derived from which source columns), but by default it does not capture the full transformation logic (such as join conditions, filters, or calculated expressions) from Databricks notebooks.
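To make "lineage relationships" concrete, here is a minimal Python sketch that reads a column lineage facet and lists, for each output column, the source columns it came from. The facet fragment, dataset names, and the helper `source_columns` are hypothetical; the key names (`fields`, `inputFields`, `transformations`) follow my understanding of the OpenLineage column lineage facet, so please check them against the events your environment actually emits.

```python
from typing import Dict, List, Tuple

# Hypothetical facet fragment (assumed shape, not taken from the question).
SAMPLE_FACET = {
    "fields": {
        "full_name": {
            "inputFields": [
                {
                    "namespace": "dbfs",
                    "name": "/raw/customers",
                    "field": "first_name",
                    "transformations": [
                        {"type": "DIRECT", "subtype": "TRANSFORMATION",
                         "description": "", "masking": False}
                    ],
                },
                {
                    "namespace": "dbfs",
                    "name": "/raw/customers",
                    "field": "last_name",
                    "transformations": [
                        {"type": "DIRECT", "subtype": "TRANSFORMATION",
                         "description": "", "masking": False}
                    ],
                },
            ]
        }
    }
}


def source_columns(facet: dict) -> Dict[str, List[Tuple[str, str]]]:
    """Map each output column to its (source dataset, source column) pairs."""
    result: Dict[str, List[Tuple[str, str]]] = {}
    for out_col, spec in facet.get("fields", {}).items():
        result[out_col] = [
            (inp["name"], inp["field"]) for inp in spec.get("inputFields", [])
        ]
    return result


LINEAGE = source_columns(SAMPLE_FACET)
```

In this sample the facet records that `full_name` depends on `first_name` and `last_name`, but nothing in it states the expression that combined them, and `description` stays empty.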

    There is no additional configuration in the standard Databricks OpenLineage integration that will automatically populate the description field with notebook SQL or PySpark code.

    If you want something in the description field, it would require custom instrumentation or manually enriching the lineage metadata before sending it.
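As a rough illustration of "manually enriching the lineage metadata", the Python sketch below fills empty `description` values in an event that has already been parsed into a dictionary. Everything here is an assumption for illustration: the function `enrich_descriptions`, the sample event, the dataset names, and the lookup table of descriptions are hypothetical, and where you would run such a step (for example a small relay between the job and your lineage backend) depends on your setup. It is not an OpenLineage or Databricks API.

```python
import copy
from typing import Dict, Tuple


def enrich_descriptions(event: dict, descriptions: Dict[Tuple[str, str], str]) -> dict:
    """Return a copy of the event with empty transformation descriptions filled.

    `descriptions` maps (output dataset name, output column) to the
    human-readable text you maintain yourself.
    """
    enriched = copy.deepcopy(event)  # leave the original event untouched
    for dataset in enriched.get("outputs", []):
        facet = dataset.get("facets", {}).get("columnLineage", {})
        for column, spec in facet.get("fields", {}).items():
            text = descriptions.get((dataset.get("name"), column))
            if text is None:
                continue  # nothing supplied for this column
            for inp in spec.get("inputFields", []):
                for transformation in inp.get("transformations", []):
                    if not transformation.get("description"):
                        transformation["description"] = text
    return enriched


# Hypothetical event and description table, for illustration only.
SAMPLE_EVENT = {
    "eventType": "COMPLETE",
    "outputs": [{
        "namespace": "dbfs",
        "name": "/curated/customers",
        "facets": {"columnLineage": {"fields": {"full_name": {"inputFields": [
            {"namespace": "dbfs", "name": "/raw/customers", "field": "first_name",
             "transformations": [{"type": "DIRECT", "subtype": "TRANSFORMATION",
                                  "description": "", "masking": False}]},
        ]}}}},
    }],
}
DESCRIPTIONS = {
    ("/curated/customers", "full_name"): "concat(first_name, ' ', last_name)",
}

ENRICHED = enrich_descriptions(SAMPLE_EVENT, DESCRIPTIONS)
```

The function only writes where `description` is empty, so any value the integration does provide is preserved, and it works on a deep copy so the incoming event is not modified.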

    So yes, if you are expecting the full transformation code to appear automatically, that is currently a limitation of the OpenLineage integration rather than a configuration gap.

    Hope this clarifies it. Please let me know if you have any more questions.
