Hi @Mohammed Aamer
Thanks for your question.
The “description” field in the OpenLineage JSON is optional metadata. It is not automatically populated with SQL expressions or notebook transformation code.
In general:
- The description field is meant to store a human-readable explanation of the transformation.
- It is not designed to automatically capture full SQL logic or notebook code from Databricks.
- If it is empty in your output, that is expected behavior.
Regarding column-level transformation expressions:
OpenLineage captures lineage relationships (which output column came from which source column), but it does not capture the full transformation logic (such as joins, filters, or calculated expressions) from Databricks notebooks by default.
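To make the distinction concrete, here is a minimal sketch of what a column-level lineage facet looks like in the OpenLineage JSON. The dataset, column, and namespace names (`orders`, `total_price`, `dbfs`) are hypothetical; the point is that the facet records *which* source columns feed an output column, not the expression that produced it.

```python
# Illustrative shape of an OpenLineage columnLineage facet (names are made up).
column_lineage_facet = {
    "columnLineage": {
        "fields": {
            "total_price": {  # hypothetical output column
                "inputFields": [
                    {"namespace": "dbfs", "name": "orders", "field": "price"},
                    {"namespace": "dbfs", "name": "orders", "field": "quantity"},
                ]
            }
        }
    }
}

# You can recover the source columns total_price depends on...
sources = [
    f["field"]
    for f in column_lineage_facet["columnLineage"]["fields"]["total_price"]["inputFields"]
]
print(sources)  # ['price', 'quantity']
# ...but nothing in the facet carries the expression itself (e.g. price * quantity).
```

So even a fully populated column-lineage event tells you the dependency graph, not the formula.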
There is no additional configuration in the standard Databricks OpenLineage integration that will automatically populate the description field with notebook SQL or PySpark code.
If you want to populate the description field, you would need custom instrumentation, or a step that manually enriches the lineage metadata before it is sent to your lineage backend.
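As a rough illustration of that enrichment step, here is a hedged sketch that attaches a human-readable description to the output datasets of an OpenLineage event dict before emission. The `add_description` helper and the event contents are entirely illustrative (not part of any Databricks or OpenLineage API); the `documentation` facet with a `description` field is the kind of place such text would normally live, but verify the facet name against the OpenLineage spec version you use.

```python
# Sketch only: manually enriching an OpenLineage event before sending it.
# add_description and the sample event are hypothetical, not a real API.
import json


def add_description(event: dict, text: str) -> dict:
    """Attach a human-readable description to each output dataset's facets."""
    for ds in event.get("outputs", []):
        ds.setdefault("facets", {})["documentation"] = {"description": text}
    return event


event = {
    "eventType": "COMPLETE",
    "outputs": [{"name": "sales_summary", "facets": {}}],
}
enriched = add_description(
    event, "Aggregates daily sales by region (notebook: sales_etl)"
)
print(json.dumps(enriched["outputs"][0]["facets"]["documentation"]))
```

You would run logic like this in whatever hook or proxy sits between the Spark listener and your lineage backend, since the integration itself will not fill the field in.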
So yes, if you are expecting the full transformation code to appear automatically, that is currently a limitation of the OpenLineage integration rather than a configuration gap.
Hope this clarifies it. Please let me know if you have any more questions.