Has the Dataflow IR upgrade to Spark 3.4 broken timestamps stored in Parquet?

Neil Sisson 5 Reputation points
2025-05-29T17:31:55.7533333+00:00

It appears that over the past couple of days, the Dataflow Integration Runtimes have migrated to Spark 3.4. When the source is a Parquet file, ADF Dataflows are now nulling all source timestamps stored as Parquet INT64 where the field metadata has isAdjustedToUTC=false. Looking at the Spark migration guide (https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-33-to-34), we can see the relevant change:

Since Spark 3.4, when schema inference on external Parquet files, INT64 timestamps with annotation isAdjustedToUTC=false will be inferred as TimestampNTZ type instead of Timestamp type. To restore the legacy behavior, set spark.sql.parquet.inferTimestampNTZ.enabled to false.
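For anyone who can control the Spark session directly (which, as far as I can tell, ADF Dataflows do not expose), a minimal PySpark sketch of the legacy-behavior setting from the guide; the file path is a placeholder:

```python
from pyspark.sql import SparkSession

# Minimal sketch: restore the Spark 3.3 inference behavior described in
# the migration guide, so Parquet INT64 timestamps annotated with
# isAdjustedToUTC=false infer as Timestamp rather than TimestampNTZ.
spark = (
    SparkSession.builder
    .appName("legacy-parquet-timestamp-inference")
    .config("spark.sql.parquet.inferTimestampNTZ.enabled", "false")
    .getOrCreate()
)

df = spark.read.parquet("/data/source.parquet")  # placeholder path
df.printSchema()  # timestamp columns should infer as TimestampType again
```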

How should we proceed here? Try to modify the metadata upstream? I think it would be prudent to roll back to Spark 3.3, as it's clear this change hasn't been tested.
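For reference, a hedged sketch of the "modify the metadata upstream" option using pyarrow: rewriting a file so its timestamp columns carry isAdjustedToUTC=true, which pyarrow encodes as tz="UTC" on the timestamp type. Paths are placeholders, and note the cast reinterprets the stored values as UTC instants without shifting them:

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pq.read_table("/data/source.parquet")  # placeholder path

fields = []
for field in table.schema:
    # A pyarrow timestamp with tz=None is written to Parquet with
    # isAdjustedToUTC=false; tz="UTC" flips that flag to true. The cast
    # below reinterprets stored values as UTC, so if they were local
    # wall-clock times their meaning changes.
    if pa.types.is_timestamp(field.type) and field.type.tz is None:
        field = pa.field(
            field.name,
            pa.timestamp(field.type.unit, tz="UTC"),
            nullable=field.nullable,
        )
    fields.append(field)

table = table.cast(pa.schema(fields))
pq.write_table(table, "/data/source_utc.parquet")  # placeholder path
```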

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Neil Sisson 5 Reputation points
    2025-05-30T15:27:06.94+00:00

    We have been able to bypass this issue for now by altering the SQL queries of our upstream Parquet source (Google BigQuery) to export datetimes/timestamps as strings using the expression UNIX_MILLIS(TIMESTAMP(datecolumn)). The ADF Dataflow Parquet sources were able to cast these strings back to timestamps, requiring no modifications to our transformation layer; a PySpark sketch of the equivalent cast is below.
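
    A minimal PySpark sketch of that cast: epoch milliseconds (exported as strings by the BigQuery expression above) converted back to timestamps. The column name "datecolumn" follows the expression; the session setup and sample value are illustrative.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("millis-to-timestamp").getOrCreate()

# "datecolumn" holds UNIX_MILLIS output exported as a string; the
# sample value is illustrative.
df = spark.createDataFrame([("1748539915753",)], ["datecolumn"])

# Cast string -> long epoch millis, scale to seconds, cast to timestamp.
df = df.withColumn(
    "datecolumn",
    (F.col("datecolumn").cast("long") / 1000).cast("timestamp"),
)
df.show(truncate=False)
```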

