input_file_name

Ryan Abbey
2021-09-26T20:38:21.26+00:00

I am trying to read multiple Parquet files and add the source file name to the DataFrame on a Synapse Spark 2.4 cluster. However, when I add the column using "input_file_name", the column is empty:
from pyspark.sql import functions as F
df = spark.read.parquet(*sfile).withColumn("input_file_name", F.input_file_name())

Are there any known issues with this? Is there an alternative way to add the file name (short of a union loop)?

Azure Synapse Analytics
