Azure ADF Copy activity from Blob to a Databricks Delta Lake table is failing with a "Malformed CSV record" error.

Derik Roby 16 Reputation points
2023-04-25T15:10:25.23+00:00

I created an ADF Copy activity pipeline that moves csv.gz files from Azure Blob Storage to Delta tables in Azure Databricks. All columns are of type string in both the source and the sink. Two files were moved successfully, but the last one keeps failing even though all the files have the same configuration. The pipeline fails with the error message below:

ErrorCode=AzureDatabricksCommandError, Hit an error when running the command in Azure Databricks. Error details:
org.apache.spark.SparkException: Job aborted.
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 38.0 failed 4 times, most recent failure: Lost task 0.3 in stage 38.0 (TID 58) (10.139.64.4 executor driver): com.databricks.sql.io.FileReadException: Error while reading file wasbs:REDACTED_LOCAL_PART@dbtpostgresstorage.blob.core.windows.net/7483a7f6-d13e-4cdb-835e-3f198633241d/AzureDatabricksDeltaLakeImportCommand/listings.txt.
Caused by: org.apache.spark.SparkException: Malformed records are detected in record parsing. Parse Mode: FAILFAST. To process malformed records as null result, try setting the option 'mode' as 'PERMISSIVE'.
Caused by: org.apache.spark.sql.catalyst.util.BadRecordException: org.apache.spark.sql.catalyst.csv.MalformedCSVException: Malformed CSV record
Caused by: org.apache.spark.sql.catalyst.csv.MalformedCSVException: Malformed CSV record

Azure Blob Storage | Azure Databricks | Azure Data Factory

1 answer

Sort by: Most helpful
  1. ShaikMaheer-MSFT 38,301 Reputation points Microsoft Employee
    2023-04-27T15:55:04.3233333+00:00

    Hi Derik Roby,

    Thank you for posting your query on the Microsoft Q&A platform.

    From the error message, it seems your file contains some corrupted or malformed data that is not in the expected format.

    Kindly check whether enabling the Skip incompatible rows fault-tolerance option under the copy activity settings lets the pipeline run successfully. If not, open the source file in an editor and look for any row that is not in the correct or expected format (for example, an unescaped quote, an embedded delimiter or line break, or a row with a different number of columns).
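    For a large file, manual inspection is tedious, so here is a minimal sketch in Python for locating suspect rows before re-running the pipeline. It flags records whose field count differs from the header row's; the delimiter, quote character, and sample data are assumptions, so adjust them to match your actual file:

    ```python
    import csv
    import io

    def find_malformed_rows(text, delimiter=",", quotechar='"'):
        """Return (line_number, field_count) pairs for rows whose field
        count differs from the header row's field count."""
        reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
        header = next(reader)
        expected = len(header)
        bad = []
        for row in reader:
            if row and len(row) != expected:
                bad.append((reader.line_num, len(row)))
        return bad

    # Hypothetical sample: the second data record has an extra, unquoted comma.
    sample = "id,name,city\n1,Alice,Seattle\n2,Bob,Red,mond\n"
    print(find_malformed_rows(sample))  # → [(3, 4)]
    ```

    For a csv.gz file, read the text with `gzip.open(path, "rt")` first and pass it in. Note this only catches field-count mismatches; an unbalanced quote spanning multiple lines may need eyeballing around the reported line numbers.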

    Hope this helps. Please let me know how it goes.