How to fix "Failed to detect schema" error when importing a dataset from Azure Data Lake to Synapse table

YANG, RU-YING - (ruyingyang) 25 Reputation points
2023-09-13T22:53:52.4033333+00:00

I am trying to import a CSV file from Azure Data Lake Gen2 to a Synapse table with a delimited text format. However, I am encountering this error: "Failed to execute query. Error: Error encountered while parsing data: 'Invalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.'. Underlying data description: file 'https://tokoyoolympicnicole.dfs.core.windows.net/tokoyo-olympic-data/transformed-data/athletes/part-00000-tid-5626510608837589341-f52b5a0c-a944-4e40-981c-db22dda1b3dc-38-1-c000.csv'. The batch could not be analyzed because of compile errors."

I'm looking for a solution to fix this error and successfully import my dataset.

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

2 answers

  1. Amira Bedhiafi 24,531 Reputation points
    2023-09-14T09:12:14.69+00:00

    If you are importing a CSV file, make sure the import is configured for a delimited text file and not a Parquet file; the error message shows that the file is being parsed as Parquet. You can also open the file in a plain text editor or a CSV viewer to confirm that it is correctly formatted, paying particular attention to the beginning and end of the file for unexpected characters or incomplete rows.
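
A quick way to confirm which format a file really is: Parquet files begin and end with the 4-byte marker `PAR1`, which is what the "magic bytes not found in footer" message refers to. The following is a minimal sketch, assuming you have downloaded a copy of the file locally; the function name `looks_like_parquet` is made up for illustration and is not part of any Azure SDK.

```python
from pathlib import Path

# Marker that plain Parquet files carry at both the start and the end.
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes."""
    data_path = Path(path)
    # A file shorter than two markers cannot be a valid Parquet file.
    if data_path.stat().st_size < 2 * len(PARQUET_MAGIC):
        return False
    with data_path.open("rb") as handle:
        head = handle.read(len(PARQUET_MAGIC))
        handle.seek(-len(PARQUET_MAGIC), 2)  # 2 = seek relative to end of file
        tail = handle.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

If this returns `False` for your `.csv` file, the file itself is probably fine and the import is simply configured with the wrong file format.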

    I usually use this tool to validate CSV files: https://csvlint.io/

    If your CSV file is particularly large, I would recommend splitting it into smaller chunks and importing them separately, so that you can at least identify which part is not correctly formatted.
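
The splitting step could be sketched like this with Python's standard `csv` module. It assumes a local, UTF-8 encoded copy of the file whose first row is a header; the function name `split_csv`, the chunk naming scheme, and the default chunk size are illustrative choices only.

```python
import csv
from pathlib import Path


def split_csv(source, out_dir, rows_per_chunk=100_000):
    """Split a CSV into smaller files, repeating the header row in each chunk."""
    source = Path(source)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with source.open(newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty input file, nothing to split
            return chunk_paths
        out = None
        writer = None
        for index, row in enumerate(reader):
            # Start a new chunk file every rows_per_chunk data rows.
            if index % rows_per_chunk == 0:
                if out is not None:
                    out.close()
                chunk_no = index // rows_per_chunk
                chunk_path = out_dir / f"{source.stem}_part{chunk_no:04d}.csv"
                out = chunk_path.open("w", newline="", encoding="utf-8")
                writer = csv.writer(out)
                writer.writerow(header)
                chunk_paths.append(chunk_path)
            writer.writerow(row)
        if out is not None:
            out.close()
    return chunk_paths
```

Each returned path can then be uploaded and imported on its own, which narrows a formatting problem down to a single chunk.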


  2. Smaran Thoomu 15,765 Reputation points Microsoft Vendor
    2023-09-14T10:26:26.04+00:00

    @YANG, RU-YING - (ruyingyang) Welcome to the Microsoft Q&A platform, and thanks for posting your question here.
    As I understand it, you are encountering an error while trying to import a CSV file from Azure Data Lake Gen2 to a Synapse table.

    Before we proceed, can you please provide screenshots and the steps you are following to import the CSV file into Synapse?

    To resolve this issue, you can follow these steps:

    1. Check if the CSV file has any special characters or encoding that might be causing issues with schema detection. Ensure that the file is not corrupted and that it does not contain any invalid characters or formatting issues.
    2. Ensure that the file is stored in the correct location in Azure Data Lake Gen2 and that the Synapse workspace has the necessary permissions to access the file.
    3. If the issue persists, you can try ingesting the CSV file using another method, such as Azure Data Factory or Azure Databricks. These services provide additional options for configuring schema detection and data ingestion. You can refer to the Azure documentation on how to import data into a Synapse workspace.
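
For step 1, the encoding and formatting checks could be sketched as follows. This is only a rough local check under stated assumptions: the file is expected to be UTF-8 with a comma delimiter, and the function name `check_csv_bytes` and its messages are made up for illustration. It does not reproduce how Synapse itself detects the schema.

```python
import csv
import io

UTF8_BOM = b"\xef\xbb\xbf"


def check_csv_bytes(raw, delimiter=","):
    """Return a list of findings about encoding and row shape in raw CSV bytes."""
    findings = []
    if raw.startswith(UTF8_BOM):
        findings.append("file starts with a UTF-8 byte order mark")
        raw = raw[len(UTF8_BOM):]
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        findings.append("file is not valid UTF-8")
        return findings
    expected = None
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    for record_no, row in enumerate(reader, start=1):
        if not row:  # skip completely blank lines
            continue
        if expected is None:
            expected = len(row)  # first record defines the column count
        elif len(row) != expected:
            findings.append(
                f"record {record_no} has {len(row)} fields, expected {expected}"
            )
    return findings
```

To use it, read the downloaded file in binary mode and pass the bytes in, for example `check_csv_bytes(open("athletes.csv", "rb").read())`, where `athletes.csv` stands in for your local copy. An empty list means none of these particular problems were found.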


    I hope these steps help you resolve the issue. Let me know if you have any further questions or concerns.

