Synapse Pipeline reading from Excel -- how to generate output table Schema

John Edwards 30 Reputation points
2024-01-05T19:40:36.25+00:00

I'm putting together my first pipeline to capture data in an Azure SQL Database table from an incoming Excel Spreadsheet. I've figured out how to load the spreadsheet to blob storage and to link to it. It's ready to go and I can see it and preview it in a Synapse Copy activity.

Here's the thing -- I want to write the output to a new table in Azure SQL. That is, I want Azure SQL to create a new table based on the incoming Excel spreadsheet, taking its best guess, even if is sucks, at column names and formats. That's an ideal jumping off point for this current task, as the spreadsheet being read is machine generated and designed to be pretty, not easy. So I need to assess what is where and get a leg-up on producing a proper output.

How do I get Azure Synapse to create a database table that matches the structure of an incoming spreadsheet?

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,974 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,804 questions
0 comments No comments
{count} votes

Accepted answer
  1. Bhargava-MSFT 31,116 Reputation points Microsoft Employee
    2024-01-10T22:29:35.33+00:00

    Hello John Edwards,

    I'm glad that you were able to resolve your issue and thank you for posting the work around so that others experiencing the same thing can easily reference this! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to "Accept" the answer.

    Issue:

    • Using Synapse/ADF, how to create a table on the fly that matches the structure of the incoming spreadsheet data.

    Challenge:

    When using the "Auto-create table" option, Synapse will attempt to create a new table in the Azure SQL Database that matches the structure of the incoming data. If the destination table already exists, Synapse will use the existing schema and throw an error if the incoming data does not match the schema

    Error:

    "Message": "Failure happened on 'Sink' side. ErrorCode=UserErrorInvalidColumnName,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The column Prop_0 is not found in target side,Source=Microsoft.DataTransfer.ClientLibrary,'",

    Solution/Work around:

    1. Create a table in your database with dummy structure.
    2. Create a data link (Data/Integration Data Sets) to that table in Synapse.
    3. Create a pipeline with a copy activity that uses that dummy table as its sink. Select the table option "Auto create table"
    4. Run it and let it fail.
    5. Go back to your original database and delete the table.
    6. Go back to Synapse and Run the pipeline again. It will create the table with the output fields from the source.

    If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information. Thank you again for your time and patience throughout this issue.

    Please remember to "Accept Answer" if any answer/reply helped, so that others in the community facing similar issues can easily find the solution.

    0 comments No comments

1 additional answer

Sort by: Most helpful
  1. Amira Bedhiafi 25,866 Reputation points
    2024-01-05T21:11:13.2366667+00:00

    Make sure that the Excel file is accessible in Azure Blob Storage and that you have configured the necessary linked services and datasets in Azure Synapse Analytics (formerly SQL Data Warehouse) to access this file.

    You can use the preview feature in the Copy Data tool to understand the structure of your Excel data. This will give you a good idea of the column names and data types present in your spreadsheet.

    Azure Synapse Analytics's Copy Data tool can infer the schema from the source data (your Excel file in this case). When you configure the sink (destination) settings in the Copy Data tool, you have the option to let Azure Synapse create a new table in your Azure SQL Database. This new table's schema will be based on the inferred schema from the Excel file.

    If necessary, you can modify the mappings and add transformations during the Copy Data process to adjust column names and data types or to transform data as per your requirements.

    In the sink settings of your Copy Data activity, specify that you want to write to an Azure SQL Database. Provide the necessary linked service for your Azure SQL Database. There should be an option to auto-generate a table. This option will create a new table in your database with the inferred schema from the Excel file.

    The automatically generated schema might not be perfect. After the initial load, review the created table and make adjustments to the schema as needed. This might include renaming columns, changing data types, or adding constraints and indexes.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.