Share via

error while trying copy data to a hash distributed table

Calleb Cecco 11 Reputation points
2024-02-19T17:10:31.93+00:00

I'm getting the following error when my notebook tries to copy data to a hash distributed table. It was working since friday, nothing was changed in the weekend, so I dont have any idea whats going on the code:

itens_nota_final_df.write.option(Constants.SERVER, "myserver").mode("overwrite").synapsesql("myserver.dbo.StagingTable")			

The error: ErrorNumber - 105222, ErrorMessage - COPY statement using Parquet and auto create table enabled currently cannot load into hash-distributed tables. See here for more information: https://aka.ms/AAgz988

Azure Synapse Analytics
Azure Synapse Analytics

An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

0 comments No comments

2 answers

Sort by: Most helpful
  1. AnnuKumari-MSFT 34,571 Reputation points Microsoft Employee Moderator
    2024-03-05T15:25:10.38+00:00

    Hi Calleb Cecco and everyone,

    We got the below response from the team regarding the above issue with respect to using sql script or copy activity, however, regarding the above approach of using pyspark notebook, need to still get information:

    Kindly load the data into a temporary round robin table, and use INSERT ... SELECT from the table to the target hash-distributed table. If that is not an option, remove or set AUTO_CREATE_TABLE to 'OFF'.

    In addition, if you have already loaded data onto a given hash-distributed table before this block was instituted, kindly perform CREATE TABLE AS SELECT from that table into to a new table (with renaming to not affect existing scripts).

    Hope it helps. Kindly accept the answer by clicking on Accept answer button. Thankyou

    Was this answer helpful?


  2. AnnuKumari-MSFT 34,571 Reputation points Microsoft Employee Moderator
    2024-02-20T05:37:43.2866667+00:00

    Hi Calleb Cecco ,

    Welcome to Microsoft Q&A platform and thanks for posting your query here.

    It seems you are facing error "COPY statement using Parquet and auto create table enabled currently cannot load into hash-distributed tables" while trying copy data to a hash distributed table.

    As per the following documentation Copy and transform data in Azure Synapse Analytics by using Azure Data Factory or Synapse pipelines , Auto create table option in copy activity sink only gives default distribution i,e. ROUND_ROBIN

    User's image

    The error message you are seeing indicates that you are trying to use the COPY statement to load data into a hash-distributed table using Parquet format, and the auto create table option is enabled.

    To resolve this issue, you can try the following steps:

    1. You can disable the auto create table option when using the COPY statement to load data into a hash-distributed table. This will allow you to load data into the existing hash distributed table.
    2. Use a round-robin distributed table: If you need to load data into a distributed table using Parquet format and the auto create table option is enabled, you can go with the default round-robin distribution instead of a hash-distribution.

    Hope it helps. Kindly accept the answer by clicking on Accept answer button. Thankyou

    Was this answer helpful?


Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.