Error while trying to copy data to a hash-distributed table

Question


Calleb Cecco

I'm getting the following error when my notebook tries to copy data to a hash-distributed table. It was working on Friday and nothing was changed over the weekend, so I don't have any idea what's going on. The code:

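import com.microsoft.spark.sqlanalytics
from com.microsoft.spark.sqlanalytics.Constants import Constants

itens_nota_final_df.write.option(Constants.SERVER, "myserver").mode("overwrite").synapsesql("myserver.dbo.StagingTable")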

The error: ErrorNumber - 105222, ErrorMessage - COPY statement using Parquet and auto create table enabled currently cannot load into hash-distributed tables. See here for more information: https://aka.ms/AAgz988


Answer 1

AnnuKumari-MSFT (Microsoft Employee, Moderator)

Hi Calleb Cecco and everyone,

We received the following response from the product team about this issue when loading with a SQL script or a copy activity. For the PySpark notebook approach described above, we are still waiting for more information:

Kindly load the data into a temporary round-robin table, then use INSERT ... SELECT to move it from that table into the target hash-distributed table. If that is not an option, remove AUTO_CREATE_TABLE or set it to 'OFF'.
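The round-robin staging workaround can be sketched in Python, the notebook's own language, as a helper that only builds the T-SQL to run against the dedicated SQL pool after the Spark write has finished. This is a minimal sketch, not the product team's procedure: `build_staging_to_hash_sql`, the `dbo.StagingTable_rr` staging name and the column names are hypothetical, and it assumes the Spark connector has already written the DataFrame into that staging table as round-robin (the dedicated pool's default distribution when none is specified). How you execute the statements (pipeline script activity, SSMS or another SQL client) is left open.

```python
from typing import List, Sequence


def build_staging_to_hash_sql(
    staging_table: str,
    target_table: str,
    columns: Sequence[str],
    truncate_target: bool = False,
) -> List[str]:
    """Build T-SQL that moves rows from a round-robin staging table into an
    existing hash-distributed table, then drops the staging table.

    Hypothetical helper: table and column names are placeholders.
    """
    column_list = ", ".join(f"[{c}]" for c in columns)
    statements = []
    if truncate_target:
        # Roughly mirrors mode("overwrite"): empty the target before inserting.
        statements.append(f"TRUNCATE TABLE {target_table};")
    # INSERT ... SELECT lets the dedicated pool redistribute the rows by hash itself.
    statements.append(
        f"INSERT INTO {target_table} ({column_list}) "
        f"SELECT {column_list} FROM {staging_table};"
    )
    # The staging table is only temporary, so remove it afterwards.
    statements.append(f"DROP TABLE {staging_table};")
    return statements


for stmt in build_staging_to_hash_sql(
    "dbo.StagingTable_rr", "dbo.StagingTable", ["id", "valor"], truncate_target=True
):
    print(stmt)
```

Setting `truncate_target=True` approximates the `mode("overwrite")` used in the question; leave it off to append instead.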

In addition, if you had already loaded data into a given hash-distributed table before this block was introduced, kindly perform a CREATE TABLE AS SELECT (CTAS) from that table into a new table (renaming the tables so that existing scripts are not affected).
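A CTAS rebuild with a name swap might look like the following sketch. It assumes the table should stay hash-distributed on a column you choose; `build_ctas_rebuild_sql`, the `_new`/`_old` suffixes and the `id` column are hypothetical placeholders. `CREATE TABLE ... WITH (DISTRIBUTION = HASH(...)) AS SELECT` and `RENAME OBJECT` are dedicated SQL pool T-SQL, but verify the new table before dropping the `_old` one, which the sketch deliberately does not do.

```python
from typing import List


def build_ctas_rebuild_sql(schema: str, table: str, hash_column: str) -> List[str]:
    """Build T-SQL that rebuilds an existing table with CTAS and swaps the names,
    so scripts that reference [schema].[table] keep working.

    Hypothetical helper: the "_new"/"_old" suffixes and the hash column are
    placeholders chosen for illustration.
    """
    new_table = f"{table}_new"
    old_table = f"{table}_old"
    return [
        # Copy all rows into a fresh hash-distributed table.
        f"CREATE TABLE [{schema}].[{new_table}] "
        f"WITH (DISTRIBUTION = HASH([{hash_column}]), CLUSTERED COLUMNSTORE INDEX) "
        f"AS SELECT * FROM [{schema}].[{table}];",
        # Move the original out of the way, then give the new table its name.
        f"RENAME OBJECT [{schema}].[{table}] TO [{old_table}];",
        f"RENAME OBJECT [{schema}].[{new_table}] TO [{table}];",
    ]


for stmt in build_ctas_rebuild_sql("dbo", "StagingTable", "id"):
    print(stmt)
```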

Hope it helps. Kindly accept the answer by clicking the Accept answer button. Thank you.

Anonymous

2024-03-06T07:14:15.97+00:00
Additional points here:

1. Can we please stop referring to AUTO_CREATE_TABLE = 'ON'? There is no such option when we use Synapse Spark; if it is being set by the product backend, it is not in our control.

2. Just to be clear, it does not matter whether the load goes from a round-robin table to a hash table or from a hash table to a hash table: loading data into a hash-distributed table from Synapse Spark notebooks is not available, and what was working before was incorrect and should never have been possible in the first place.

3. Conclusion: writing from Synapse Spark to a hash-distributed table is not supported.

Are these 3 points correct, and is point 3 the current state of the product? Please confirm, @AnnuKumari-MSFT.
Muruga MuthuKrishnan

2024-03-06T09:08:48.6733333+00:00

@AnnuKumari-MSFT In our scenario we are not loading from Parquet files into a table. We are now able to load data into hash-distributed tables, which was not the case two weeks ago. Please confirm whether this issue is resolved or still in progress.

Answer 2

AnnuKumari-MSFT (Microsoft Employee, Moderator)

Hi Calleb Cecco,

Welcome to Microsoft Q&A platform and thanks for posting your query here.

It seems you are getting the error "COPY statement using Parquet and auto create table enabled currently cannot load into hash-distributed tables" while trying to copy data to a hash-distributed table.

As per the documentation "Copy and transform data in Azure Synapse Analytics by using Azure Data Factory or Synapse pipelines", the auto create table option in the copy activity sink only creates tables with the default distribution, i.e. ROUND_ROBIN.


The error message indicates that you are using the COPY statement to load Parquet data into a hash-distributed table with the auto create table option enabled.

To resolve this issue, you can try the following steps:

1. Disable the auto create table option when using the COPY statement to load data into a hash-distributed table. This allows you to load data into the existing hash-distributed table.
2. Use a round-robin distributed table: if you need to load Parquet data with the auto create table option enabled, use the default round-robin distribution instead of hash distribution.
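For a COPY statement you write yourself (for example in a SQL script), the first option amounts to setting AUTO_CREATE_TABLE to 'OFF' so that COPY loads into the existing hash-distributed table. The sketch below only builds the statement text; `build_copy_into_sql` and the storage URL are hypothetical, and the CREDENTIAL clause your storage account may need is omitted. As the comments on this thread point out, this does not help with the Synapse Spark connector, which does not let you write the COPY statement yourself.

```python
def build_copy_into_sql(
    target_table: str, source_url: str, auto_create_table: bool = False
) -> str:
    """Build a COPY INTO statement that loads Parquet files into target_table.

    With auto_create_table=False, AUTO_CREATE_TABLE is set to 'OFF', so COPY
    loads into the existing (hash-distributed) table instead of creating one.
    Authentication (CREDENTIAL = ...) is deliberately left out.
    """
    auto = "ON" if auto_create_table else "OFF"
    return (
        f"COPY INTO {target_table}\n"
        f"FROM '{source_url}'\n"
        f"WITH (FILE_TYPE = 'PARQUET', AUTO_CREATE_TABLE = '{auto}');"
    )


# Hypothetical storage location, for illustration only.
print(build_copy_into_sql(
    "dbo.StagingTable",
    "https://myaccount.blob.core.windows.net/mycontainer/itens/*.parquet",
))
```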

Hope it helps. Kindly accept the answer by clicking the Accept answer button. Thank you.

Anonymous

2024-02-20T07:57:58.8666667+00:00

I am also facing this issue, with exactly the timeline Calleb mentioned. The response given does not match the problem statement: in Synapse Spark we are not using any pipeline or copy activity, only the Synapse Spark connector, and we do not provide any COPY statement. Please suggest a solution for the Synapse Spark scenario rather than for a pipeline copy sink.
AnnuKumari-MSFT (Microsoft Employee, Moderator)

2024-02-20T15:43:57.54+00:00

Aditya, makes sense. Thanks for the clarification. I am going to reproduce the scenario on my end and take it forward to the internal team. Thanks.
Abhijeet Singh

2024-02-27T16:18:36.2066667+00:00

Getting the same error. I am using PySpark to overwrite a dedicated pool table with a PySpark DataFrame. The code was running fine on Thursday.
Abhijeet Singh

2024-02-28T10:16:28.56+00:00

I am facing the same issue; we are using PySpark code to write a PySpark DataFrame to a dedicated pool table. Previously it was working fine.
Abhijeet Singh

2024-03-05T01:16:26.95+00:00

Is there any update?
Arn Sveinung Pettersen

2024-03-05T06:03:55.35+00:00
I'm having more or less the same issue, using a notebook:

df_spark.write.mode("append").synapsesql()
Anonymous

2024-03-05T15:14:29.9866667+00:00

This has suddenly started working now. Any idea how and why?
