Skip incompatible rows (Fault tolerance) doesn't work

Question

Sofía Neithardt

Hi,

I have a dataset that uses a semicolon delimiter, but one string field also contains semicolons, so some rows end up with more columns than they should. Because of that, I enabled Fault Tolerance on the Copy Activity. Unfortunately, when I run the pipeline, some of these rows are skipped but a few stay in the file and cause errors in subsequent processes.

When I process the output file again (the one without the skipped rows), the rows that hadn't been omitted the first time are now skipped, so I don't understand what is happening.

I thought it was caused by a limit on rows per file or on skipped rows, but I ran some tests and that isn't the case.

Has anyone experienced something like this?
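A quick way to see which rows fall outside the expected width is to count the `;`-separated fields on every line before the copy runs. This is a minimal local check, not anything ADF itself runs: it splits naively on the delimiter, ignores quoting, and the sample rows are shortened, made-up stand-ins for the real data.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def profile_field_counts(lines: Iterable[str], sep: str = ";") -> Counter:
    """Tally how many lines have each field count (naive split, quotes ignored)."""
    return Counter(len(line.rstrip("\r\n").split(sep)) for line in lines)

def find_mismatched_rows(lines: Iterable[str], expected: int, sep: str = ";") -> List[Tuple[int, int]]:
    """Return (1-based line number, field count) for each line whose width != expected."""
    mismatches = []
    for line_no, line in enumerate(lines, start=1):
        count = len(line.rstrip("\r\n").split(sep))
        if count != expected:
            mismatches.append((line_no, count))
    return mismatches

# Shortened, made-up rows; the second one has a ';' inside its text field.
sample = [
    "0137;02;free text;2015-03-31",
    "0137;02;10/02/2023 20;15 LM;2023-02-10",
    "0137;01;other text;2017-04-24",
]
print(profile_field_counts(sample))              # Counter({4: 2, 5: 1})
print(find_mismatched_rows(sample, expected=4))  # [(2, 5)]
```

On the real file, passing an open file object (for example `open(path, encoding="utf-8")`, where the encoding is an assumption) streams the lines instead of loading a multi-million-row file into memory; reopen the file for each function, since each one consumes the iterator.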

Sofía Neithardt 0 Reputation points

2023-08-09T18:24:11.5633333+00:00

Hi @Bhargava-MSFT, thanks for your response.

I understand, but I don't have a schema in the destination dataset: it's a Copy Activity that gets the data from an on-premises source and writes it to ADLS Gen2. So which schema does it use to decide which rows to skip?

I examined the log on ADLS (incompatible rows), but the problem is that I don't see the cause of the failure.

Here is an example:

These are 3 rows from the first file. As you can see, the first row is the correct one, with 53 columns. The other two have 54 columns, so both should be skipped; nevertheless, the third one isn't skipped, and I don't know why.

correct:

0137;02;03;0001;800910325740;00001;000000000001; ;5342623000743505 ; ;2014-04-29;2015-03-02;05;S ;0;00;0001-01-01; ;2015-03-31;02;201704;000000;201504;0000;0000;000 ;0000;M01; ;2;0;0;0;S;S;N;N;N;0000;03; ; ;N; ;00000000000000;A00;N;N;S;0137;0013;ATB689 ;BATCH ;2015-03-31-20.03.10.493935

skipped:

0137;02;30;0001;800910325840;00001;000000000005; ;5101980100236710 ;5101980100084524 ;2019-10-20;2021-02-22;05;S ;0;01;2023-02-10;10/02/2023 20;15 LM ;0001-01-01; ;202304;202304;202104;0000;0000;000 ;0000;M02; ;2;0;0;0;S;S;N;N;N;0000;03; ; ;N; ;00000000000000;A00;N;N;S;0137;0001;PRIFONBO;\TQM ;2023-02-10-20.15.35.346510

not skipped:

0137;01;53;0062;900010719140;00001;000000000001; ;4142842001377080 ; ;2012-07-24;0001-01-01;14;N ;0;31;2017-04-18;BRED 130417 12;03 ;2017-04-24;05;204012;000000;201207;0000;0000;000 ;0000;VD1;VD1;0;0;0;0;S;S;N;N;N;0000;03; ; ;N; ;00000000000000;A00;S;N;S;0137;0073;YS07690 ;BC05 ;2017-04-24-12.21.35.285197
Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2023-08-09T23:58:13.2033333+00:00

Hello @Sofía Neithardt ,

Thank you for providing more information.

As you mentioned, there is no schema in the destination dataset, and the Copy Activity fetches data from an on-premises source. Could you tell me what your source dataset is, and whether you defined a schema in the source?

Since the delimiter itself is causing the issue, is it possible for you to use a different delimiter to separate the values in the source?

In the log files, did you see any error messages?

Also, you can try to handle the semicolon delimiter issue by using the correct delimiter settings on the source dataset, as shown in the screenshot.
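If the system that produces the file can wrap the free-text field in quotes, a parser that honours a quote character keeps the embedded semicolon inside one field. ADF's delimited-text dataset has quote and escape character settings for this; the sketch below uses Python's `csv` module only to illustrate the principle, the row is invented, and whether the on-premises export can quote that field is an assumption.

```python
import csv
import io

def parse(text: str) -> list:
    """Parse one ';'-delimited line, treating '"' as the quote character."""
    return next(csv.reader(io.StringIO(text), delimiter=";", quotechar='"'))

# Made-up row whose text field contains the delimiter, unquoted and quoted.
unquoted = "0137;02;10/02/2023 20;15 LM;2023-02-10"
quoted = '0137;02;"10/02/2023 20;15 LM";2023-02-10'

print(len(parse(unquoted)))  # 5: the embedded ';' splits the text field in two
print(len(parse(quoted)))    # 4: the quoted field stays whole

# On the producing side, csv.writer quotes any field that contains the delimiter.
buf = io.StringIO()
csv.writer(buf, delimiter=";", lineterminator="\n").writerow(
    ["0137", "02", "10/02/2023 20;15 LM", "2023-02-10"]
)
print(buf.getvalue(), end="")  # 0137;02;"10/02/2023 20;15 LM";2023-02-10
```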

If you could send the Copy Activity JSON and the source dataset data, it would help me reproduce the issue.

Looking forward to hearing from you.
Sofía Neithardt 0 Reputation points

2023-08-10T02:00:48.3566667+00:00
Hi @Bhargava-MSFT, thanks again for your comments and help.

Here is the dataset config:

Dataset config:

Source config in Copy Activity:

I attached a short file with two rows.

The first has 54 columns.

The second has 53 columns.

I still don't understand which schema the Copy Activity uses to decide which rows to skip, but I'm certain that one of the rows should be skipped because of its column count, and that doesn't happen. Attachment: example_skip_rows.TXT
Sofía Neithardt 0 Reputation points

2023-08-10T19:46:16.3766667+00:00

@Bhargava-MSFT I forgot to send you the JSON. It's in .txt format because the forum doesn't let me attach a .json file. Attachment: example_skip_rows_json.TXT
Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2023-08-10T21:12:10.32+00:00

Hello @Sofía Neithardt ,

When you use a semicolon as the column delimiter on the source dataset, ADF internally maps the columns based on the input schema (you can use the preview to view the data).

If you look at the screenshot below, in the second row the Prop_54 column is written as null and copied to the storage account (here all columns are mapped as the string data type). This is why both rows were copied to the storage account.
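To make this behaviour concrete, here is a deliberately simplified model, not ADF's actual implementation. The column count comes from the first row, as in the preview; a shorter row is padded with nulls, the way `Prop_54` came through as null; and a row wider than that schema is flagged. Treating wider rows as incompatible follows the column-count-mismatch case described in the fault-tolerance documentation, but whether a given run actually skips them is exactly what this thread is trying to pin down, so treat that branch as an assumption. The `Prop_0`, `Prop_1`, … naming is also an assumption modelled on the preview column names.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]

def map_rows(lines: List[str], sep: str = ";") -> Tuple[List[Row], List[Tuple[int, str]]]:
    """Toy model of schema-by-first-row mapping; NOT ADF's real implementation."""
    if not lines:
        return [], []
    width = len(lines[0].split(sep))                  # schema inferred from row 1
    columns = [f"Prop_{i}" for i in range(width)]     # assumed preview-style names
    copied: List[Row] = []
    incompatible: List[Tuple[int, str]] = []
    for line_no, line in enumerate(lines, start=1):
        fields: List[Optional[str]] = list(line.split(sep))
        if len(fields) > width:
            # Assumption: a row wider than the schema counts as incompatible.
            incompatible.append((line_no, line))
            continue
        # A narrower row is padded: its missing trailing columns become None (null).
        fields += [None] * (width - len(fields))
        copied.append(dict(zip(columns, fields)))
    return copied, incompatible

# Wide first row: the short second row is copied with a null last column.
copied, bad = map_rows(["a;b;c", "a;b"])
print(copied[1], bad)    # {'Prop_0': 'a', 'Prop_1': 'b', 'Prop_2': None} []

# Narrow first row: the wider second row falls outside the inferred schema.
copied, bad = map_rows(["a;b", "a;b;c"])
print(len(copied), bad)  # 1 [(2, 'a;b;c')]
```

Under this model, a file whose first row is the wide one produces no incompatible rows at all, which matches what happened with the two-row example file.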

If you don't use the semicolon as the delimiter, each row is copied as a single column.

Please check this video tutorial: How to Redirect Bad Records in Copy Activity in Azure Data Factory

I hope this makes sense to you.
Sofía Neithardt 0 Reputation points

2023-08-14T20:48:35.2733333+00:00

Hi @BhargavaGunnam-MSFT,

Thank you again; your explanation helped me understand how ADF internally maps the columns based on the input schema. I'm now using the preview to see which schema it uses before running the pipeline.

I didn't give you a good example, because that dataset had 54 columns in the first row, so ADF assumes the schema has 54 columns and the subsequent rows are simply missing the last one, which isn't an error. But I have another example where the first row has 53 columns and a row with 54 columns appears near the end of the file (row number 2969118), and the Copy Activity neither fails nor skips it when Fault Tolerance is on.

I can't attach the file with this problem because it's too large. Could we meet briefly so I can share my screen and show you?

I hope you can help me once again.

Finally, I understand the other option you gave me, where I can copy the file as a single column if I don't use the semicolon as the delimiter, but that's a plan B I'd prefer not to use. I want to understand why Fault Tolerance doesn't work as it should.
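For the single-column fallback, each line lands in the lake as one string and can be split in a later step where the field count is checked explicitly. In the three rows posted earlier in the thread, the stray `;` appears to fall in the field at zero-based index 17 (the one holding `10/02/2023 20;15 LM` and `BRED 130417 12;03`). The hypothetical helper below assumes that field is the only one that can contain `;`; if a surplus separator ever sits elsewhere, it would silently mis-align the row, so that assumption needs checking against the real data. Which tool runs this later step is left open.

```python
from typing import List, Optional

def split_and_repair(line: str, expected: int, text_index: int, sep: str = ";") -> Optional[List[str]]:
    """Split one raw single-column line into `expected` fields.

    Surplus separators are assumed to belong to the free-text field at
    `text_index` and are re-joined into it. Returns None when the line is
    narrower than expected or text_index is out of range.
    """
    fields = line.rstrip("\r\n").split(sep)
    extra = len(fields) - expected
    if extra < 0 or text_index >= expected:
        return None
    # With extra == 0 this slice is just the original text field.
    merged = sep.join(fields[text_index:text_index + extra + 1])
    return fields[:text_index] + [merged] + fields[text_index + extra + 1:]

print(split_and_repair("0137;02;10/02/2023 20;15 LM;2023-02-10", expected=4, text_index=2))
# ['0137', '02', '10/02/2023 20;15 LM', '2023-02-10']
print(split_and_repair("0137;02", expected=4, text_index=2))
# None
```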
Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2023-08-15T18:16:44.8833333+00:00

Hello @Sofía Neithardt ,

I'm glad I could assist you in understanding how ADF works with column mapping and schemas. However, I'm sorry to inform you that I am not authorized to have a call. If you require further assistance, I recommend opening a support ticket. A support engineer can guide you through the issue and assist in a remote session.

If you don't have a support plan, I can enable a one-time free support request for you to work on this issue.

I am looking forward to hearing from you.
Sofía Neithardt 0 Reputation points

2023-08-15T20:08:18.14+00:00
Hello @Bhargava-MSFT,

I tried to create a support ticket, but it asks for the run ID and then doesn't let me continue creating the ticket because it doesn't find an error. It offers some possible solutions, but I need to interact with someone to show the issue. Could you tell me another way to create the support ticket?

Here's how I'm trying to create it:

1)

2)

In this step, when I get to "Recommended solution", it doesn't let me advance, so in the end I can't create a ticket.
Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2023-08-15T21:42:10.37+00:00

@Sofía Neithardt

You can skip the recommended solution by clicking "Return to support request" and then clicking "Next" to submit the support request.

Please see the screenshot below for reference.
Sofía Neithardt 0 Reputation points

2023-08-16T13:06:29.3633333+00:00

Hi @Bhargava-MSFT

I finally created the support ticket!!

I want to say that I'm so grateful and really appreciate all your help so much!!
Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2023-08-16T15:09:12.2233333+00:00

Thank you, @Sofía Neithardt

Answer 1

Hello @Sofía Neithardt ,

Welcome to the Microsoft Q&A forum.

When the Fault Tolerance feature is enabled, it skips rows that don't match the schema of the destination dataset. In your case, the semicolon delimiter is causing some rows to have more columns than the correct ones, which is why they are being skipped.
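To confirm that fault tolerance is actually switched on in the activity that ran, and where its incompatible-row log goes, the Copy activity JSON (like the one attached to this thread) can be checked directly. This sketch assumes the property names `enableSkipIncompatibleRow`, `redirectIncompatibleRowSettings` and `logSettings.logLocationSettings` as they appear in the copy activity JSON described in the fault-tolerance article referenced below; the activity name, linked service and path in the example are made up.

```python
import json
from typing import Any, Dict

def fault_tolerance_summary(activity_json: str) -> Dict[str, Any]:
    """Report the skip-incompatible-row settings found in a Copy activity definition."""
    activity = json.loads(activity_json)
    props = activity.get("typeProperties", {})
    redirect = props.get("redirectIncompatibleRowSettings") or {}
    log_location = (props.get("logSettings") or {}).get("logLocationSettings") or {}
    return {
        "skip_enabled": bool(props.get("enableSkipIncompatibleRow", False)),
        # Older-style redirect settings first, then the newer logSettings block.
        "log_path": redirect.get("path") or log_location.get("path"),
    }

# Trimmed, hypothetical activity definition.
example = """
{
  "name": "CopyOnPremToAdls",
  "type": "Copy",
  "typeProperties": {
    "source": {"type": "DelimitedTextSource"},
    "sink": {"type": "DelimitedTextSink"},
    "enableSkipIncompatibleRow": true,
    "redirectIncompatibleRowSettings": {
      "linkedServiceName": {"referenceName": "AdlsGen2", "type": "LinkedServiceReference"},
      "path": "logs/incompatible"
    }
  }
}
"""
print(fault_tolerance_summary(example))
# {'skip_enabled': True, 'log_path': 'logs/incompatible'}
```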

Did you examine the logs in Azure Blob Storage (incompatible rows) to see the cause of the failure?

It seems like the rows skipped in the first run were not fixed in the data source, which caused them to be skipped again in the second run. Please check the log files to see if this is the case.
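One way to check this is to compare the rows that survived each run. The sketch assumes both outputs were downloaded as local text files with one record per line; it only reports which lines the second run dropped, not why, and because it uses a set, a line that appears several times in the first output counts as kept if it appears at least once in the second.

```python
from typing import Iterable, List

def dropped_rows(first_run: Iterable[str], second_run: Iterable[str]) -> List[str]:
    """Return lines present in the first run's output but missing from the second, in order."""
    kept = {line.rstrip("\r\n") for line in second_run}
    return [line.rstrip("\r\n") for line in first_run if line.rstrip("\r\n") not in kept]

# Made-up outputs: the wider row survived run 1 but not run 2.
run1 = ["a;b;c", "a;b;c;d", "x;y;z"]
run2 = ["a;b;c", "x;y;z"]
print(dropped_rows(run1, run2))  # ['a;b;c;d']
```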

Also, you can try to handle the semicolon delimiter issue by using the correct delimiter settings on the source dataset.

Reference document:

https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-fault-tolerance#copying-tabular-data

I hope this helps. Please let me know if you have any further questions.
