Why does the pipeline give varying row-order results with the same data file?

I have updated one of our pipelines in order to incorporate row ordering as recommended here:
https://learn.microsoft.com/en-us/answers/questions/419209/additional-column-that-records-the-row-number-auto
The goal is to track the order of the data in the input file, since that order affects downstream processes.
The pipeline reads from a file, makes minor transformations, and inserts into an Azure SQL database.
All seemed well when tested; the data looks fine in debug.
However, differences were reported when the pipeline was run in a different session.
I was able to reproduce the discrepancy by using debug / data preview with two different integration runtimes in turn (the auto-resolve one, and then our "real" one): the real one yields different results.
In other words, the row order produced was different, with the same row number attached to a different data row from the input file.
Why is this? How can we ensure consistent output?
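For context on where the inconsistency might come from: mapping data flows execute on Spark, and as far as I understand, generated row numbers are assigned per partition, so the row-to-number mapping can change whenever the partition layout changes (for example, with a different integration runtime). Here is a minimal PySpark sketch of that behavior; the file name and the "ClaimId" key column are placeholders, not our real schema:

```python
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()

# Placeholder for the pipeline's source file; column names are hypothetical.
df = spark.read.option("header", True).csv("claimtest.txt")

# Unstable: ids are generated per partition, and the partition layout depends
# on the cluster (i.e. the integration runtime), so the same input row can
# receive a different number from one run to the next.
unstable = df.withColumn("FileRowNumber", F.monotonically_increasing_id())

# Stable: impose a total order on a uniquely identifying key before
# numbering; without a unique key, ties are still broken arbitrarily.
w = Window.orderBy("ClaimId")
stable = df.withColumn("FileRowNumber", F.row_number().over(w))
```

If that model is right, the fix would presumably be either to sort on a column that uniquely identifies each row before generating the number, or to force the numbering step onto a single partition.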
Thanks
Hello Tyrone Jones,
I am checking to see if you have had a chance to look into my above response. Please let me know if you have any further questions.
We are using a single sink: a single table in an Azure SQL database.
I can provide an example data format/file, but likely not the file itself, as it is about 7G.
Providing the follow-up info: attached is a 1,000-record sample file, along with the data flow script file, which should have all the information related to the transformations. There isn't much there, just column-to-column mapping after trimming whitespace and such.
Again, the problem is that the "FileRowNumber" created at the end is not consistent; e.g., the first data row from the file ends up with a different FileRowNumber in different runs. The only difference I can point to is the integration runtime, which is why I mentioned it. I'm not sure what is happening. I'm also including info on the "live" runtime in case that is helpful. It seems like some file splitting or partitioning is happening behind the scenes, if it is not just a bug/software issue; see the sketch after the attachments below. I'll also test with the large file next, in case it is a large-file issue.
Attachments: DataFlowscript.txt, claimtest.txt
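To illustrate the splitting/partitioning suspicion, here is a rough sketch against the Spark engine that data flows run on (again with placeholder file and column names, not our real schema):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.option("header", True).csv("claimtest.txt")

# The same rows receive different generated ids under different partition
# layouts, which is roughly what switching integration runtimes amounts to.
ids_a = df.repartition(2).withColumn("id_a", F.monotonically_increasing_id())
ids_b = df.repartition(8).withColumn("id_b", F.monotonically_increasing_id())

# Joining on a unique key ("ClaimId" is a placeholder) surfaces rows whose
# generated id differs between the two layouts.
diff = (
    ids_a.select("ClaimId", "id_a")
    .join(ids_b.select("ClaimId", "id_b"), "ClaimId")
    .filter(F.col("id_a") != F.col("id_b"))
)
print(diff.count())
```

If that is indeed what is happening, forcing the surrogate key transformation onto a single partition (via the Optimize tab), or sorting on a unique key immediately before it, should make the assignment repeatable, though I would want that confirmed.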
Hello Tyrone Jones,
Sorry, I didn't follow. Can you please provide more details about the "7G" you mentioned?
That is the size of the actual files we are processing: 7 gigabytes, or greater.
In case these help, here are the details for the integration runtime (screenshots attached):
Also, if the large version of the file is needed or useful, let me know how to send it. Compressed, it is 700 MB.
Hello Tyrone Jones,
Thank you for the details. I will look into this further and get back to you with more details.
Hello Tyrone Jones,
I tried to reproduce the issue on my end with the sample file you provided earlier, but the FileRowNumber is consistent when writing to my sink dataset.
I would suggest opening a support case for a deeper investigation. If you don't have a support plan, please let me know, and I can provide a one-time free support request.
I am looking forward to hearing from you.