MaheshKumar-1184 asked JonnyM-0584 answered

Azure Data Factory -- Skipping 2nd & 3rd rows from the source file

Hello All,

I have a source .csv file that starts with a header, followed by a couple of rows containing metadata that I don't need to copy to my Azure SQL table. I want to keep the header (row 1), skip the next two rows (rows 2 and 3), and then continue processing from row 4 onwards.
I tried using the Skip Line Count property, but it throws errors, probably because of the following characteristics:

89762-image.png



Is there a way to skip the 2nd and 3rd rows while reading the file, or some kind of post-copy script option?
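If pre-processing the file outside ADF is an option (for example, a script that runs before the copy), the transformation being asked for is small. The sketch below is a minimal, hypothetical helper (`drop_rows_after_header` is not an ADF feature) that keeps row 1, drops the two rows after it and passes everything else through unchanged. It assumes the metadata rows are ordinary CSV records; the sample data is made up for illustration.

```python
import csv
import io


def drop_rows_after_header(text: str, rows_to_drop: int = 2) -> str:
    """Keep the header (row 1), drop the next `rows_to_drop` rows, keep the rest.

    Hypothetical pre-processing helper; not part of Azure Data Factory.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # row 1: column names
    # Skip the metadata rows that immediately follow the header (rows 2 and 3).
    for _ in range(rows_to_drop):
        next(reader, None)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(reader)  # row 4 onwards, unchanged
    return out.getvalue()


# Hypothetical sample: header, two metadata rows, then real data.
sample = "ID,Name\nmeta1,x\nmeta2,y\n1,Alice\n2,Bob\n"
print(drop_rows_after_header(sample), end="")
```

Because the helper parses with Python's `csv` module, a quoted field that contains a line break still counts as a single row.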

Thank you.

Regards,
Mahesh

azure-data-factory

KranthiPakala-MSFT answered MattEvans-7276 commented

Hi @MaheshKumar-1184 ,

Thanks for reaching out.

Since row 1 is your header and you want to skip the 2nd and 3rd rows, you can use the skipLineCount property available in the Copy activity source settings.

  1. First, select the firstRowAsHeader property in the dataset connection settings, as shown below:

    89806-image.png

  2. Second, in the Copy activity source settings, set skipLineCount = 2. This skips the first 2 rows (in your case, the 2nd and 3rd rows, not counting the header in row 1, because firstRowAsHeader is selected in the dataset connection settings).

    89824-image.png

  3. In the Mapping section, import the schemas.
    89816-image.png
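For reference, the two settings from the steps above sit in the pipeline JSON roughly as follows. This is a partial sketch written as Python dictionaries: only the properties discussed here are shown, and a real DelimitedText dataset and Copy activity source also need other properties (file location, delimiter, store settings) that are omitted.

```python
import json

# Partial DelimitedText dataset typeProperties: use the first row as column names.
# Other required properties (location, columnDelimiter, ...) are omitted.
dataset_type_properties = {
    "firstRowAsHeader": True,
}

# Partial Copy activity source: skip lines via the delimited-text read settings.
# storeSettings and other source properties are omitted.
copy_source = {
    "type": "DelimitedTextSource",
    "formatSettings": {
        "type": "DelimitedTextReadSettings",
        "skipLineCount": 2,
    },
}

print(json.dumps({"dataset.typeProperties": dataset_type_properties,
                  "copy.source": copy_source}, indent=2))
```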

I have tested this scenario and it works as expected.

Here is the source blob used:

89696-image.png


Here is the sink table after loading data:

89797-image.png


Ref doc: Copy activity properties

Hope this info helps. Do let us know how it goes.



Please don’t forget to Accept Answer and Up-Vote wherever the information provided helps you, this can be beneficial to other community members.





Hi @MaheshKumar-1184 ,

Just checking in to see if the above suggestion was helpful. If this answers your query, please do click “Accept Answer” and/or Up-Vote, as it might be beneficial to other community members reading this thread. And, if you have any further query do let us know.


Hi @MaheshKumar-1184 ,

We still have not heard back from you. Just wanted to check if the above suggestion was helpful? If it answers your query, please do click “Accept Answer” and/or Up-Vote, as it might be beneficial to other community members reading this thread. And, if you have any further query do let us know.


This solution does not work, as your own screenshots show.
The order of execution is the other way around.
It first skips 2 rows and then uses the 3rd row as the header.
In this case, the header values become 2, Rqel, Stone. This will fail with different input.
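The ordering described in this comment can be modelled in a few lines. The sketch below is a simplified, hypothetical model, not ADF's actual parser: lines are skipped first, and whatever line comes next is read as the header. The sample file is made up for illustration; with `skip_line_count=2`, the 3rd line becomes the header and the real header `ID,FirstName,LastName` is discarded.

```python
import csv


def read_like_skip_then_header(text: str, skip_line_count: int):
    """Simplified model of 'skip lines first, then read the header'.

    Returns (header, data_rows). Hypothetical helper, not ADF's parser.
    """
    lines = text.splitlines()
    remaining = lines[skip_line_count:]  # lines are skipped first...
    reader = csv.reader(remaining)
    header = next(reader)                # ...then the next line becomes the header
    return header, list(reader)


# Hypothetical sample: real header, two metadata rows, then data.
sample = "ID,FirstName,LastName\nmeta-1,,\nmeta-2,,\n1,Ann,Lee\n2,Bob,Ray\n"
header, rows = read_like_skip_then_header(sample, skip_line_count=2)
print(header)  # ['meta-2', '', '']
```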


What you want to do is not possible with the current functionality of the Copy task.

You would have to use a data flow instead of the Copy task. Use First Row as Header on the dataset so that the correct column headers are obtained, and then use a Filter transformation to remove the first 2 rows, i.e. ID > 2. If the ID values won't always be the same, you can add a row number to the dataset as a derived column and filter on that column instead.
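The same idea in plain Python, as an analogue of the data flow steps rather than data flow syntax: take the first row as the header, number the remaining rows starting at 1 (the role of the derived row-number column), and keep only rows whose number is greater than 2. The function name and the sample data are hypothetical.

```python
import csv
import io


def filter_by_row_number(text: str, rows_to_drop: int = 2):
    """First row as header -> number the data rows -> keep row_no > rows_to_drop.

    Hypothetical plain-Python analogue of the data flow approach.
    """
    reader = csv.DictReader(io.StringIO(text))  # first row supplies column names
    # Attach a 1-based row number to each data row (the "derived column").
    numbered = enumerate(reader, start=1)
    # Filter on the row number instead of on ID, so ID values don't matter.
    return [row for row_no, row in numbered if row_no > rows_to_drop]


sample = "ID,FirstName,LastName\nmeta-1,,\nmeta-2,,\n1,Ann,Lee\n2,Bob,Ray\n"
kept = filter_by_row_number(sample)
print([row["ID"] for row in kept])  # ['1', '2']
```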

JonnyM-0584 answered

What if the Skip Line Count needs to be dynamic? For instance, in one CSV the header is on line 3, but in the next CSV it's on line 5, and the only consistent value is the header itself. Instead of telling ADF to skip to line 3 or line 5, can you tell it to skip to the line that starts with "ID"?
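One way to make the skip count data-driven is to compute it from the file itself: find the first line that starts with the header prefix and count the lines before it. The sketch below is a hypothetical helper, not an ADF feature, and how the resulting number would be passed into a pipeline is left open. Under the skip-first-then-header order described in the earlier comment, this count would put the "ID" line in the header position. A prefix match can misfire if a metadata line also happens to start with "ID".

```python
def find_skip_line_count(text: str, header_prefix: str = "ID") -> int:
    """Return the number of lines before the first line starting with header_prefix.

    Hypothetical helper. Raises ValueError if no line matches.
    """
    for index, line in enumerate(text.splitlines()):
        if line.startswith(header_prefix):
            return index  # lines 0..index-1 precede the header
    raise ValueError(f"no line starts with {header_prefix!r}")


# Hypothetical sample where the header is on line 3.
sample = "Exported by tool\nRun date line\nID,Name\n1,Ann\n"
print(find_skip_line_count(sample))  # 2
```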

Thanks
