How do I remove the row in Excel file? - Azure Synapse Analytics

Kakehi Shunya (筧 隼弥) 201 Reputation points
2022-08-04T08:58:04.147+00:00

Hi, I wanna remove the row in Excel file in Azure Synapse Analytics.
As shown in the attached image, the source Excel file has an unnecessary second row.
I tried to remove it in ADF, but I don't know what condition to set in the component "Alter Row".
Do you know any solution?
Any help would be appreciated.
228026-image.png
228039-image.png

Azure Data Lake Storage
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
1,355 questions
Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,421 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,642 questions
0 comments No comments
{count} votes

Accepted answer
  1. AnnuKumari-MSFT 31,151 Reputation points Microsoft Employee
    2022-08-05T16:23:59.533+00:00

    Hi @Kakehi Shunya (筧 隼弥) ,
    Thankyou for sharing more details on your requirement. Additionally , you can try following approach to achive the requirement:

    1. Create the dataset pointing to your excel, make sure to import the schema and check first row as header.
    2. In the source transformation of the dataflow, select the above dataset and preview the data.

    228420-image.png

    3. Use surrogate key transformation to generate column say RowNum which would add an incrementing key value to each row of data.

    228596-image.png

    4. Use aggregate transformation to find out the maximum RowNum by using the expression : max(RowNum) in aggregate tab. As group by tab is optional, you can skip that.

    228651-image.png

    228605-image.png

    5. Use Sink transformation and select cache as the sink type to write the data into spark cache that can be referenced in other transformations.

    228612-image.png

    6. Add another source transformation and refer to the same source dataset.

    228643-image.png

    7. Use Surrogate key again to generate the RowNum again as done in step 3

    8. Use Filter transformation and use this expression : toInteger(RowNum) < toInteger(sink1#outputs()[1].MaxRow) - 1

    228606-image.png

    228645-image.png

    9. Use sink transformation to load this data to the target database.

    You can use the similar approach to filter out the first row as well by getting the min(RowNum) and using expression in the filter transformation : toInteger(RowNum) > toInteger(sink1#outputs()[1].MinRow)

    Hope this will help. Please let us know if any further queries.

    ------------------------------

    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you.
      Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators
    1 person found this answer helpful.

2 additional answers

Sort by: Most helpful
  1. Suba Balaji 11,186 Reputation points
    2022-08-05T10:57:39.043+00:00

    Hi @Kakehi Shunya (筧 隼弥) ,

    If you want to filter out such rows which don't bring any useful data, you can use filter transformation in data flow.

    You can mention expression like below,

    length (column1)>1

    This will filter out those rows which bring one character in the column1.

    Try this out and let us know

    Thanks

    1 person found this answer helpful.

  2. AnnuKumari-MSFT 31,151 Reputation points Microsoft Employee
    2022-08-04T09:21:08.21+00:00

    Hi @Kakehi Shunya (筧 隼弥) ,
    Thankyou for using Microsoft Q&A platform and thanks for posting your question here.
    As I understand your ask , you want to skip first row of the excel while processing it via Azure data factory. Please let me know if my understanding is incorrect .

    While creating the dataset, you can use the range in the excel dataset which will allow you to read the excel starting from a particular cell . You can provide the range value as A3 in the dataset.

    For more details, kindly check this article: Dataset properties in excel

    227986-image.png

    Hope this will help. Please let us know if any further queries.

    ------------------------------

    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you.
      Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators