Excel to CSV conversion has name extension issue

Amandeep Bajaj 26 Reputation points
2023-05-04T07:36:47.83+00:00

Hi All,

I am trying to copy data from two Excel files to CSV files using ADF and while I am able to copy them successfully, my issue is the extension of the files which are coming out as the abc.xlsx.csv

Source File:

Source-Files-SS

Outputfile-ss

Pipeline-Source-Dataset

Pipeline-Sink-Screenshot

Pipeline-Sink-Dataset

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,893 questions
{count} votes

Accepted answer
  1. KranthiPakala-MSFT 46,432 Reputation points Microsoft Employee
    2023-05-04T16:04:55.9233333+00:00

    Hi @Amandeep Bajaj ,

    Welcome to Microsoft Q&A forum and thanks for reaching out here.

    As per my understanding, you are using "recursive": true to copy multiple Excel files to your sink but changing the file extension from .xlsx to .csv.

    Please note that when you use recursive feature in copy activity, copy activity will copy all the files as per your source dataset configuration as is and will append any file extension specified in the sink dataset configuration.

    This is the reason why you are seeing the original file name appended with .csv extension.

    If you do not want to have the extra .xlsx file extension in your file name, instead of using recursive feature, you can use the ForEach activity to loop through the list of files and use the "source" and "sink" properties of the Copy Data activity to specify the source and destination file paths. Before using the foreach activity you will have to use a GetMetadata activity to get the list of fileNames which you will be passing them to your source dataset and sink dataset dynamically for each iteration of the copy activity.

    Here are the steps to do this:

    1. First add a Get Metadata activity to your pipeline to retrieve the list of Excel files in the source folder. In the "Get Metadata" activity, specify the path to the folder containing the Excel files. For example, if your files are located in a folder called "input" in your Azure Storage account, you can specify the path as "input/". Make sure to select below Field list properties in your GetMetada activity configuration.
      User's image
    2. Next add a subsequent ForEach activity to your pipeline and configure it to loop through the list of Excel files retrieved by the "Get Metadata" activity. In the "ForEach" activity, specify the Items property as @activity('Get Metadata').output.childItems.
    3. Inside the "ForEach" activity, add a "Copy Data" activity to copy each Excel file to a CSV file with the same name. In the "Copy Data" activity, specify the "source file name" property as item().name to specify the source Excel file. Specify the "sink file name" property as @replace(item().name, '.xlsx', '.csv') to specify the destination CSV file name. This expression will replace the ".xlsx" extension with ".csv" in the output file name.

    The output CSV files will have the correct file extension of .csv instead of .xlsx.csv. The item() function is used here to refer to the current file being processed by the "ForEach" activity.

    Hope this info helps. Please do let us know if you have further questions.


    Please don’t forget to Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you, this can be beneficial to other community members.


0 additional answers

Sort by: Most helpful