Column truncate issue - Copy file to DB ADF pipeline

ADF_Coder 0 Reputation points
2024-07-19T17:52:29.9766667+00:00

Hi All,

I have a pipeline where I am copying data from a file to a database. In the file, I have a few columns with extra spaces that I need to trim or truncate. I am not getting any option to use any transformation or function (dynamic content) on the destination side.

I have to do this in the ADF copy activity only.

 

Please help and suggest.

User's image

Azure SQL Database
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,477 questions
{count} votes

3 answers

Sort by: Most helpful
  1. Amira Bedhiafi 22,306 Reputation points
    2024-07-19T19:06:10.1566667+00:00

    Instead of directly using a Copy activity, where you add a Derived Column transformation to trim the extra spaces from the columns. For each column that needs trimming, add a new derived column with the following expression:

    
    trim(columnName)
    
    

    For example, if you have a column named CustomerName, you would use:

    
    trim(CustomerName)
    
    

    Add a Sink transformation and configure it to write to your Azure SQL Database.


  2. Deleted

    This answer has been deleted due to a violation of our Code of Conduct. The answer was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.


    Comments have been turned off. Learn more

  3. Chandra Boorla 1,335 Reputation points Microsoft Vendor
    2024-07-22T17:22:43.03+00:00

    Hi @ADF_Coder

    Thanks for the question and using MS Q&A platform.To remove extra spaces from a file during copy to a database in ADF, leverage a Stored Procedure Activity. This procedure trims spaces from designated columns after the copy operation. Below are the steps:

    Step 1: Create a Staging Table, this table will temporarily hold the data copied from your file. Example SQL for Creating a Staging Table:

    CREATE TABLE StagingTable (
        Column1 VARCHAR(255),
        Column2 VARCHAR(255),
        Column3 VARCHAR(255)
    );
    

    Step 2: Create the Final Table, this table will hold the cleaned data after trimming spaces. Example SQL for Creating the Final Table:

    CREATE TABLE FinalTable ( 
    Column1 VARCHAR(255), 
    Column2 VARCHAR(255), 
    Column3 VARCHAR(255) 
    ); 
    

    Step 3: Create a Stored Procedure, this stored procedure will trim spaces from the columns in the staging table and insert the cleaned data into the final table.

    Example SQL for Creating the Stored Procedure:

    CREATE PROCEDURE TrimColumns
    AS
    BEGIN
        -- Using LTRIM and RTRIM for leading and trailing spaces
        INSERT INTO FinalTable (`Column1`, `Column2`, `Column3`)
        SELECT TRIM(Column1) AS trimmed_column1, TRIM(Column2) AS trimmed_column2, TRIM(Column3) AS trimmed_column3
        FROM StagingTable;
        -- Optionally, clean up the staging table if needed
        DELETE FROM StagingTable;
    END;
    

    For additional info, please refer: https://learn.microsoft.com/en-us/sql/t-sql/functions/trim-transact-sql?view=sql-server-ver16

    Step 4: Configure ADF Pipeline using copy data activity.

    • Source - Configure your source dataset to point to the file you want to copy data from.
    • Sink - Configure your sink dataset to point to the StagingTable in your database.

    Add a Stored Procedure Activity Drag a "Stored Procedure" activity onto the pipeline canvas and connect it to the "Copy Data" activity.

    • Linked Service: Choose the linked service that connects to your SQL Database.
    • Stored Procedure Name: Select the stored procedure TrimColumns that you created earlier.

    Connect the Activities Ensure the Copy Data activity is connected to the Stored Procedure activity. This ensures that once the data is copied to the staging table, the stored procedure is executed to clean the data.

    Example Pipeline Flow:

    • Copy Data Activity: Copies data from the file to the StagingTable.
    • Stored Procedure Activity: Executes the TrimColumns stored procedure to trim spaces and move data to the FinalTable.

    Step 5: Validate and Run the Pipeline.

    I hope this information helps, please do let us know if you have any further queries.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.