How to remove the escape character '\' in the output file of the JSON sink dataset

Kien Nguyen Mai Tuan 0 Reputation points
2024-11-19T10:23:15.6066667+00:00

Good day community,

I'm seeking your help in removing the escape character \ from the output of the dataflow with JSON as the sink dataset. The objective is to produce a JSON output file with a row containing the value: ["all.be"]

In the Derived Column transformation, I create a new column called Code and set its expression to the value ["all.be"]. The resulting JSON output file contains "[\"all.be\"]", even though the data preview in debug mode shows just ["all.be"]. I have tried adding the replace function to the expression to replace '\' with '', but there is no escape character \ to replace until the output is actually written to the JSON file.

Could you let me know what I can do to remove the escape character and, if possible, how to remove the quotes " " surrounding the brackets [ ]?

Thank you and much appreciated for your time looking into the problem.

Azure Data Factory

1 answer

  1. Amira Bedhiafi 27,051 Reputation points
    2024-11-19T22:59:17.2+00:00
    • The escape character \ is automatically added when writing a JSON array to a file as a string because JSON strings require special characters (like " or \) to be escaped.
    • The debug output does not show the escape character because it's not writing the data to a file at that point, just evaluating it in memory.
    • The output you see, like "[\"all.be\"]", appears because ADF treats the array ["all.be"] as a string and then escapes it during JSON serialization (illustrated below).
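
    A quick way to see the difference outside ADF: serializing a real array versus serializing the array's string representation gives different results. The Python snippet below is only an illustration of the JSON escaping rule, not pipeline code:

        import json

        as_array = ["all.be"]          # a real array value
        as_string = '["all.be"]'       # the same content typed as a string literal

        print(json.dumps(as_array))    # ["all.be"]       <- the desired file content
        print(json.dumps(as_string))   # "[\"all.be\"]"   <- the string gets escaped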

    Option 1: Use a Mapping Data Flow to Format the Output Properly

    1. Create a Derived Column:
      • Ensure the value you are adding (e.g., ["all.be"]) is not treated as a string but as a JSON array. Instead of directly typing ["all.be"], use an expression that generates an array.
      Example:
      
         array('all.be')
      
      
      This will create a JSON array without treating it as a string.
    2. Set the Sink Settings:
      • In the Sink transformation, configure the output file's File Pattern to write as JSON objects, not as strings.
      • Choose Single document or Document per line (depending on your requirement) in the JSON settings.
    3. Write to the Output File:
      • When writing, ADF will serialize the array directly, avoiding unnecessary escape characters or extra quotes.

    Option 2: Use a Post-Processing Script

    If you still see escape characters in the output JSON file, you can use a post-processing step in Azure Functions, Azure Logic Apps, or a Custom activity in ADF to clean up the file.

    For example:

    • Write a small Python script (run, for example, from an Azure Function or a Custom activity):

        import json

        # Read the sink output; the array was written as an escaped JSON string
        with open('output.json', 'r') as file:
            data = json.load(file)

        # If the value is a string such as "[\"all.be\"]", parse it back into a real array
        if isinstance(data, str):
            data = json.loads(data)

        # Re-serialize without the extra layer of escaping
        with open('clean_output.json', 'w') as file:
            json.dump(data, file, indent=4)
      
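    If the sink writes one object per line instead (for example a line such as {"Code": "[\"all.be\"]"}, using the Code column name from the question), the same idea applies row by row. This is only a sketch under that assumption about the file layout:

        import json

        clean_rows = []
        with open('output.json', 'r') as file:
            for line in file:                            # one JSON object per line
                row = json.loads(line)
                # the Code column holds the array serialized as an escaped string
                row['Code'] = json.loads(row['Code'])
                clean_rows.append(row)

        with open('clean_output.json', 'w') as file:
            json.dump(clean_rows, file, indent=4)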

    Option 3: Modify Sink Dataset JSON Settings

    1. Ensure the Sink dataset in ADF is set to write arrays as native JSON types.
    2. Configure the following:
      • File format: JSON
      • Serialization settings: Make sure "Write as array" is selected (if available).
      • Compression: None (if you're debugging and want to inspect the raw output).

    Option 4: Use Replace Logic in Data Flow Expression

    You can try the replace function to preprocess the string before it is written. For instance, with the Code column from the question:

    replace(toString(byName('Code')), '\\', '')
    
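    As the question itself notes, though, the backslash only exists once the sink serializes the value, so a replace inside the data flow may find nothing to remove. A tiny Python illustration of that timing (not ADF code, just the principle):

        value = '["all.be"]'      # the in-memory value produced by the data flow
        print('\\' in value)      # False - there is no backslash to replace yet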

    Option 5: JSON Flatten Tool (Advanced Data Processing)

    If you want finer control over how your JSON is written (such as removing outer quotes), consider using a Python/Databricks notebook or other external processing tools.
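
    For example, a Databricks notebook cell (Python) could re-read the escaped output and rewrite it as clean JSON. The sketch below assumes the sink wrote one object per line with the Code column holding the escaped string; the storage paths are placeholders, and spark/dbutils are the objects a Databricks notebook provides:

        import json

        # read the raw lines the ADF sink produced (placeholder path)
        raw = spark.read.text("abfss://<container>@<account>.dfs.core.windows.net/output/").collect()

        rows = []
        for r in raw:
            if not r.value.strip():
                continue                                 # skip blank lines
            obj = json.loads(r.value)
            obj["Code"] = json.loads(obj["Code"])        # undo the double encoding
            rows.append(obj)

        # write a single clean JSON file (placeholder path)
        dbutils.fs.put(
            "abfss://<container>@<account>.dfs.core.windows.net/clean/output.json",
            json.dumps(rows, indent=4),
            True,  # overwrite
        )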

