How to convert JSON-type columns from a CSV into JSON and apply a flatten operation

Diego Andres Vasquez Ruiz 20 Reputation points
2024-09-16T15:03:13.74+00:00

Hi all,

A file is downloaded via a web service in CSV format, containing several columns. However, these columns are of JSON type.

introducir la descripción de la imagen aquí

Example CSV

"account_id,customer_id,type,version,cookies,attributes,segments,events,created_at,updated_at,first_id,anonymous

88244,"{'type': 'cookie', 'value': '152a3b1e5265d0263a6d601e2510'}",tag,,['152a3b1e5265d0263a6d601e2510'],"[{'created_at': ['2024-09-07T14:29:19', '2024-09-07T14:29:14', '2024-09-08T10:15:50', '2024-09-11T00:15:08'], 'attribute_name': 'bretana_site_url', 'attribute_value': 'https://www.bretana.com.co/lado-b'}, {'created_at': ['2024-09-07T14:29:19', '2024-09-07T14:29:14', '2024-09-08T10:15:50', '2024-09-11T00:15:08'], 'attribute_name': 'bretana_title', 'attribute_value': 'Bretaña Momentos'}, {'created_at': ['2024-09-07T14:29:19', '2024-09-07T14:29:14', '2024-09-08T10:15:50', '2024-09-11T00:15:08'], 'attribute_name': 'bretana_utm_campaign', 'attribute_value': 'cpc_fb ig_gaseosas_bretana_bretana_manhattan_lanzamiento_consideracion_trafico_7_2024_omd'}, {'created_at': ['2024-09-07T14:29:19', '2024-09-07T14:29:14', '2024-09-08T10:15:50', '2024-09-11T00:15:08'], 'attribute_name': 'bretana_utm_source', 'attribute_value': 'fb ig'}, {'created_at': ['2024-09-07T14:29:19', '2024-09-07T14:29:14', '2024-09-08T10:15:50', '2024-09-11T00:15:08'], 'attribute_name': 'bretana_utm_medium', 'attribute_value': 'cpc'}, {'created_at': ['2024-09-07T14:29:19', '2024-09-07T14:29:14', '2024-09-08T10:15:50', '2024-09-11T00:15:08'], 'attribute_name': 'bretana_utm_id', 'attribute_value': '120211830364080662'}, {'created_at': ['2024-09-07T14:29:19', '2024-09-07T14:29:14', '2024-09-08T10:15:50', '2024-09-11T00:15:08'], 'attribute_name': 'bretana_utm_term', 'attribute_value': 'intereses-remarketing'}, {'created_at': ['2024-09-07T14:29:19', '2024-09-07T14:29:14', '2024-09-08T10:15:50'], 'attribute_name': 'bretana_utm_content', 'attribute_value': 'video-ad-reel-story-video-08'}, {'created_at': ['2024-09-07T14:29:19', '2024-09-07T14:29:14', '2024-09-08T10:15:50', '2024-09-11T00:15:08'], 'attribute_name': 'bretana_previous_url', 'attribute_value': 'https://www.bretana.com.co/'}, {'created_at': ['2024-09-07T14:29:19', '2024-09-07T14:29:14', '2024-09-08T10:15:50'], 'attribute_name': 'postobon_id', 'attribute_value': 'PI-152a3b1e5265d0263a6d601e2510|0_252'}, {'created_at': ['2024-09-07T14:29:19', '2024-09-08T10:15:50', '2024-09-11T00:15:08'], 'attribute_name': 'bretana_adult_value', 'attribute_value': 'true'}, {'created_at': ['2024-09-11T00:15:08'], 'attribute_name': 'bretana_utm_content', 'attribute_value': 'video-ad-reel-story-video-10'}, {'created_at': ['2024-09-11T00:15:08'], 'attribute_name': 'postobon_id', 'attribute_value': 'PI-152a3b1e5265d0263a6d601e2510|0_255'}]","{'everybuyer': [], 'income': [14], 'product': [], 'career': [], 'education': [8], 'os': [24], 'gender': [1], 'everyone': [47], 'social': [], 'marital': [11], 'interest': [], 'browser': [33], 'connection': [1325], 'brand': [], 'device': [39], 'age': [4]}","[{'name': 'bretana_adult', 'created_at': ['2024-09-07T14:29:19.162000+00:00'], 'event_id': 'BRETANA_ADULT', 'category': 'custom'}, {'name': 'bretana_page_view', 'created_at': ['2024-09-07T14:29:13.802000+00:00', '2024-09-11T00:15:07.653000+00:00'], 'event_id': 'BRETANA_PAGE_VIEW', 'category': 'custom'}]",2024-09-07T14:29:14,2024-09-11T00:15:08,,"

The goal is to read the ‘attributes’ column using a complex format configuration in the projection as follows: (created_at as string[], attribute_name as string, attribute_value as string).

However, the data preview results show null values for the structure.

A PARSE/ANALYZE process is performed, but the result remains the same.

The final desired outcome is to convert these JSON columns into a user-readable report table.

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
0 comments No comments
{count} votes

Answer accepted by question author
  1. Amira Bedhiafi 41,111 Reputation points Volunteer Moderator
    2024-09-16T18:01:19.2333333+00:00
    • Use a Copy Data activity to load the CSV file from your web service or local storage into Azure Data Factory. You can store the file in a blob storage or a SQL database as the sink.

    In Data Flows, create a data source from your previously imported file (e.g., Blob storage or database table).

    In the data flow, for each JSON string column (like attributes, events...), use the Derived Column transformation to parse them using the parse function.

    
    parse(attributes, 'array<struct<created_at:array<string>, attribute_name:string, attribute_value:string>>')
    

    After parsing, use the Flatten transformation to expand arrays or nested objects into separate rows or columns.

    In the Flatten activity, specify the columns (attributes) that need to be flattened.

    • Unroll by the created_at field if it's an array.
    • Flatten the other fields like attribute_name and attribute_value.

    If you see null values, ensure that the JSON structure is correctly formatted and that your parse function matches the structure of the JSON data.

    After the flattening, output the data to a final sink (like Azure SQL Database, Blob Storage, or a reporting tool) where you can create the user-readable report.

    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.