Mapping Dataflow Flatten JSON array with no projection

Daniel Ambler 0 Reputation points
2023-01-13T09:36:20.9+00:00

Hi,

I'm currently hitting an issue in with a Mapping Dataflow in Synapse pipelines when attempting a flatten transform without a schema projection. Not sure if its me that's missing something here or if i'm expecting the Mapping Data Flow to do something that's not possible without some additional steps.

I have a file with a structure similar to the attached JSON file (entity1_json.txt)- the difference is the schema of the items array will change with each file.

[
  {
    "meta": {
      "code": 200,
      "message": "OK",
      "serverTime": "2023-01-12T15:43:56+00:00",
      "userTimezone": { "offset": "+00:00", "name": "Europe/London" }
    },
    "response": {
      "count": 4,
      "items": [
        {
          "col1": "value1",
          "col2": "value2"
        },
        {
          "col1": "value1",
          "col2": "value2",
          "col3": "value3"
        }
      ]
    }
  },
  {
    "meta": {
      "code": 200,
      "message": "OK",
      "serverTime": "2023-01-12T15:43:56+00:00",
      "userTimezone": { "offset": "+00:00", "name": "Europe/London" }
    },
    "response": {
      "count": 4,
      "items": [
        {
          "col1": "value1",
          "col2": "value2",
          "col3": "value3"
        },
        {
          "col1": "value1",
          "col2": "value2",
          "col3": "value3",
          "col4": "value4"
        }
      ]
    }
  },
  {
    "meta": {
      "code": 200,
      "message": "OK",
      "serverTime": "2023-01-12T15:43:56+00:00",
      "userTimezone": { "offset": "+00:00", "name": "Europe/London" }
    },
    "response": { "count": 346, "items": [] }
  }
]

With a schema projection everything is great, however this means I need to maintain one Dataflow per entity. What I was hoping to do was maintain one Dataflow and parameterise it with an entity name.

So the typical flow isUser's image

And the colum in the schema is response.items

However if I break it all down to the source and flatten, and clear the projection, I cant quite get the same results

  1. Clear the projection, Schema Drift is enabled in the Schema options

User's image

  1. Alter the flatten to use response.items (first thing I tried, seemed sensible at the time :-))

User's image

My assumption at this point is this wont work because the flow is not schema aware, to this point I have tried various routes such as

  • Adding a select inbetween the source and flatten to ensure im only dealing with the response
  • Adding a derived column to get the items array (all show as NULL in the preview)
  • Working on a parse step, but this seems I would still need to specify the schema of the items array, which is not what i'm wanting to do.

I'd appreciate any assistance while I continue to look for a solution.

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,696 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,199 questions
{count} votes