Can't parse JSON files stored as octet-stream

wayne 1 Reputation point

I am storing data in Azure by using the data reader's GetStream method against SQL Server. This lets me upload the stream to Azure Blob Storage from C# without running into memory issues. Each file is a gzip-compressed (.gz) JSON file formatted as a list of JSON objects, and I can read it through Databricks. I am now trying to move this work off the Databricks cluster so that I can transform the files into Parquet with Data Factory. The issue is that Data Factory's file parser does not seem to like JSON files written as an octet-stream. It throws a "Malformed records are detected in schema inference. Parse Mode:" error message.

How can I get Data Factory to read these files? There seems to be a difference in how Databricks and Data Factory read them, even though both have Spark back ends.
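
One way to narrow this down is to check what layout the decompressed bytes actually have, since a single JSON array and one JSON object per line are different shapes to a reader. The following is a minimal Python sketch, assuming one of the blobs has been downloaded locally and read into memory as bytes. The helper name `describe_gz_json` is hypothetical and not part of any Azure SDK, and the check looks only at the file contents, not at the blob's content type.

```python
import gzip
import json


def describe_gz_json(raw: bytes) -> str:
    """Classify a gzip-compressed JSON payload by its top-level layout."""
    # Decompress; "utf-8-sig" drops a leading byte-order mark if one is present.
    text = gzip.decompress(raw).decode("utf-8-sig").strip()
    if not text:
        return "empty"
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        pass  # Not one single document; fall through to the per-line check.
    else:
        return "array" if isinstance(doc, list) else "single document"
    # Check whether every non-blank line parses as JSON on its own.
    try:
        for line in text.splitlines():
            if line.strip():
                json.loads(line)
    except json.JSONDecodeError:
        return "malformed"
    return "json lines"


# Illustrative payload only; a real check would use the bytes of a downloaded blob.
sample = gzip.compress(b'[{"id": 1}, {"id": 2}]')
print(describe_gz_json(sample))  # prints "array"
```

If the result is "malformed", or a layout other than the one the reader is configured to expect, that would point at the file contents or the format settings rather than at the octet-stream content type.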

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.