question

lohani-4621 avatar image
0 Votes"
lohani-4621 asked preetijaiswal-0638 published

structure in getMetadata activity for csv file dataset show string datatypes for integer column in azure data factory and for json files structure is not supported

I want to do validation as first step before proceeding further in pipeline execute. I am fetching metadata activity for my dataset and then checking it against a predefined schema in if condition.
Metadata for csv files show column type string even for integer which is breaking the validation.
For json files, structure field is not supported in metadata activity

I have a use case, where i need to validate the dataset(both column count and column types) against a predefined schema before going further in pipeline execution and my dataset will be json or csv files.

azure-data-factory
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

1 Answer

ChiragMishra-MSFT avatar image
0 Votes"
ChiragMishra-MSFT answered preetijaiswal-0638 published

Hi Pankaj,

As answered on your Stack Overflow thread, mapping of data types is taken care of in the copy activity. Using getMetadata on a csv file will give all strings.

Here's a post that talks about validation of datasets in ADF.

Please note that it's note an official Microsoft blog.

Hope this helps.

Stay safe! Take care!


· 3
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

I followed the same thread but the problem comes for csv files as all column type come as string , so i cant validate column types.

For example my csv file look like this
id,fname,lname
1,virat,kohli
2,sachin,tendulkar
3,rahul,dravid

my predefined schema is like this
createArray(

     json(‘{“name”: “id”, “type”: “Integer”}’),

     json(‘{“name”: “fname, “type”: “String”}’

     json(‘{“name”: “lname, “type”: “String”}’

)


Now above csv files validation fails as it takes string type for all column name.
In the getMetadata Activity docs - https://docs.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity

there is an example for which it is showing integer datatype for one of columns of csv file
11741-download.png

So, wants to know is it possible to do it by any way.




0 Votes 0 ·
download.png (143.2 KiB)

Hello Pankaj ,

As called out before by Chirag and also on the SO forum , the data in CSV are all treated as string , copy and other activity does give you the option to change data types in the mapping tab .

The example which you have pasted is for reading the metadata of a file and "not content of a file" . For reading the metadata of file , the size property is defined as int and so you see that way . feel free to let me know if you have any other query .

https://docs.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity#get-a-files-metadata

12629-capture.png


0 Votes 0 ·
capture.png (15.8 KiB)

Hello @ChiragMishra-MSFT @HimanshuSinha-MSFT @lohani-4621 ,
I am stuck in the similar scenario where i am dynamically migrating csv files from on-prem file server (using self IR) to sql server using adf. I have implemented using getmetadata>forEach>>Copy activity. It migrated the data dynamically using parameters but the datatypes of all columns are mapped as string -> nvarchar . When i checked the sql server all Table columns are nvarchar (max) only. I want to have the correct datatype in target table when i am doing this dynamically.
So is it possible if yes kindly help me out how to do that.

0 Votes 0 ·