
Data checks

arkiboys 9,711 Reputation points
2022-04-19T15:35:38.607+00:00

I have now loaded the data into an Azure Data Lake Storage Gen2 account.
What kind of checks can I run to make sure the loaded data is valid?
Is there a standard list of checks I can go through?

Thank you

Azure Data Factory

An Azure service for ingesting, preparing, and transforming data at scale.


Answer accepted by question author

  1. Pratik Somaiya 4,211 Reputation points Volunteer Moderator
    2022-04-20T04:40:42.213+00:00

    Hello @arkiboys

    You can validate the data with the following steps:

    1) Create an external table over the files in your data warehouse (DW) and use it to check whether the data types match, whether any non-nullable fields contain NULLs, whether the row counts are as expected, and so on. These checks confirm that the data in the data lake meets your requirements.

    You can find more detail on step 1 in this article: https://www.soais.com/adls-data-validation/
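    If you want to try the same three checks (data types, NULLs in non-nullable columns, row count) without a warehouse, they can be sketched in Python against a delimited file. This is a minimal sketch, assuming the file is a CSV with a header row that you have already downloaded or mounted locally; the schema, column names and expected row count are hypothetical placeholders for your own rules.

```python
import csv
import io
from datetime import date

# Hypothetical schema: column name -> (type label, parser, nullable).
# Replace these columns and rules with your own requirements.
SCHEMA = {
    "id": ("an integer", int, False),
    "amount": ("a number", float, True),
    "order_date": ("a date", date.fromisoformat, False),
}


def validate_csv(text, schema, expected_rows=None):
    """Return a list of problems found in CSV text; an empty list means every check passed."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))

    # Structure check: every expected column must be present in the header.
    missing = set(schema) - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]

    row_count = 0
    for row in reader:
        row_count += 1
        line_no = reader.line_num  # physical line number in the source text
        for col, (label, parse, nullable) in schema.items():
            value = row[col]
            if value is None or value.strip() == "":
                # NULL check: empty values are allowed only in nullable columns.
                if not nullable:
                    problems.append(f"line {line_no}: NULL in non-nullable column {col!r}")
                continue
            try:
                parse(value)  # data type check: the value must parse as the expected type
            except ValueError:
                problems.append(f"line {line_no}: {col!r} value {value!r} is not {label}")

    # Count check: compare against a row count from the source system, if you have one.
    if expected_rows is not None and row_count != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {row_count}")
    return problems


sample = "id,amount,order_date\n1,9.99,2022-04-19\n2,,2022-04-20\n,5,not-a-date\n"
for problem in validate_csv(sample, SCHEMA, expected_rows=3):
    print(problem)
```

    Collecting every problem instead of stopping at the first one makes it easier to fix a bad file in one pass; the returned list can also be logged or used to fail the pipeline run.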

    2) You can use ADF itself to perform these checks: retrieve the file's metadata (for example with the Get Metadata activity), then apply conditions to decide whether each validation passes or fails. Detailed steps can be found here: https://www.altisconsulting.com/au/insights/how-to-validate-data-lake-files-using-azure-data-factory/
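    The metadata-plus-conditions approach boils down to comparing a few metadata fields against expectations. The sketch below writes those conditions in Python only so the logic is easy to read and test; it assumes the Get Metadata output has been captured as a dictionary containing the `exists`, `size`, `columnCount` and `structure` fields, and the expected column list is hypothetical. In the pipeline you would express the equivalent checks as ADF expressions in a condition activity.

```python
# Expected layout of the incoming file (hypothetical column names).
EXPECTED_COLUMNS = ["id", "amount", "order_date"]


def check_file_metadata(meta, expected_columns, min_size_bytes=1):
    """Apply pass/fail conditions to a Get Metadata-style output dict.

    Returns (passed, reasons). Assumes the dict carries the exists, size,
    columnCount and structure fields."""
    if not meta.get("exists", False):
        return False, ["file does not exist"]

    reasons = []
    # Size check: an empty file often indicates a failed or partial load.
    if meta.get("size", 0) < min_size_bytes:
        reasons.append(f"file is smaller than {min_size_bytes} byte(s)")
    # Column count check.
    if meta.get("columnCount") != len(expected_columns):
        reasons.append(f"expected {len(expected_columns)} columns, got {meta.get('columnCount')}")
    # Column name/order check using the structure field (a list of {"name", "type"} entries).
    actual = [col["name"] for col in meta.get("structure", [])]
    if actual != list(expected_columns):
        reasons.append(f"column names or order differ: {actual}")
    return not reasons, reasons


good = {
    "exists": True,
    "size": 2048,
    "columnCount": 3,
    "structure": [{"name": n, "type": "String"} for n in EXPECTED_COLUMNS],
}
print(check_file_metadata(good, EXPECTED_COLUMNS))
print(check_file_metadata({"exists": True, "size": 0, "columnCount": 0, "structure": []}, EXPECTED_COLUMNS))
```

    Checking existence first and returning early mirrors how a pipeline would normally branch: there is no point inspecting size or columns of a file that is not there.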


1 additional answer

  1. Nandan Hegde 36,881 Reputation points MVP Volunteer Moderator
    2022-04-20T05:32:09.117+00:00

    Hey,

    You can use the Assert transformation in mapping data flows for data validation. The video below walks through several data-validation scenarios:
    youtube.com/watch?v=_NzWpTRxt0s
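    The idea behind the Assert transformation is to declare row-level expectations (such as "expect true" or "expect unique") and mark the rows that fail them so they can be filtered or routed downstream, rather than stopping at the first bad row. The sketch below illustrates that pattern in plain Python; it is not how the data flow implements it, and the rule names and sample rows are hypothetical.

```python
from collections import Counter


def assert_rows(rows, expect_true, unique_key):
    """Tag each row with the names of the assertions it fails.

    expect_true: dict of rule name -> predicate(row) that must hold for every row.
    unique_key: column whose values must be unique across all rows."""
    key_counts = Counter(row[unique_key] for row in rows)
    tagged = []
    for row in rows:
        # "Expect true" style checks: evaluate each predicate on the row.
        failures = [name for name, pred in expect_true.items() if not pred(row)]
        # "Expect unique" style check: the key must appear exactly once.
        if key_counts[row[unique_key]] > 1:
            failures.append(f"unique_{unique_key}")
        tagged.append((row, failures))
    return tagged


rules = {"amount_non_negative": lambda r: r["amount"] >= 0}
rows = [{"id": 1, "amount": 5.0}, {"id": 2, "amount": -1.0}, {"id": 2, "amount": 3.0}]
for row, failures in assert_rows(rows, rules, "id"):
    print(row["id"], failures or "ok")
```

    Keeping the failing rows alongside their failure reasons lets you send good rows on to the sink and write the rejected ones to a separate location for review.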

    You can also use the open-source tool Great Expectations and invoke it from ADF through Azure Functions, a Batch job, or Databricks. The link below should help:

    https://greatexpectations.io/

