HDInsight data governance/lineage tool or framework

Federico Sardo 91 Reputation points
2023-07-18T20:41:31.2933333+00:00

Hi folks,

We are implementing a big data solution on HDInsight (Hive Interactive Query and Spark, with an Azure SQL database as the Hive metastore). The client requires a data governance, lineage, and data masking solution.

Based on what I've researched, Microsoft Purview is not able to connect to HDInsight.

Do you know of another approach that would meet this requirement?

Regards

Azure HDInsight
An Azure managed cluster service for open-source analytics.

Accepted answer
  1. PRADEEPCHEEKATLA 90,241 Reputation points
    2023-07-19T10:31:00.2166667+00:00

    @Federico Sardo - Thanks for the question and using MS Q&A platform.

    Unfortunately, Azure HDInsight is not supported by Microsoft Purview.

    We would appreciate it if you could share this as feedback on our feedback channel, which is open for the user community to upvote and comment on. This allows our product teams to prioritize your request against the existing feature backlog and gives them insight into the potential impact of implementing the suggested feature.

    Alternatives, based on my research:
    Since Purview cannot connect to HDInsight, there are other approaches you can consider to meet the data governance, lineage, and data masking requirements in your HDInsight environment. Here are a few options:

    Apache Atlas: Apache Atlas is an open-source data governance and metadata framework that provides data classification, lineage tracking, and metadata management. As far as I know, Atlas is not one of the components bundled with HDInsight, so you would need to install and operate it yourself and configure its hooks (for example, the Hive hook) to capture and manage metadata for your data assets.

    Custom Lineage Tracking: You can develop a custom solution for capturing lineage information within your HDInsight environment. This would involve instrumenting your data processing pipelines to record the lineage information as the data flows through various transformations. You can store this lineage information in a separate database or metadata repository for future reference.

    Third-Party Tools: There are several third-party data governance and lineage tools available in the market that can integrate with HDInsight. These tools often provide more advanced features and user-friendly interfaces for managing data governance. Some popular options include Collibra, Informatica Axon, and Alation. You can evaluate these tools based on your specific requirements and choose one that integrates well with HDInsight.

    Custom Data Masking: To implement data masking in HDInsight, you can develop custom scripts or use tools such as Apache NiFi or Apache Ranger. NiFi can mask sensitive data as it moves through your data pipelines, while Ranger applies policy-based column masking to Hive queries and is available on HDInsight clusters that use the Enterprise Security Package. In either case, you define rules and policies that determine how the data should be masked based on your client's requirements.
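As an illustration of the "custom scripts" route for masking, here is a minimal Python sketch. It is not an HDInsight or Ranger API; the function names, the column names (`card_number`, `email`), and the rule table are all hypothetical and only show the general shape of rule-driven masking applied to one record at a time.

```python
import hashlib
import hmac


def mask_keep_last(value: str, visible: int = 4, mask_char: str = "x") -> str:
    """Hide every character except the last `visible` ones."""
    if visible <= 0 or len(value) <= visible:
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a keyed SHA-256 digest (HMAC).

    Equal inputs give equal tokens, so masked columns can still be joined.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical rule table: column name -> masking function.
MASKING_RULES = {
    "card_number": lambda value, key: mask_keep_last(value, 4),
    "email": lambda value, key: pseudonymize(value, key),
}


def mask_row(row: dict, key: bytes, rules: dict = MASKING_RULES) -> dict:
    """Return a copy of `row` with every rule-covered column masked."""
    masked = {}
    for column, value in row.items():
        rule = rules.get(column)
        if rule is not None and value is not None:
            masked[column] = rule(value, key)
        else:
            masked[column] = value
    return masked
```

In a Spark job, functions like these could be wrapped as UDFs and applied per column. The secret key would need to be stored outside the code, for example in a key vault.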

    Remember to thoroughly evaluate and test any solution you choose to ensure it meets your specific needs and aligns with your client's data governance requirements.
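For the custom lineage tracking option, the idea of recording lineage as data flows through transformations can be sketched as follows. This is only an assumption-laden example: SQLite stands in for whatever metadata repository you choose (for example an Azure SQL database), and the table name `lineage_events`, the function names, and the dataset names are hypothetical.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Hypothetical schema: one row per pipeline step that produced a dataset.
SCHEMA = """
CREATE TABLE IF NOT EXISTS lineage_events (
    event_id    TEXT PRIMARY KEY,
    job_name    TEXT NOT NULL,
    inputs      TEXT NOT NULL,
    output      TEXT NOT NULL,
    transform   TEXT NOT NULL,
    recorded_at TEXT NOT NULL
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (and initialise if needed) the lineage store."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_lineage(conn, job_name, inputs, output, transform) -> str:
    """Record that `job_name` derived `output` from `inputs`."""
    event_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO lineage_events VALUES (?, ?, ?, ?, ?, ?)",
        (
            event_id,
            job_name,
            json.dumps(sorted(inputs)),  # source datasets as a JSON array
            output,
            transform,
            datetime.now(timezone.utc).isoformat(),
        ),
    )
    conn.commit()
    return event_id


def upstream_of(conn, dataset) -> set:
    """Return every dataset that feeds `dataset`, directly or indirectly."""
    seen, stack = set(), [dataset]
    while stack:
        current = stack.pop()
        rows = conn.execute(
            "SELECT inputs FROM lineage_events WHERE output = ?", (current,)
        ).fetchall()
        for (inputs_json,) in rows:
            for source in json.loads(inputs_json):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
    return seen
```

Each Hive or Spark job would call `record_lineage` once it has written its output, and `upstream_of` then answers "where did this table come from?" by walking the recorded events backwards.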

    Disclaimer: This response contains a reference to a third-party World Wide Web site. Microsoft is providing this information as a convenience to you. Microsoft does not control these sites and has not tested any software or information found on these sites; therefore, Microsoft cannot make any representations regarding the quality, safety, or suitability of any software or information found there. There are inherent dangers in the use of any software found on the Internet, and Microsoft cautions you to make sure that you completely understand the risk before retrieving any software from the Internet.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".
