ExpressRoute and Azure Data Lake Storage Gen2

Anonymous
2020-06-02T06:10:02.287+00:00

Hi,

I am testing access to Azure Data Lake Storage Gen2 from an on-premises Hadoop setup.
I need to access the service via ExpressRoute.
What is the recommended setup? Has anyone been through this configuration?

Thanks


Accepted answer
  1. Anonymous
    2020-06-16T21:44:31.227+00:00

    Using a private endpoint on a VNet/subnet seems to be the right solution for connecting simply from an on-premises WANdisco setup to the Azure Data Lake Storage Gen2 storage account.

    The key element is to modify the hosts file or a DNS entry so that the ADLS hostname resolves to the private IP of the private endpoint.

    https://learn.microsoft.com/en-us/azure/storage/common/storage-private-endpoints
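
To make the hosts file / DNS point concrete, here is a minimal Python sketch that prints a hosts-file line and checks whether the ADLS hostname currently resolves to a private address from the on-premises machine. The storage account name `mystorageaccount` and the IP `10.1.0.5` are hypothetical placeholders, not values from this thread; use the private IP actually assigned to your private endpoint.

```python
import ipaddress
import socket

# Hypothetical values for illustration only; substitute your own.
ADLS_HOST = "mystorageaccount.dfs.core.windows.net"
PRIVATE_ENDPOINT_IP = "10.1.0.5"


def hosts_entry(ip, hostname):
    """Format one hosts-file line mapping the hostname to the given IP."""
    return f"{ip}\t{hostname}"


def is_private(ip):
    """True if the address is in a non-public range such as 10.0.0.0/8."""
    return ipaddress.ip_address(ip).is_private


def resolved_addresses(hostname, port=443):
    """Return the sorted IP addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Line to add to the hosts file (or mirror as a DNS A record).
    print(hosts_entry(PRIVATE_ENDPOINT_IP, ADLS_HOST))
    try:
        for addr in resolved_addresses(ADLS_HOST):
            label = "private" if is_private(addr) else "public (not the private endpoint)"
            print(addr, label)
    except socket.gaierror as exc:
        print("could not resolve", ADLS_HOST, exc)
```

If the check reports a public address, traffic to the storage account is not going through the private endpoint, and the hosts file or DNS entry still needs to be adjusted.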

    20916-image.png

    Implementation steps can be found here


1 additional answer

  1. PRADEEPCHEEKATLA-MSFT 76,511 Reputation points Microsoft Employee
    2020-06-03T14:44:50.813+00:00

    @Anonymous Welcome to the Microsoft Q&A platform.

    The diagram below shows the different sources of data and the ways in which that data can be ingested into a Data Lake Storage Gen2 account.

    9024-adls-data-ingest.png

    Data stored in on-premises or IaaS Hadoop clusters:

    Large amounts of data may be stored in existing Hadoop clusters, locally on machines using HDFS. The Hadoop clusters may be in an on-premises deployment or may be within an IaaS cluster on Azure. There could be requirements to copy such data to Azure Data Lake Storage Gen2 for a one-off approach or in a recurring fashion. There are various options that you can use to achieve this. Below is a list of alternatives and the associated trade-offs.

    8979-adls-gen2-onpremise.jpg
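
As one illustration of such a copy, the sketch below builds a Hadoop DistCp command that targets an ADLS Gen2 path through the `abfss://` URI scheme. DistCp is assumed here as the copy tool; the text above does not name it. The NameNode address, container, account, and paths are hypothetical, and the cluster is assumed to already have the ABFS driver and storage credentials configured (not shown).

```python
import shlex


def abfss_uri(container, account, path=""):
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def distcp_command(source, destination, max_maps=20):
    """Return the argument list for a DistCp run; -m caps simultaneous copies."""
    return ["hadoop", "distcp", "-m", str(max_maps), source, destination]


if __name__ == "__main__":
    # Hypothetical source and destination, for illustration only.
    src = "hdfs://namenode.example.local:8020/data/raw"
    dst = abfss_uri("landing", "mystorageaccount", "/data/raw")
    cmd = distcp_command(src, dst)
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The command is only printed here; running it from a cluster node (for example via `subprocess.run(cmd, check=True)`) is left to the reader once the paths have been checked.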

    Really large datasets:

    For uploading datasets of several terabytes or more, the methods described above can sometimes be slow and costly. In such cases, you can use Azure ExpressRoute.

    Azure ExpressRoute lets you create private connections between Azure data centers and infrastructure on your premises. This provides a reliable option for transferring large amounts of data. To learn more, see Azure ExpressRoute documentation.

    Reference: Migrate data from on-premise Hadoop to Azure Storage and Using Azure Data Lake Storage Gen2 for big data requirements

    Hope this helps. Do let us know if you have any further queries.

    ----------------------------------------------------------------------------------------

    Do click on "Accept Answer" and upvote the post that helps you; this can be beneficial to other community members.