Comparing Methods for Accessing ADLS Gen2 in Azure Synapse Analytics with Apache Spark

Pravalika-randstad 240 Reputation points
2024-04-14T16:43:29.81+00:00

The Azure Synapse Analytics documentation outlines two approaches for reading/writing data to Azure Data Lake Storage Gen2 via an Apache Spark pool in Synapse Analytics.

  1. Directly reading files using the ADLS store path:

```python
# Placeholder container, account, and path values
adls_path = "abfss://<container>@<account>.dfs.core.windows.net/<path>"
df = spark.read.parquet(adls_path)
```

  2. Creating a mount point using mssparkutils and reading files using the synfs path:

```python
# Placeholder source and linked service; the mount point name is illustrative
mssparkutils.fs.mount(
    "abfss://<container>@<account>.dfs.core.windows.net",
    "/mnt/data",
    {"linkedService": "<linked_service>"}
)
```

What distinguishes these methods? And when would you opt for using a mount point?

Azure Synapse Analytics

1 answer

  1. phemanth 10,480 Reputation points Microsoft Vendor
    2024-04-15T09:12:56.43+00:00

    @Pravalika-randstad

    Thanks for reaching out to Microsoft Q&A

    The two methods for accessing data in Azure Data Lake Storage Gen2 (ADLS Gen2) from an Apache Spark pool in Azure Synapse Analytics offer different advantages:

    1. Directly Reading Files using ADLS Store Path:

    • Simpler Code: This approach requires specifying the complete ADLS Gen2 storage path for each file you want to access. The code is easier to write, especially when working with a small number of files.
    • Direct Path Referencing: You directly reference the data location within ADLS Gen2, which keeps the source of the data explicit and easy to audit.
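As a minimal sketch of the direct-path approach, the abfss URI can be assembled with a small helper (the helper name and the container/account values are hypothetical, not part of the Synapse API):

```python
# Hypothetical helper that builds an ABFSS URI for ADLS Gen2.
# Container, account, and path values below are placeholders.
def abfss_path(container: str, account: str, relative_path: str = "") -> str:
    """Build an abfss:// URI in the form Synapse Spark pools read directly."""
    base = f"abfss://{container}@{account}.dfs.core.windows.net"
    return f"{base}/{relative_path.lstrip('/')}" if relative_path else base

adls_path = abfss_path("raw", "mydatalake", "sales/2024/orders.parquet")
# In a Synapse notebook you would then read it directly, e.g.:
# df = spark.read.parquet(adls_path)
print(adls_path)
```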

    2. Creating a Mount Point with mssparkutils:

    • Local File System Experience: This method mounts the ADLS Gen2 storage into your Spark session, so you can access it through the synfs path scheme or, via mssparkutils.fs.getMountPath, through standard local file system APIs. This simplifies code by letting you treat remote data much like local files.
    • Improved Organization: A mount point gives you a single, stable root path for your data, which helps when working with a large number of files or complex datasets.
    • Easier Navigation: You can browse the mounted data with familiar file system operations such as listing directories and moving between folders.
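The mount workflow above can be sketched as follows. The mount call itself is shown as a comment because it requires a live Synapse session and a real linked service; the synfs path format is built by a hypothetical helper so the shape of the URI is testable on its own:

```python
# Sketch only: the mount call needs a live Synapse session, so it is commented out.
# Source URI and linked service name are placeholders.
#
# mssparkutils.fs.mount(
#     "abfss://raw@mydatalake.dfs.core.windows.net",
#     "/mnt/data",
#     {"linkedService": "MyLinkedService"},
# )

def synfs_path(job_id: str, mount_point: str, relative_path: str = "") -> str:
    """Build a synfs:/{jobId}{mountPoint}/{path} URI for reading mounted data."""
    base = f"synfs:/{job_id}{mount_point}"
    return f"{base}/{relative_path.lstrip('/')}" if relative_path else base

# In Synapse the job id comes from mssparkutils.env.getJobId(); "0" is a stand-in.
print(synfs_path("0", "/mnt/data", "sales/2024/orders.parquet"))
```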

    Choosing the Right Method:

    • Use the ADLS store path for: Simple tasks involving a small number of files where clarity of the data location is important.
    • Use a mount point for: Scenarios involving a large number of files, complex data structures, or situations where you want to leverage the benefits of a local file system experience within your Spark code.

    Here's a table summarizing the key points:

    | Feature                | ADLS Store Path         | Mount Point (mssparkutils)         |
    |------------------------|-------------------------|------------------------------------|
    | Code complexity        | Simpler                 | More complex (mounting required)   |
    | Data referencing       | Direct path referencing | Local file-system-like path        |
    | File system experience | No                      | Yes                                |
    | Organization           | Less organized          | More organized                     |
    | Use cases              | Small number of files   | Large datasets, complex structures |

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

