Can't access files and Python libraries in an Azure Machine Learning user's workspace with a serverless Spark pool or Synapse Spark pool

Nithin Gowrav 0 Reputation points
2024-01-04T02:19:08.55+00:00

I have a set of utility modules and some config files in my workspace that I could access from my notebook when using a personal compute. But when I use serverless Spark compute or a Synapse Spark pool as the compute, I can't access them. I followed the steps in this link, but they did not work: https://learn.microsoft.com/en-us/azure/machine-learning/interactive-data-wrangling-with-apache-spark-azure-ml?view=azureml-api-2#accessing-data-on-the-default-file-share

Azure Machine Learning
An Azure machine learning service for building and deploying models.
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

2 answers

  1. Konstantinos Passadis 19,586 Reputation points MVP
    2024-01-04T04:19:50.9033333+00:00

    Hello @Nithin Gowrav !

    Welcome to Microsoft QnA!

    To access your utility modules and configs in your workspace when using serverless Spark compute or a Synapse Spark pool, you can manage Spark pool-level libraries for Apache Spark. Libraries installed on a Spark pool (or removed from it) become available to all notebooks and jobs running on that pool. There are two primary ways to install a library on a Spark pool:

    • Install a workspace library that has been uploaded as a workspace package.
    • To update Python libraries, provide a requirements.txt or Conda environment.yml environment specification to install packages from repositories such as PyPI, Conda-Forge, and others.
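Either route needs the third-party packages your utility modules import. As a minimal sketch (standard library only), the snippet writes a pip-style requirements.txt for the pool and then checks, from a notebook attached to that pool, which modules can actually be imported. The package names and versions are hypothetical placeholders, and the upload itself still goes through the pool's package settings.

```python
import importlib.util
from pathlib import Path


def write_requirements(path, pins):
    """Write a pip-style requirements.txt with one 'name==version' line per package.

    `pins` maps package names to exact versions; sorting keeps the file stable
    between runs so diffs stay readable.
    """
    lines = [f"{name}=={version}" for name, version in sorted(pins.items())]
    content = "\n".join(lines) + "\n"
    Path(path).write_text(content, encoding="utf-8")
    return content


def missing_modules(import_names):
    """Return the import names that cannot be found in the current Python environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical pins: replace with the packages your utility modules depend on.
    print(write_requirements("requirements.txt", {"pyyaml": "6.0.1", "requests": "2.31.0"}), end="")

    # Run this part in a notebook attached to the pool once the pool update finishes.
    # Import names can differ from package names (pyyaml is imported as "yaml").
    print("Missing:", missing_modules(["yaml", "requests"]))
```

A requirements.txt only covers published packages. Your own utility modules are not on PyPI, so for the workspace-package route they would first need to be packaged (for example as a .whl file).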
    

    You can read more about managing Spark pool-level libraries for Apache Spark in Azure Synapse Analytics at the following link:

    https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-pool-packages


    I hope this helps!

    This answer, or portions of it, may have been written with AI assistance (source: Microsoft Copilot).

    Kindly mark the answer as Accepted and upvote it if it helped!

    Regards


  2. Konstantinos Passadis 19,586 Reputation points MVP
    2024-01-05T23:47:03.9666667+00:00

    Hello @Nithin Gowrav !

    Can you kindly confirm that you have read the following article?

    https://learn.microsoft.com/en-us/azure/machine-learning/interactive-data-wrangling-with-apache-spark-azure-ml?view=azureml-api-2

    If you are trying to access the default file share, you can do it with the code provided by @VasaviLankipalle-MSFT.
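As an illustrative sketch only (not a copy of that code), assume, as the linked article's default-file-share section describes, that files on the share can be addressed with `file://` plus their absolute path, and that the notebook's working directory sits on that share. The helpers below build such a URI for a config file and put a folder of utility modules on `sys.path`. The `Users/<USER>/...` paths are hypothetical placeholders.

```python
import os
import sys


def file_share_uri(relative_path, base_dir="."):
    """Build a file:// URI for a path under the notebook's working directory.

    Assumption: the working directory is on the workspace default file share,
    so an absolute local path is enough to locate the file.
    """
    absolute = os.path.abspath(os.path.join(base_dir, relative_path))
    return "file://" + absolute


def add_module_dir(relative_dir, base_dir="."):
    """Make a folder of utility modules importable in this Python session (driver only)."""
    absolute = os.path.abspath(os.path.join(base_dir, relative_dir))
    if absolute not in sys.path:
        sys.path.insert(0, absolute)
    return absolute


if __name__ == "__main__":
    # Hypothetical placeholders: replace <USER> and the file/folder names with your own.
    print(file_share_uri("Users/<USER>/configs/settings.json"))
    print(add_module_dir("Users/<USER>/utils"))
    # In a Spark notebook the URI can be passed to a reader, e.g.
    # spark.read.json(file_share_uri("Users/<USER>/configs/settings.json")).
    # If executors also need the modules (for example inside a UDF),
    # spark.sparkContext.addPyFile() can ship a .py or .zip file to them.
```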

    Otherwise, for Azure Storage:

    The Azure Machine Learning datastores can access data using Azure storage account credentials

    • access key
    • SAS token
    • service principal

    or provide credential-less data access. Depending on the datastore type and the underlying Azure storage account type, select an appropriate authentication mechanism to ensure data access. This table summarizes the authentication mechanisms to access data in the Azure Machine Learning datastores:


    | Storage account type | Credential-less data access | Data access mechanism | Role assignments |
    |---|---|---|---|
    | Azure Blob | No | Access key or SAS token | No role assignments needed |
    | Azure Blob | Yes | User identity passthrough* | User identity should have appropriate role assignments in the Azure Blob storage account |
    | Azure Data Lake Storage (ADLS) Gen2 | No | Service principal | Service principal should have appropriate role assignments in the Azure Data Lake Storage (ADLS) Gen2 storage account |
    | Azure Data Lake Storage (ADLS) Gen2 | Yes | User identity passthrough | User identity should have appropriate role assignments in the Azure Data Lake Storage (ADLS) Gen2 storage account |

    Is there an error message you can share?
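For the credential-based rows of the table above (access key, SAS token, service principal), one common approach in a Spark notebook is to put the credential into the session's Hadoop configuration and then read through a `wasbs://` or `abfss://` URI. The sketch below builds those settings. It assumes the standard hadoop-azure property names, uses hypothetical account, container, tenant, and secret values, and is not specific to Azure ML datastore objects. The credential-less (user identity passthrough) rows need none of these settings, only the role assignments listed.

```python
# Hadoop-azure property names used to give a Spark session storage credentials.
ENDPOINTS = {"wasbs": "blob", "abfss": "dfs"}  # URI scheme -> storage endpoint type


def storage_uri(scheme, account, container, path):
    """Build a wasbs:// (Blob) or abfss:// (ADLS Gen2) URI for a file in a container."""
    host = f"{account}.{ENDPOINTS[scheme]}.core.windows.net"
    return f"{scheme}://{container}@{host}/{path.lstrip('/')}"


def blob_access_key_conf(account, access_key):
    """Azure Blob: authenticate with the storage account access key."""
    return {f"fs.azure.account.key.{account}.blob.core.windows.net": access_key}


def blob_sas_conf(account, container, sas_token):
    """Azure Blob: authenticate with a SAS token scoped to one container."""
    return {f"fs.azure.sas.{container}.{account}.blob.core.windows.net": sas_token}


def adls_service_principal_conf(account, tenant_id, client_id, client_secret):
    """ADLS Gen2: authenticate a service principal with OAuth client credentials."""
    suffix = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def apply_conf(spark, conf):
    """Set each property on the session's Hadoop configuration.

    Uses PySpark's private `_jsc` handle, so this may change between versions.
    """
    hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
    for key, value in conf.items():
        hadoop_conf.set(key, value)


if __name__ == "__main__":
    # Hypothetical names; in practice read secrets from a key vault instead of hard-coding them.
    print(storage_uri("abfss", "mystorageacct", "data", "configs/settings.json"))
    for key in adls_service_principal_conf("mystorageacct", "<TENANT_ID>", "<CLIENT_ID>", "<SECRET>"):
        print(key)
    # In a Spark notebook: apply_conf(spark, blob_sas_conf("mystorageacct", "data", sas_token))
```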


    I hope this helps!

    Kindly mark the answer as Accepted and upvote it if it helped!

    Regards

