Spark pool does not reflect the contents of requirements.txt

Yuji Masaoka 56 Reputation points
2020-09-23T19:51:02.64+00:00

I have read the Microsoft Learn documentation and am trying to add Python libraries to the Spark pool.

I created a requirements.txt file in the pip freeze format and installed it from the Synapse workspace screen.

   pymongo==2.8.1  
   bson  
   randint  
   aenum==2.1.2  
   backports-abc==0.5  
   bson==0.5.10  

However, the libraries do not seem to be applied to the Spark pool, even though the pool has been restarted.

I'm checking with the code below to see if the libraries are installed in the Spark pool.

   import pip  
   for i in pip.get_installed_distributions(local_only=True):  
       print(i)  
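
Note that `pip.get_installed_distributions` was dropped from pip's importable interface in pip 10, so the check above only runs on older pip versions. A minimal sketch of an alternative, assuming Python 3.8 or later is available on the pool (not confirmed anywhere in this thread), uses the standard library's `importlib.metadata`. The helper name `list_installed` is hypothetical.

```python
from importlib import metadata

def list_installed():
    """Return sorted (name, version) pairs for every installed distribution."""
    pairs = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that have no Name field
            pairs.add((name, dist.version))
    # Sort case-insensitively by package name only
    return sorted(pairs, key=lambda pair: pair[0].lower())

for name, version in list_installed():
    print(name, version)
```

Unlike the `local_only=True` call above, this lists every distribution visible on the interpreter's search path.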

Please tell me why the modules are not installed.

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 53,276 Reputation points Microsoft Employee
    2020-09-25T03:06:55.413+00:00

    Hello @Yuji Masaoka ,

    On inspecting the requirements.txt file, I see that randint is not a valid Python library on the PyPI repository.

    I modified the requirements.txt file as follows and it worked without any issue.

    pymongo==2.8.1  
    aenum==2.1.2  
    backports-abc==0.5  
    bson==0.5.10  
    

    Also note that the pip freeze format expects each line to list a valid PyPI package name along with an exact version (https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-portal-add-libraries#requirements-format).

    Hope this helps. Do let us know if you have any further queries.
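
The format rule above can be checked locally before uploading. The following is only a rough sketch: `check_requirements` and the `PINNED` pattern are hypothetical names, the pattern is a simplification rather than the full requirement grammar, and it checks the `name==version` shape only. It cannot tell whether a package actually exists on PyPI.

```python
import re

# Matches "name==version", e.g. "pymongo==2.8.1" (simplified pattern, an assumption)
PINNED = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==([A-Za-z0-9.!+*_-]+)$")

def check_requirements(text):
    """Return a list of (line_number, message) problems found in requirements text."""
    problems = []
    seen = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        match = PINNED.match(line)
        if match is None:
            problems.append((lineno, f"'{line}' is not pinned as name==version"))
            continue
        # Compare names case-insensitively, treating '-', '_' and '.' alike
        name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
        if name in seen:
            problems.append((lineno, f"'{name}' already listed on line {seen[name]}"))
        else:
            seen[name] = lineno
    return problems

# The file from the question
original = """pymongo==2.8.1
bson
randint
aenum==2.1.2
backports-abc==0.5
bson==0.5.10
"""
for lineno, message in check_requirements(original):
    print(f"line {lineno}: {message}")
```

Run against the file from the question, this flags the unpinned `bson` and `randint` lines, which are the two lines removed in the corrected file.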

    ----------------------------------------------------------------------------------------

    Please click "Accept Answer" and upvote the post that helps you; this can be beneficial to other community members.


1 additional answer

  1. Yang Jiayi 1 Reputation point
    2020-09-25T11:45:56.717+00:00

    @Yuji Masaoka

    I've tested it here and it works fine.
    Upload the requirements.txt file and allow 20-30 minutes for it to take effect, rather than running code in the Spark pool right away.

    %%pyspark
    import pip  # needed to use the pip functions
    for i in pip.get_installed_distributions(local_only=True):
        print(i)

    I was able to check the environment-defined library.

    pymongo 2.8.1
    aenum 2.1.2
    backports-abc 0.5
    bson 0.5.10
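
Rather than reading through the full package list, the pinned versions can be compared against what is installed. This is a minimal sketch assuming Python 3.8 or later on the pool (an assumption, since `importlib.metadata` is not available on older interpreters). `missing_pins` and the `expected` dictionary are hypothetical names for illustration.

```python
from importlib import metadata

def missing_pins(pins):
    """Return {name: (wanted, found)} for pins that are absent or at another version."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # not installed in this environment
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches

# Pins taken from the corrected requirements.txt in this thread
expected = {"pymongo": "2.8.1", "aenum": "2.1.2", "backports-abc": "0.5", "bson": "0.5.10"}
for name, (wanted, found) in missing_pins(expected).items():
    print(f"{name}: wanted {wanted}, found {found}")
```

If nothing is printed, every pinned package is present at the requested version; a `found None` entry means the library has not reached the pool yet.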
