Spark pool does not reflect the contents of requirements.txt

Question

Yuji Masaoka 56

I have checked the Microsoft Learn description below and am trying to add the Python libraries to the Spark pool.

https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-portal-add-libraries

I created requirements.txt according to the format of pip freeze and installed it from the Synapse Workspace screen.

   pymongo==2.8.1  
   bson  
   randint  
   aenum==2.1.2  
   backports-abc==0.5  
   bson==0.5.10

However, it seems that the libraries are not applied to the Spark pool. (The Spark pool has been restarted.)

I'm checking with the code below to see if the libraries are installed in the Spark pool.

   import pip  
   for i in pip.get_installed_distributions(local_only=True):  
       print(i)

Please tell me why the modules are not installed.
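For readers on newer runtimes: `pip.get_installed_distributions` was dropped from pip's public API in pip 10, so the check above may raise an `AttributeError` there. A minimal sketch of an equivalent check with the standard library follows. It assumes Python 3.8 or later, and the helper name `installed_packages` is made up for illustration. Unlike `local_only=True`, it lists every distribution visible on `sys.path`.

```python
from importlib.metadata import distributions


def installed_packages():
    """Return a sorted list of (name, version) for every visible distribution."""
    found = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries whose metadata has no name
            found.add((name, dist.version))
    return sorted(found, key=lambda item: item[0].lower())


# Print in the same "name==version" shape that pip freeze uses.
for name, ver in installed_packages():
    print(f"{name}=={ver}")
```

In a Synapse notebook this would go in a PySpark cell; whether the pool's Python version supports `importlib.metadata` is an assumption to verify first.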

PRADEEPCHEEKATLA 90,661 Reputation points Moderator

2020-09-24T10:11:19.683+00:00

Hello @Yuji Masaoka ,

Thanks for reporting this issue. I’m working with the product team and will get back to you when I have more information.

Accepted answer

Answer 1

PRADEEPCHEEKATLA 90,661 Moderator

Hello @Yuji Masaoka ,

On inspecting the requirements.txt file, I see that randint is not a valid Python library on the PyPI repository.

I modified the requirements.txt file as follows, and it worked without any issue.

   pymongo==2.8.1
   aenum==2.1.2
   backports-abc==0.5
   bson==0.5.10

Also note that the pip freeze format expects each line to list a valid PyPI package name along with an exact version (https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-portal-add-libraries#requirements-format).
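As a rough pre-upload check, the format rule can be sketched in a few lines of Python. This is only an illustration: the `PINNED` pattern and the `find_unpinned` helper are assumptions made for this example, not part of Synapse or pip, and the check only looks at the `name==version` shape. It cannot tell whether a package actually exists on PyPI.

```python
import re

# Assumed shape of a pinned requirement: "<name>==<version>".
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[A-Za-z0-9][A-Za-z0-9.!+*_-]*$")


def find_unpinned(requirements_text):
    """Return the non-empty, non-comment lines that lack an exact '==' version."""
    problems = []
    for raw in requirements_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems


# The file from the question, copied as posted.
original = """\
pymongo==2.8.1
bson
randint
aenum==2.1.2
backports-abc==0.5
bson==0.5.10
"""

print(find_unpinned(original))  # -> ['bson', 'randint']
```

Both flagged lines are absent from the corrected file above.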

Hope this helps. Do let us know if you have any further queries.

Yuji Masaoka 56 Reputation points

2020-09-25T07:48:54.437+00:00

Hello, @PRADEEPCHEEKATLA ,

Thank you for your investigation.
I have confirmed that there is a problem with the requirements.txt I created.
However, when I re-upload requirements.txt with the contents you suggested and run the above command, the Spark session fails with a startup error.

If the above command works fine in your environment, I may be missing something. Could you please check?

Yuji Masaoka 56 Reputation points

2020-09-25T07:50:10.107+00:00

P.S. To be honest, the contents of this requirements.txt come from the following repository, and what the repository describes is incorrect.

https://github.com/Azure-Samples/Synapse/blob/master/Notebooks/PySpark/Synapse%20Link%20for%20Cosmos%20DB%20samples/MongoDB/spark-notebooks/pyspark/01-CosmosDBSynapseMongoDB.ipynb

I have opened an issue on the repository about this problem:

https://github.com/Azure-Samples/Synapse/issues/55

Yuji Masaoka 56 Reputation points

2020-09-25T09:45:43.58+00:00

Hello @PRADEEPCHEEKATLA ,

When I checked it again, I confirmed that it worked without any problems.
Thank you.

Answer 2

Yang Jiayi 1

@Yuji Masaoka

I've tested it here and it works fine.
Upload the requirements.txt file and allow 20-30 minutes for it to take effect, rather than running the check in the Spark pool right away.

   %%pyspark
   import pip  # needed to use the pip functions
   for i in pip.get_installed_distributions(local_only=True):
       print(i)

I was able to confirm that the libraries defined for the environment are installed:

pymongo 2.8.1
aenum 2.1.2
backports-abc 0.5
bson 0.5.10
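A more targeted check is to compare each pinned version against what the session actually reports, instead of scanning the full list by eye. The sketch below assumes Python 3.8 or later for `importlib.metadata`; the `check_pins` helper and the `pins` dictionary are hypothetical names used only for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(pins):
    """Map each package name to (expected version, installed version or None)."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not visible to this interpreter
        report[name] = (expected, installed)
    return report


# Pins taken from the corrected requirements.txt in this thread.
pins = {
    "pymongo": "2.8.1",
    "aenum": "2.1.2",
    "backports-abc": "0.5",
    "bson": "0.5.10",
}

for name, (expected, installed) in check_pins(pins).items():
    status = "OK" if installed == expected else "missing or different version"
    print(f"{name}: expected {expected}, installed {installed} ({status})")
```

If every line reports a missing package shortly after the upload, waiting for the pool to pick up the new requirements, as suggested above, may be all that is needed.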
