question

MicrosoftUser-1058 asked · abbastiki commented

Spark pool requirements.txt install package from a repository manager other than PyPI

Hello,

I am having some trouble installing packages in my Spark pool on Synapse.
My requirements.txt file looks something like this:
--extra-index-url <server_url>
<project_name>


The execution returns the following error: Invalid package name --extra-index-url
Why does the package installer interpret my option as a package name even though I am following the official pip syntax? Can you please help?
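
For reference, the same file works with plain pip outside Synapse; the index URL and package name below are just placeholders standing in for my real values:

 # requirements.txt -- installed locally with: pip install -r requirements.txt
 --extra-index-url https://pypi.example.com/simple
 my-private-package==1.0.0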

Thank you for your help.

azure-synapse-analytics

1 Answer

PRADEEPCHEEKATLA-MSFT answered · abbastiki commented

Hello @MicrosoftUser-1058,

Welcome to the Microsoft Q&A platform.

Python packages can be installed from repositories like PyPI and Conda-Forge by providing an environment specification file.

Environment specification formats:

PIP requirements.txt

A requirements.txt file (the output of the pip freeze command) can be used to upgrade the environment. When a pool is updated, the packages listed in this file are downloaded from PyPI. The full dependency set is then cached and saved for later reuse of the pool.

The following snippet shows the format for the requirements file.

 absl-py==0.7.0
 adal==1.2.1
 alabaster==0.7.10
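
Such a file is typically produced by freezing a working local environment, for example:

 # Run locally to capture the environment's pinned packages
 pip freeze > requirements.txt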

YML format (preview)

You can also provide an environment.yml file to update the pool environment. The packages listed in this file are downloaded from the default Conda channels, Conda-Forge, and PyPI. You can specify other channels or remove the default channels by using the configuration options.

This example specifies the channels and Conda/PyPI dependencies.

 name: stats2
 channels:
 - defaults
 dependencies:
 - bokeh
 - numpy
 - pip:
   - matplotlib
   - koalas==1.7.0
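
As a minimal sketch of the channel control mentioned above (this uses conda's standard nodefaults keyword in the environment file itself; the channel choice is illustrative, not Synapse-specific guidance), an environment.yml that pulls only from Conda-Forge could look like:

 name: stats2
 channels:
 - conda-forge
 - nodefaults
 dependencies:
 - bokeh
 - numpy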

For more details, refer to Manage Python libraries for Apache Spark in Azure Synapse Analytics.

Hope this helps. Do let us know if you have any further queries.


Please "Accept the answer" if the information helped you. This will help us and others in the community as well.


Hello @PRADEEPCHEEKATLA-MSFT

Thank you for your answer.
I am aware of the documentation Microsoft has provided on the subject. What I actually want is to install a package from a repository manager other than PyPI, which means I have to specify options in my requirements.txt file.

Does the packages functionality on the Synapse Spark pool allow me to install private packages from a repository manager other than PyPI? If so, does it follow the same structure as the official pip documentation? I have included a sketch of what I mean after the links below.

Here are some useful links on the subject:
documentation on how to install private packages: https://docs.readthedocs.io/en/stable/guides/private-python-packages.html
pip documentation for the requirements file format: https://pip.pypa.io/en/stable/cli/pip_install/#requirements-file-format
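
For illustration, these are the kinds of entries pip itself documents for private sources, and it is exactly this syntax I would like the Synapse requirements.txt to accept (the host, repository, package names, and token variable are hypothetical):

 # Additional private index alongside PyPI (placeholder URL)
 --extra-index-url https://pypi.example.com/simple
 my-private-package==1.2.0

 # Direct VCS reference authenticated with a token taken from an environment variable
 my-other-package @ git+https://${GITHUB_TOKEN}@github.com/example-org/my-other-package.git@v1.0.0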



Thanks!


Hello @MicrosoftUser-1058,

Apologies for the delay in response.

As I said in the above post, Python packages can be installed from repositories like PyPI and Conda-Forge by providing an environment specification file.

Unfortunately, installing packages from a private repository is not supported in the Azure Synapse Spark pool. Our product group is working on this feature, and it will be available soon.

I will update this thread once it is available. Stay tuned!


Hello @MicrosoftUser-1058,

Just checking in to see if the above answer helped. If this answers your query, do click Accept Answer and Up-Vote for the same. And if you have any further queries, do let us know.
