Spark pool requirements.txt: installing packages from a repository manager other than PyPI

Anonymous
2021-07-05T16:35:50.383+00:00

Hello,

I am having some trouble installing packages in my Spark pool on Synapse.
My requirements.txt file looks something like this:
--extra-index-url <server_url>
<project_name>

The execution returns the following error: Invalid package name --extra-index-url
Why does the package installer interpret my option as a package name even though I am following the official pip syntax? Can you please help?

Thank you for your help.


2 answers

  1. PRADEEPCHEEKATLA-MSFT 75,286 Reputation points Microsoft Employee
    2021-07-06T05:52:55.457+00:00

    Hello @Anonymous,

    Welcome to the Microsoft Q&A platform.

    Python packages can be installed from repositories like PyPI and Conda-Forge by providing an environment specification file.

    Environment specification formats:

    PIP requirements.txt

    A requirements.txt file (the output of the pip freeze command) can be used to upgrade the environment. When a pool is updated, the packages listed in this file are downloaded from PyPI. The full dependency set is then cached and saved for later reuse of the pool.

    The following snippet shows the format for the requirements file.

    absl-py==0.7.0  
    adal==1.2.1  
    alabaster==0.7.10  
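
    If you want to generate such a file from an existing local environment, the pip freeze command mentioned above produces it directly. A minimal sketch, run in your local shell and assuming pip points at the environment you want to capture:

    # Illustrative: writes the active environment's pinned packages to requirements.txt
    pip freeze > requirements.txt

    You can then upload the resulting file through the pool's package management settings; the pinned versions shown above are just illustrative output.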
    

    YML format (preview)

    You can also provide an environment.yml file to update the pool environment. The packages listed in this file are downloaded from the default Conda channels, Conda-Forge, and PyPI. You can specify other channels or remove the default channels by using the configuration options.

    This example specifies the channels and Conda/PyPI dependencies.

    name: stats2  
    channels:  
    - defaults  
    dependencies:  
    - bokeh  
    - numpy  
    - pip:  
      - matplotlib  
      - koalas==1.7.0  
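
    If you need channels beyond (or instead of) the defaults, a variation such as the following may work. Note that nodefaults is the standard Conda convention for dropping the default channels; whether the Synapse pool tooling honors it is an assumption here, so treat this as a sketch:

    name: stats2
    channels:
    - conda-forge
    - nodefaults   # assumption: drops the default channels (standard Conda convention)
    dependencies:
    - numpy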
    

    For more details, refer to Manage Python libraries for Apache Spark in Azure Synapse Analytics.

    Hope this helps. Do let us know if you have any further queries.

    ---------------------------------------------------------------------------

    Please "Accept the answer" if the information helped you. This will help us and others in the community as well.


  2. - 51 Reputation points Microsoft Employee
    2022-11-11T18:02:59.92+00:00

    @PRADEEPCHEEKATLA-MSFT Is installing Python packages from a private repo supported yet?