Using specific packages in Synapse spark pool...

Matty-2070 11 Reputation points
2022-10-07T11:30:44.98+00:00

Hi Team,

Some of the default packages present in a Synapse spark pool 'out of the box' are not up to date enough for my needs, so I want to be able to download specific versions.

I have:

  1. Created a new spark pool
  2. Downloaded the specific *.whl file I want to use from pypi.org
  3. Uploaded it to the 'workspace packages' area of Synapse
  4. Attempted to install the package to the new spark pool via the 'select from workspace packages' section

But every attempt I have made to do this has resulted in a 'failed to apply settings' error, citing a message similar to:

 ...  
 INFO Running /usr/lib/miniforge3/bin/conda env update -p /home/trusted-service-user/cluster-env/clonedenv --file /usr/lib/library-manager/bin/lmjob/sparkpoolcustom/package_cleaned_environment.yml","Pip subprocess error:","ERROR: numpy-1.23.3-cp38-cp38-win32.whl is not a supported wheel on this platform.","","","CondaEnvException: Pip failed","","22/10/07 11:06:25 ERROR b\"Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one of your conda dependencies. Conda may not use the correct pip to install your packages, and they may end up in the wrong place. Please add an explicit pip dependency. I'm adding one for you, but still nagging you.\\nCollecting package metadata (repodata.json): ...working... done\\nSolving environment: ...working... done\\nPreparing transaction: ...working... done\\nVerifying transaction: ...working... done\\nExecuting transaction: ...working... done\\nInstalling pip dependencies: ...working... Ran pip subprocess with arguments:\\n['/home/trusted-service-user/cluster-env/clonedenv/bin/python', '-m', 'pip', 'install', '-U', '-r', '/usr/lib/library-manager/bin/lmjob/sparkpoolcustom/condaenv.t1h1b1b3.requirements.txt']\\nPip subprocess output:\\n\\nfailed\\n\"","22/10/07 11:06:25 INFO Cleanup following folders and files from staging directory:","22/10/07 11:06:29 INFO Staging directory cleaned up successfully"],"registeredSources":null}  

It seems as though the version of Python running on Synapse is 3.8.10, hence I downloaded the cp38 version of the package, but no joy. Can anyone shed any light on what might be wrong?
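For reference, here is a quick way to see which wheel tags a pool actually accepts. This is just a sketch to run in a notebook cell, assuming the packaging module that ships with pip is importable there:

    # List the wheel tags this interpreter/platform accepts, most specific first.
    # Assumption: 'packaging' (bundled with pip) is importable in the pool.
    from packaging.tags import sys_tags

    for tag in list(sys_tags())[:10]:
        print(tag)  # e.g. cp38-cp38-manylinux_2_17_x86_64 on a Linux pool

If none of the printed tags matches the tags in the wheel's filename (cp38-cp38-win32 in my case), pip will refuse to install it.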

Thanks,

Matty

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

3 answers

  1. Matty-2070 11 Reputation points
    2022-10-07T13:42:27.497+00:00

    In addition to the above, I have also tried adding a package via the 'Requirements files' option, using a requirements.txt file. This also failed, with the following message:

    ProxyLivyApiAsyncError  
    LibraryManagement - Spark Job for sparkpoolcustom in workspace **** in subscription **** failed with status:  
    {"id":9,"appId":"application_****","appInfo":{"driverLogUrl":"http://vm-****/node/containerlogs/container_****/trusted-service-user","sparkUiUrl":"http://vm-****/proxy/application_****/","isSessionTimedOut":null,"isStreamingQueryExists":"false","impulseErrorCode":null,"impulseTsg":null,"impulseClassification":null},"state":"dead","log":["Elapsed: -","","An HTTP error occurred when trying to retrieve this URL.","HTTP errors are often intermittent, and a simple retry will get you on your way.","'https://conda.anaconda.org/conda-forge/linux-64'","","","22/10/07 13:35:00 ERROR b\"Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one of your conda dependencies. Conda may not use the correct pip to install your packages, and they may end up in the wrong place. Please add an explicit pip dependency. I'm adding one for you, but still nagging you.\\nCollecting package metadata (repodata.json): ...working... failed\\n\"","22/10/07 13:35:00 INFO Cleanup following folders and files from staging directory:","22/10/07 13:35:04 INFO Staging directory cleaned up successfully"],"registeredSources":null}  
    

    This is incredibly frustrating, especially given that it takes about 10 minutes for the spark pool to respond each time you try!

    Cheers,

    Matty


  2. PRADEEPCHEEKATLA 91,656 Reputation points Moderator
    2022-10-10T09:08:01.073+00:00

    Hello @Matty-2070 ,

    Thanks for the question and using MS Q&A platform.

    Could you please share the content of the requirements.txt?

    As per the error message "ERROR: numpy-1.23.3-cp38-cp38-win32.whl is not a supported wheel on this platform" - the win32 in the filename means this is a Windows build of the wheel, but Synapse Spark pools run on Linux, so pip rejects it. You would need a Linux (manylinux) build of the wheel instead.
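    If you specifically need that exact version as a wheel, one option is to ask pip for a Linux build explicitly. A minimal sketch, assuming you want numpy 1.23.3 for CPython 3.8 on 64-bit Linux (adjust the package and tags for your pool):

        # Sketch: download a manylinux wheel from any OS by passing pip an
        # explicit target platform. Hypothetical target: numpy 1.23.3 for
        # CPython 3.8 on 64-bit Linux.
        import subprocess
        import sys

        subprocess.run(
            [
                sys.executable, "-m", "pip", "download", "numpy==1.23.3",
                "--only-binary=:all:",               # required when --platform is set
                "--platform", "manylinux2014_x86_64",
                "--python-version", "38",
                "--implementation", "cp",
                "-d", "wheels",                      # local download directory
            ],
            check=True,
        )

    The resulting manylinux wheel could then be uploaded to workspace packages in place of the win32 one.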

    From my repro, I'm able to successfully install the numpy package using a requirements.txt, as shown below:

    [Screenshot: the requirements.txt used in the repro]
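    (The screenshot's exact contents aren't preserved here; a minimal requirements.txt for this scenario just pins the package, one requirement per line, for example:)

        numpy==1.23.3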

    The above requirements.txt installed successfully on the Apache Spark pool:

    [Screenshot: the requirements.txt packages installed successfully on the Apache Spark pool]

    Check out the numpy package update from the previous version, as shown below:

    [Screenshot: numpy updated from the previously installed version]

    Hope this will help. Please let us know if any further queries.


  3. Matty-2070 11 Reputation points
    2022-10-11T08:33:25.623+00:00

    Hi,

    I tried installing to a completely new Spark pool this morning, but it failed again. Here's the error:

    Error details  
    ProxyLivyApiAsyncError  
    LibraryManagement - Spark Job for sparkpooltest in workspace **** in subscription **** failed with status:  
    {"id":18,"appId":"application_****","appInfo":{"driverLogUrl":"http://vm-****/node/containerlogs/container_****/trusted-service-user","sparkUiUrl":"http://vm-****/proxy/application_****/","isSessionTimedOut":null,"isStreamingQueryExists":"false","impulseErrorCode":null,"impulseTsg":null,"impulseClassification":null},"state":"dead","log":["Elapsed: -","","An HTTP error occurred when trying to retrieve this URL.","HTTP errors are often intermittent, and a simple retry will get you on your way.","'https://conda.anaconda.org/conda-forge/linux-64'","","","22/10/11 08:16:58 ERROR b\"Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one of your conda dependencies. Conda may not use the correct pip to install your packages, and they may end up in the wrong place. Please add an explicit pip dependency. I'm adding one for you, but still nagging you.\\nCollecting package metadata (repodata.json): ...working... failed\\n\"","22/10/11 08:16:58 INFO Cleanup following folders and files from staging directory:","22/10/11 08:17:01 INFO Staging directory cleaned up successfully"],"registeredSources":null}  
    

    I am now wondering whether the issue is linked to how Azure has been configured within our corporate environment, given that the same steps work fine from my personal Azure account, and the log shows an HTTP error reaching https://conda.anaconda.org/conda-forge/linux-64 (i.e. the pool may not be able to reach conda-forge at all). I wouldn't know what to check beyond a basic connectivity probe like the one sketched below, though. Any ideas?
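    A sketch of that probe, runnable from a notebook cell using only the Python standard library (the conda-forge URL is the one failing in the error log; pypi.org is included for contrast):

        # Probe outbound HTTPS from a Synapse notebook cell.
        from urllib.request import urlopen

        for url in (
            "https://conda.anaconda.org/conda-forge/linux-64/repodata.json",
            "https://pypi.org/simple/",
        ):
            try:
                with urlopen(url, timeout=10) as resp:
                    print(url, "->", resp.status)
            except OSError as exc:  # covers URLError, DNS failures, timeouts
                print(url, "-> FAILED:", exc)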

    Cheers,

    Matty

