question

SamuelCannonAllegisGroupHoldingsIn-6271 asked · SamuelCannonAllegisGroupHoldingsIn-6271 commented

Why do I keep getting an error when trying to use different packages in a session using a .yml file?

I am trying to run sentiment analysis with flair through an Apache Spark session using a Spark pool in Azure Synapse Analytics. I need several packages that are not pre-installed in Synapse, so I am using a .yml file to upload the packages I need into the notebook session I am using. The .yml file is a clone of the Anaconda environment that I use offline, which imports all of the packages I need perfectly. Here is the .yml file as it exists:

name: ClusteringNotes
channels:
- defaults
dependencies:
- argon2-cffi=20.1.0=py38h2bbff1b_1
- async_generator=1.10=pyhd3eb1b0_0
- attrs=21.2.0=pyhd3eb1b0_0
- backcall=0.2.0=pyhd3eb1b0_0
- bleach=4.0.0=pyhd3eb1b0_0
- ca-certificates=2021.7.5=haa95532_1
- certifi=2021.5.30=py38haa95532_0
- cffi=1.14.6=py38h2bbff1b_0
- click=8.0.1=pyhd3eb1b0_0
- colorama=0.4.4=pyhd3eb1b0_0
- debugpy=1.4.1=py38hd77b12b_0
- decorator=5.1.0=pyhd3eb1b0_0
- defusedxml=0.7.1=pyhd3eb1b0_0
- entrypoints=0.3=py38_0
- importlib_metadata=4.8.1=hd3eb1b0_0
- ipykernel=6.2.0=py38haa95532_1
- ipython=7.27.0=py38hd4e2768_0
- ipython_genutils=0.2.0=pyhd3eb1b0_1
- jedi=0.18.0=py38haa95532_1
- jinja2=3.0.1=pyhd3eb1b0_0
- joblib=1.0.1=pyhd3eb1b0_0
- jsonschema=3.2.0=pyhd3eb1b0_2
- jupyter_client=7.0.1=pyhd3eb1b0_0
- jupyter_core=4.7.1=py38haa95532_0
- jupyterlab_pygments=0.1.2=py_0
- m2w64-gcc-libgfortran=5.3.0=6
- m2w64-gcc-libs=5.3.0=7
- m2w64-gcc-libs-core=5.3.0=7
- m2w64-gmp=6.1.0=2
- m2w64-libwinpthread-git=5.0.0.4634.697f757=2
- markupsafe=2.0.1=py38h2bbff1b_0
- matplotlib-inline=0.1.2=pyhd3eb1b0_2
- mistune=0.8.4=py38he774522_1000
- msys2-conda-epoch=20160418=1
- nbclient=0.5.3=pyhd3eb1b0_0
- nbconvert=6.1.0=py38haa95532_0
- nbformat=5.1.3=pyhd3eb1b0_0
- nest-asyncio=1.5.1=pyhd3eb1b0_0
- nltk=3.6.3=pyhd3eb1b0_0
- notebook=6.4.3=py38haa95532_0
- openssl=1.1.1l=h2bbff1b_0
- packaging=21.0=pyhd3eb1b0_0
- pandocfilters=1.4.3=py38haa95532_1
- parso=0.8.2=pyhd3eb1b0_0
- pickleshare=0.7.5=pyhd3eb1b0_1003
- pip=21.0.1=py38haa95532_0
- prometheus_client=0.11.0=pyhd3eb1b0_0
- prompt-toolkit=3.0.20=pyhd3eb1b0_0
- pycparser=2.20=py_2
- pygments=2.10.0=pyhd3eb1b0_0
- pyparsing=2.4.7=pyhd3eb1b0_0
- pyrsistent=0.17.3=py38he774522_0
- python=3.8.11=h6244533_1
- python-dateutil=2.8.2=pyhd3eb1b0_0
- pywin32=228=py38hbaba5e8_1
- pywinpty=0.5.7=py38_0
- pyzmq=22.2.1=py38hd77b12b_1
- send2trash=1.8.0=pyhd3eb1b0_1
- setuptools=58.0.4=py38haa95532_0
- six=1.16.0=pyhd3eb1b0_0
- sqlite=3.36.0=h2bbff1b_0
- terminado=0.9.4=py38haa95532_0
- testpath=0.5.0=pyhd3eb1b0_0
- tornado=6.1=py38h2bbff1b_0
- traitlets=5.1.0=pyhd3eb1b0_0
- vc=14.2=h21ff451_1
- vs2015_runtime=14.27.29016=h5e58377_2
- wcwidth=0.2.5=pyhd3eb1b0_0
- webencodings=0.5.1=py38_1
- wheel=0.37.0=pyhd3eb1b0_1
- wincertstore=0.2=py38haa95532_2
- winpty=0.4.3=4
- zipp=3.5.0=pyhd3eb1b0_0
- pip:
- cachetools==3.1.1
- charset-normalizer==2.0.6
- cloudpickle==2.0.0
- cycler==0.10.0
- cython==0.29.14
- fasttext==0.9.2
- filelock==3.3.0
- future==0.18.2
- google-auth==1.7.0
- hdbscan==0.8.27
- huggingface-hub==0.0.19
- idna==3.2
- importlib-metadata==3.10.1
- janome==0.4.1
- kiwisolver==1.3.2
- lxml==4.6.3
- matplotlib==3.4.3
- networkx==2.6.3
- numpy==1.21.2
- pandas==1.3.3
- patsy==0.5.2
- pillow==8.3.2
- plotly==5.3.1
- plotly-express==0.4.1
- protobuf==3.18.1
- pyasn1==0.4.8
- pyasn1-modules==0.2.8
- pybind11==2.8.0
- pysocks==1.7.1
- pytz==2021.3
- pyyaml==5.4.1
- regex==2021.9.30
- requests==2.26.0
- rsa==4.0
- sacremoses==0.0.46
- scikit-learn==1.0
- scipy==1.7.1
- sentencepiece==0.1.95
- sister==0.1.10
- smart-open==5.2.1
- statsmodels==0.13.0
- tenacity==8.0.1
- threadpoolctl==3.0.0
- tokenizers==0.10.3
- torch==1.9.1
- tqdm==4.62.3
- transformers==4.11.2
- umap==0.1.1
- urllib3==1.26.7
prefix: C:\Users\v-sacannon\Anaconda3\envs\ClusteringNotes

I followed the steps outlined in the documentation on how to upload a .yml file to a session and then clicked 'apply'. However, when I try to initiate the Spark session, I keep getting this error message:

LIBRARY_MANAGEMENT_FAILED: Livy session has failed. Session state: Error. Error code: LIBRARY_MANAGEMENT_FAILED. [plugins.ftml-synapse.NotesSparkPool.1b413213-b39b-41a4-bd31-3467dff21d9a WorkspaceType:<Synapse> CCID:<>] MaxClusterCreationAttempts=[3] Attempt=[0] ComputeNodeSize=[Medium] ClusterId=[ef827175-db51-46b1-8e18-0db706b9f1ae] AdlaResourceId=[] [Creation] -> [Cleanup]. The cluster creation has failed more than the [3] of times. IsTimeout=[False] IsTerminal=[True] IsRetryable=[False] ErrorType=[UserError] ErrorMessage=[LIBRARY_MANAGEMENT_FAILED] Source: User.

I would love to understand how I can quickly and efficiently upload env files to sessions so that I can actually complete my work. Any help is GREATLY appreciated.

Tags: azure-synapse-analytics, dotnet-ml-big-data

Hello @SamuelCannonAllegisGroupHoldingsIn-6271,

Welcome to the Microsoft Q&A platform.

Could you please share the logs of the Apache Spark application that was used to install the libraries?

(attachment: 140097-image.png)

Hello @SamuelCannonAllegisGroupHoldingsIn-6271,

Just checking in to see if you have had a chance to look at the previous response. We need the information requested above to understand and investigate this issue further.


1 Answer

SamuelCannonAllegisGroupHoldingsIn-6271 answered · SamuelCannonAllegisGroupHoldingsIn-6271 commented

Hey! Sorry for the late reply. I was able to solve this issue by cutting the .yml file down to only the packages that I wanted installed. That raises the question of why we suggest uploading full .yml clones of Anaconda envs instead of a simple requirements.txt file, the way you add packages to a Spark pool (not knocking it, just curious). Here is a picture of what I uploaded as a .yml file instead of the file above (a text sketch follows the screenshot):

(attachment: 142631-image.png — the trimmed .yml file)
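In text form, the trimmed file was along these lines. This is a minimal sketch rather than the exact contents of the screenshot: the env name is arbitrary, flair is left unpinned, and the pinned versions are copied from the pip section of my original file.

name: SessionPackages
channels:
- defaults
dependencies:
- pip:
  - flair
  - torch==1.9.1
  - transformers==4.11.2
  - sentencepiece==0.1.95
  - scikit-learn==1.0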

After I uploaded the .yml file this way it had no problems, but I essentially had to write a requirements.txt file in .yml format. I am adding packages one at a time as I need them, to see how many it will accept. I feel the process for uploading these files at the session level could be improved to accept requirements.txt files instead, and I also fear that if I ever do need to upload a fully cloned env from Anaconda, I won't be able to. Thank you so much for your response, by the way.

PRADEEPCHEEKATLA-MSFT replied to SamuelCannonAllegisGroupHoldingsIn-6271:

Hello @SamuelCannonAllegisGroupHoldingsIn-6271,

This is expected behaviour when you have a long list of packages.

Glad to know that your issue has been resolved. You can mark this as the accepted answer; this can be beneficial to other community members. Thank you.



Pradeep,

So now that I have been able to upload the packages at the session level, the notebook runs. But once I upload the EXACT SAME .yml file to the Spark pool, to add the packages to the pool itself, the notebook throws an error and tells me that I have the wrong packages installed. Why am I getting an error when uploading the same .yml file to the Spark pool, while I am not getting one when it is uploaded at the session level?
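For anyone comparing the two setups, a quick way to see what a given configuration actually installed is to print, inside the notebook, the versions the running session can see. This is just a diagnostic sketch; the package names are the ones from my .yml, and importlib.metadata is in the standard library on Python 3.8+:

# Print the versions the running session actually sees, so a
# session-level install can be compared against a pool-level one.
from importlib.metadata import version, PackageNotFoundError

for pkg in ["torch", "transformers", "numpy", "pandas", "flair"]:
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")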
