Why do I keep getting an error when trying to use different packages in a session using a .yml file?

2021-10-12T20:12:10.567+00:00

I am trying to run sentiment analysis with flair through an Apache Spark session using a Spark pool in Azure Synapse Analytics. Several of the packages I need are not pre-installed in Synapse, so I am using a .yml file to upload them to the notebook session I am using. The .yml file is a clone of the Anaconda environment I use offline, which imports all of the packages I need perfectly. Here is the .yml file as it exists:

```yaml
name: ClusteringNotes
channels:
  - defaults
dependencies:
  - argon2-cffi=20.1.0=py38h2bbff1b_1
  - async_generator=1.10=pyhd3eb1b0_0
  - attrs=21.2.0=pyhd3eb1b0_0
  - backcall=0.2.0=pyhd3eb1b0_0
  - bleach=4.0.0=pyhd3eb1b0_0
  - ca-certificates=2021.7.5=haa95532_1
  - certifi=2021.5.30=py38haa95532_0
  - cffi=1.14.6=py38h2bbff1b_0
  - click=8.0.1=pyhd3eb1b0_0
  - colorama=0.4.4=pyhd3eb1b0_0
  - debugpy=1.4.1=py38hd77b12b_0
  - decorator=5.1.0=pyhd3eb1b0_0
  - defusedxml=0.7.1=pyhd3eb1b0_0
  - entrypoints=0.3=py38_0
  - importlib_metadata=4.8.1=hd3eb1b0_0
  - ipykernel=6.2.0=py38haa95532_1
  - ipython=7.27.0=py38hd4e2768_0
  - ipython_genutils=0.2.0=pyhd3eb1b0_1
  - jedi=0.18.0=py38haa95532_1
  - jinja2=3.0.1=pyhd3eb1b0_0
  - joblib=1.0.1=pyhd3eb1b0_0
  - jsonschema=3.2.0=pyhd3eb1b0_2
  - jupyter_client=7.0.1=pyhd3eb1b0_0
  - jupyter_core=4.7.1=py38haa95532_0
  - jupyterlab_pygments=0.1.2=py_0
  - m2w64-gcc-libgfortran=5.3.0=6
  - m2w64-gcc-libs=5.3.0=7
  - m2w64-gcc-libs-core=5.3.0=7
  - m2w64-gmp=6.1.0=2
  - m2w64-libwinpthread-git=5.0.0.4634.697f757=2
  - markupsafe=2.0.1=py38h2bbff1b_0
  - matplotlib-inline=0.1.2=pyhd3eb1b0_2
  - mistune=0.8.4=py38he774522_1000
  - msys2-conda-epoch=20160418=1
  - nbclient=0.5.3=pyhd3eb1b0_0
  - nbconvert=6.1.0=py38haa95532_0
  - nbformat=5.1.3=pyhd3eb1b0_0
  - nest-asyncio=1.5.1=pyhd3eb1b0_0
  - nltk=3.6.3=pyhd3eb1b0_0
  - notebook=6.4.3=py38haa95532_0
  - openssl=1.1.1l=h2bbff1b_0
  - packaging=21.0=pyhd3eb1b0_0
  - pandocfilters=1.4.3=py38haa95532_1
  - parso=0.8.2=pyhd3eb1b0_0
  - pickleshare=0.7.5=pyhd3eb1b0_1003
  - pip=21.0.1=py38haa95532_0
  - prometheus_client=0.11.0=pyhd3eb1b0_0
  - prompt-toolkit=3.0.20=pyhd3eb1b0_0
  - pycparser=2.20=py_2
  - pygments=2.10.0=pyhd3eb1b0_0
  - pyparsing=2.4.7=pyhd3eb1b0_0
  - pyrsistent=0.17.3=py38he774522_0
  - python=3.8.11=h6244533_1
  - python-dateutil=2.8.2=pyhd3eb1b0_0
  - pywin32=228=py38hbaba5e8_1
  - pywinpty=0.5.7=py38_0
  - pyzmq=22.2.1=py38hd77b12b_1
  - send2trash=1.8.0=pyhd3eb1b0_1
  - setuptools=58.0.4=py38haa95532_0
  - six=1.16.0=pyhd3eb1b0_0
  - sqlite=3.36.0=h2bbff1b_0
  - terminado=0.9.4=py38haa95532_0
  - testpath=0.5.0=pyhd3eb1b0_0
  - tornado=6.1=py38h2bbff1b_0
  - traitlets=5.1.0=pyhd3eb1b0_0
  - vc=14.2=h21ff451_1
  - vs2015_runtime=14.27.29016=h5e58377_2
  - wcwidth=0.2.5=pyhd3eb1b0_0
  - webencodings=0.5.1=py38_1
  - wheel=0.37.0=pyhd3eb1b0_1
  - wincertstore=0.2=py38haa95532_2
  - winpty=0.4.3=4
  - zipp=3.5.0=pyhd3eb1b0_0
  - pip:
    - cachetools==3.1.1
    - charset-normalizer==2.0.6
    - cloudpickle==2.0.0
    - cycler==0.10.0
    - cython==0.29.14
    - fasttext==0.9.2
    - filelock==3.3.0
    - future==0.18.2
    - google-auth==1.7.0
    - hdbscan==0.8.27
    - huggingface-hub==0.0.19
    - idna==3.2
    - importlib-metadata==3.10.1
    - janome==0.4.1
    - kiwisolver==1.3.2
    - lxml==4.6.3
    - matplotlib==3.4.3
    - networkx==2.6.3
    - numpy==1.21.2
    - pandas==1.3.3
    - patsy==0.5.2
    - pillow==8.3.2
    - plotly==5.3.1
    - plotly-express==0.4.1
    - protobuf==3.18.1
    - pyasn1==0.4.8
    - pyasn1-modules==0.2.8
    - pybind11==2.8.0
    - pysocks==1.7.1
    - pytz==2021.3
    - pyyaml==5.4.1
    - regex==2021.9.30
    - requests==2.26.0
    - rsa==4.0
    - sacremoses==0.0.46
    - scikit-learn==1.0
    - scipy==1.7.1
    - sentencepiece==0.1.95
    - sister==0.1.10
    - smart-open==5.2.1
    - statsmodels==0.13.0
    - tenacity==8.0.1
    - threadpoolctl==3.0.0
    - tokenizers==0.10.3
    - torch==1.9.1
    - tqdm==4.62.3
    - transformers==4.11.2
    - umap==0.1.1
    - urllib3==1.26.7
prefix: C:\Users\v-sacannon\Anaconda3\envs\ClusteringNotes
```

I followed the steps outlined in the documentation on how to upload a .yml file to a session and then clicked 'apply'. However, when I try to initiate the Spark session, I keep getting this error message:

LIBRARY_MANAGEMENT_FAILED: Livy session has failed. Session state: Error. Error code: LIBRARY_MANAGEMENT_FAILED. [plugins.ftml-synapse.NotesSparkPool.1b413213-b39b-41a4-bd31-3467dff21d9a WorkspaceType:<Synapse> CCID:<>] MaxClusterCreationAttempts=[3] Attempt=[0] ComputeNodeSize=[Medium] ClusterId=[ef827175-db51-46b1-8e18-0db706b9f1ae] AdlaResourceId=[] [Creation] -> [Cleanup]. The cluster creation has failed more than the [3] of times. IsTimeout=[False] IsTerminal=[True] IsRetryable=[False] ErrorType=[UserError] ErrorMessage=[LIBRARY_MANAGEMENT_FAILED] Source: User.

I would love to understand how I can quickly and efficiently upload environment files to sessions so that I can actually complete my work. Any help is greatly appreciated.


Accepted answer
2021-10-21T20:36:10.987+00:00

Hey! Sorry for the late reply. I was able to solve this issue by cutting the .yml file down to only the packages I actually wanted installed. That begs the question of why we suggest uploading full .yml clones of Anaconda envs instead of a simple requirements.txt file, the way you add packages to a Spark pool (not knocking it, just curious). Here is a picture of what I uploaded as a .yml file instead of the one above:

[142631-image.png: screenshot of the trimmed .yml file]
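Since the attachment can't be reproduced inline, here is a rough sketch of what a trimmed-down file along those lines could look like. This is an illustration only: the exact packages in the screenshot are unknown, the flair entry is assumed from the question, and the version pins are copied from the original pip list. The key idea is dropping the exported Windows build strings, the Windows-only conda packages (pywin32, winpty, the m2w64-* toolchain, etc.), and the local prefix: line, since those would likely fail to resolve on the pool's Linux nodes.

```yaml
name: ClusteringNotes
dependencies:
  - pip:
    # only what the session actually needs, not the Jupyter/Windows
    # tooling that the exported Anaconda environment dragged in
    - flair                  # assumed from the question; unpinned here
    - hdbscan==0.8.27        # pins carried over from the original pip list
    - plotly==5.3.1
    - transformers==4.11.2
```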

