Why do I keep getting an error when trying to use different packages in a session using a .yml file?


I am trying to run sentiment analysis with flair through an Apache Spark session using a Spark pool in Azure Synapse Analytics. I need several packages that are not pre-installed in Synapse, so I am using a .yml file to upload them to my notebook session. The .yml file is a clone of the Anaconda environment I use offline, where all of the packages I need import without problems. Here is the .yml file as it exists:

name: ClusteringNotes
channels:
  • defaults
dependencies:
  • argon2-cffi=20.1.0=py38h2bbff1b_1
  • async_generator=1.10=pyhd3eb1b0_0
  • attrs=21.2.0=pyhd3eb1b0_0
  • backcall=0.2.0=pyhd3eb1b0_0
  • bleach=4.0.0=pyhd3eb1b0_0
  • ca-certificates=2021.7.5=haa95532_1
  • certifi=2021.5.30=py38haa95532_0
  • cffi=1.14.6=py38h2bbff1b_0
  • click=8.0.1=pyhd3eb1b0_0
  • colorama=0.4.4=pyhd3eb1b0_0
  • debugpy=1.4.1=py38hd77b12b_0
  • decorator=5.1.0=pyhd3eb1b0_0
  • defusedxml=0.7.1=pyhd3eb1b0_0
  • entrypoints=0.3=py38_0
  • importlib_metadata=4.8.1=hd3eb1b0_0
  • ipykernel=6.2.0=py38haa95532_1
  • ipython=7.27.0=py38hd4e2768_0
  • ipython_genutils=0.2.0=pyhd3eb1b0_1
  • jedi=0.18.0=py38haa95532_1
  • jinja2=3.0.1=pyhd3eb1b0_0
  • joblib=1.0.1=pyhd3eb1b0_0
  • jsonschema=3.2.0=pyhd3eb1b0_2
  • jupyter_client=7.0.1=pyhd3eb1b0_0
  • jupyter_core=4.7.1=py38haa95532_0
  • jupyterlab_pygments=0.1.2=py_0
  • m2w64-gcc-libgfortran=5.3.0=6
  • m2w64-gcc-libs=5.3.0=7
  • m2w64-gcc-libs-core=5.3.0=7
  • m2w64-gmp=6.1.0=2
  • m2w64-libwinpthread-git=
  • markupsafe=2.0.1=py38h2bbff1b_0
  • matplotlib-inline=0.1.2=pyhd3eb1b0_2
  • mistune=0.8.4=py38he774522_1000
  • msys2-conda-epoch=20160418=1
  • nbclient=0.5.3=pyhd3eb1b0_0
  • nbconvert=6.1.0=py38haa95532_0
  • nbformat=5.1.3=pyhd3eb1b0_0
  • nest-asyncio=1.5.1=pyhd3eb1b0_0
  • nltk=3.6.3=pyhd3eb1b0_0
  • notebook=6.4.3=py38haa95532_0
  • openssl=1.1.1l=h2bbff1b_0
  • packaging=21.0=pyhd3eb1b0_0
  • pandocfilters=1.4.3=py38haa95532_1
  • parso=0.8.2=pyhd3eb1b0_0
  • pickleshare=0.7.5=pyhd3eb1b0_1003
  • pip=21.0.1=py38haa95532_0
  • prometheus_client=0.11.0=pyhd3eb1b0_0
  • prompt-toolkit=3.0.20=pyhd3eb1b0_0
  • pycparser=2.20=py_2
  • pygments=2.10.0=pyhd3eb1b0_0
  • pyparsing=2.4.7=pyhd3eb1b0_0
  • pyrsistent=0.17.3=py38he774522_0
  • python=3.8.11=h6244533_1
  • python-dateutil=2.8.2=pyhd3eb1b0_0
  • pywin32=228=py38hbaba5e8_1
  • pywinpty=0.5.7=py38_0
  • pyzmq=22.2.1=py38hd77b12b_1
  • send2trash=1.8.0=pyhd3eb1b0_1
  • setuptools=58.0.4=py38haa95532_0
  • six=1.16.0=pyhd3eb1b0_0
  • sqlite=3.36.0=h2bbff1b_0
  • terminado=0.9.4=py38haa95532_0
  • testpath=0.5.0=pyhd3eb1b0_0
  • tornado=6.1=py38h2bbff1b_0
  • traitlets=5.1.0=pyhd3eb1b0_0
  • vc=14.2=h21ff451_1
  • vs2015_runtime=14.27.29016=h5e58377_2
  • wcwidth=0.2.5=pyhd3eb1b0_0
  • webencodings=0.5.1=py38_1
  • wheel=0.37.0=pyhd3eb1b0_1
  • wincertstore=0.2=py38haa95532_2
  • winpty=0.4.3=4
  • zipp=3.5.0=pyhd3eb1b0_0
  • pip:
  • cachetools==3.1.1
  • charset-normalizer==2.0.6
  • cloudpickle==2.0.0
  • cycler==0.10.0
  • cython==0.29.14
  • fasttext==0.9.2
  • filelock==3.3.0
  • future==0.18.2
  • google-auth==1.7.0
  • hdbscan==0.8.27
  • huggingface-hub==0.0.19
  • idna==3.2
  • importlib-metadata==3.10.1
  • janome==0.4.1
  • kiwisolver==1.3.2
  • lxml==4.6.3
  • matplotlib==3.4.3
  • networkx==2.6.3
  • numpy==1.21.2
  • pandas==1.3.3
  • patsy==0.5.2
  • pillow==8.3.2
  • plotly==5.3.1
  • plotly-express==0.4.1
  • protobuf==3.18.1
  • pyasn1==0.4.8
  • pyasn1-modules==0.2.8
  • pybind11==2.8.0
  • pysocks==1.7.1
  • pytz==2021.3
  • pyyaml==5.4.1
  • regex==2021.9.30
  • requests==2.26.0
  • rsa==4.0
  • sacremoses==0.0.46
  • scikit-learn==1.0
  • scipy==1.7.1
  • sentencepiece==0.1.95
  • sister==0.1.10
  • smart-open==5.2.1
  • statsmodels==0.13.0
  • tenacity==8.0.1
  • threadpoolctl==3.0.0
  • tokenizers==0.10.3
  • torch==1.9.1
  • tqdm==4.62.3
  • transformers==4.11.2
  • umap==0.1.1
  • urllib3==1.26.7
prefix: C:\Users\v-sacannon\Anaconda3\envs\ClusteringNotes

I followed the steps outlined in the documentation on how to upload a .yml file to a session and then clicked 'apply'. However, when I try to initiate the Spark session, I keep getting this error message:

LIBRARY_MANAGEMENT_FAILED: Livy session has failed. Session state: Error. Error code: LIBRARY_MANAGEMENT_FAILED. [plugins.ftml-synapse.NotesSparkPool.1b413213-b39b-41a4-bd31-3467dff21d9a WorkspaceType:<Synapse> CCID:<>] MaxClusterCreationAttempts=[3] Attempt=[0] ComputeNodeSize=[Medium] ClusterId=[ef827175-db51-46b1-8e18-0db706b9f1ae] AdlaResourceId=[] [Creation] -> [Cleanup]. The cluster creation has failed more than the [3] of times. IsTimeout=[False] IsTerminal=[True] IsRetryable=[False] ErrorType=[UserError] ErrorMessage=[LIBRARY_MANAGEMENT_FAILED] Source: User.

I would love to understand how I can quickly and efficiently upload env files to sessions so that I can actually complete my work. Any help is GREATLY appreciated.

Accepted answer
  1. 2021-10-21T20:36:10.987+00:00

    Hey! Sorry for the late reply. I was able to solve this issue by cutting the .yml file down to only the packages that I wanted installed. That raises the question of why the suggestion is to upload full .yml clones of Anaconda environments instead of a simple requirements.txt file, as you would when adding packages to a Spark pool (not knocking it, just curious). Here is a picture of what I uploaded as a .yml file instead of the one I initially tried to upload:
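
The same kind of trimming can be approximated with a short script. This is only a sketch, not the file shown in the answer. It assumes the export failed because it pins Windows build strings and Windows-only packages (the `m2w64-*`, `pywin32`, `winpty` entries and similar) that a Linux-based Spark pool cannot resolve. The thread does not confirm that cause. The function name `trim_environment`, the package lists, and the sample text are all hypothetical.

```python
# Assumption: these conda packages exist only for Windows.
WINDOWS_ONLY_NAMES = {
    "pywin32", "pywinpty", "winpty", "wincertstore",
    "vc", "vs2015_runtime", "msys2-conda-epoch",
}
WINDOWS_ONLY_PREFIXES = ("m2w64-",)


def trim_environment(text):
    """Drop the prefix line, Windows-only packages, and conda build strings."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        # The prefix is a local Windows path and means nothing elsewhere.
        if stripped.startswith("prefix:"):
            continue
        # Conda entries look like "- name=version=build"; pip entries use "==".
        is_conda_entry = (
            stripped.startswith("- ")
            and "==" not in stripped
            and not stripped.endswith(":")
        )
        if is_conda_entry:
            parts = stripped[2:].split("=")
            name = parts[0]
            if name in WINDOWS_ONLY_NAMES or name.startswith(WINDOWS_ONLY_PREFIXES):
                continue
            indent = line[: len(line) - len(line.lstrip())]
            # Keep "name=version" and discard the platform-specific build string.
            line = indent + "- " + "=".join(parts[:2])
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical sample in standard conda YAML syntax.
sample = """name: ClusteringNotes
channels:
  - defaults
dependencies:
  - nltk=3.6.3=pyhd3eb1b0_0
  - pywin32=228=py38hbaba5e8_1
  - m2w64-gcc-libs=5.3.0=7
  - pip:
    - torch==1.9.1
prefix: C:\\Users\\example\\envs\\ClusteringNotes
"""

print(trim_environment(sample))
```

The script only removes entries. Deciding which of the remaining packages are actually needed, as the answer did by hand, is still a manual step. Running `conda env export --no-builds` or `conda env export --from-history` when creating the file is another way to avoid build strings in the first place.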


