question

SamuelCannonAllegisGroupHoldingsIn-6271 asked · SamuelCannonAllegisGroupHoldingsIn-6271 commented

Why do I keep getting an error when trying to use different packages in a session using a .yml file?

I am trying to run sentiment analysis with flair through an Apache Spark session using a Spark pool in Azure Synapse Analytics. I need several packages that are not pre-installed in Synapse, so I am using a .yml file to upload the packages I need into the notebook session I am using. The .yml file is a clone of the Anaconda environment that I use offline, which imports all of the packages I need perfectly. Here is the .yml file as it exists:

name: ClusteringNotes
channels:
- defaults
dependencies:
- argon2-cffi=20.1.0=py38h2bbff1b_1
- async_generator=1.10=pyhd3eb1b0_0
- attrs=21.2.0=pyhd3eb1b0_0
- backcall=0.2.0=pyhd3eb1b0_0
- bleach=4.0.0=pyhd3eb1b0_0
- ca-certificates=2021.7.5=haa95532_1
- certifi=2021.5.30=py38haa95532_0
- cffi=1.14.6=py38h2bbff1b_0
- click=8.0.1=pyhd3eb1b0_0
- colorama=0.4.4=pyhd3eb1b0_0
- debugpy=1.4.1=py38hd77b12b_0
- decorator=5.1.0=pyhd3eb1b0_0
- defusedxml=0.7.1=pyhd3eb1b0_0
- entrypoints=0.3=py38_0
- importlib_metadata=4.8.1=hd3eb1b0_0
- ipykernel=6.2.0=py38haa95532_1
- ipython=7.27.0=py38hd4e2768_0
- ipython_genutils=0.2.0=pyhd3eb1b0_1
- jedi=0.18.0=py38haa95532_1
- jinja2=3.0.1=pyhd3eb1b0_0
- joblib=1.0.1=pyhd3eb1b0_0
- jsonschema=3.2.0=pyhd3eb1b0_2
- jupyter_client=7.0.1=pyhd3eb1b0_0
- jupyter_core=4.7.1=py38haa95532_0
- jupyterlab_pygments=0.1.2=py_0
- m2w64-gcc-libgfortran=5.3.0=6
- m2w64-gcc-libs=5.3.0=7
- m2w64-gcc-libs-core=5.3.0=7
- m2w64-gmp=6.1.0=2
- m2w64-libwinpthread-git=5.0.0.4634.697f757=2
- markupsafe=2.0.1=py38h2bbff1b_0
- matplotlib-inline=0.1.2=pyhd3eb1b0_2
- mistune=0.8.4=py38he774522_1000
- msys2-conda-epoch=20160418=1
- nbclient=0.5.3=pyhd3eb1b0_0
- nbconvert=6.1.0=py38haa95532_0
- nbformat=5.1.3=pyhd3eb1b0_0
- nest-asyncio=1.5.1=pyhd3eb1b0_0
- nltk=3.6.3=pyhd3eb1b0_0
- notebook=6.4.3=py38haa95532_0
- openssl=1.1.1l=h2bbff1b_0
- packaging=21.0=pyhd3eb1b0_0
- pandocfilters=1.4.3=py38haa95532_1
- parso=0.8.2=pyhd3eb1b0_0
- pickleshare=0.7.5=pyhd3eb1b0_1003
- pip=21.0.1=py38haa95532_0
- prometheus_client=0.11.0=pyhd3eb1b0_0
- prompt-toolkit=3.0.20=pyhd3eb1b0_0
- pycparser=2.20=py_2
- pygments=2.10.0=pyhd3eb1b0_0
- pyparsing=2.4.7=pyhd3eb1b0_0
- pyrsistent=0.17.3=py38he774522_0
- python=3.8.11=h6244533_1
- python-dateutil=2.8.2=pyhd3eb1b0_0
- pywin32=228=py38hbaba5e8_1
- pywinpty=0.5.7=py38_0
- pyzmq=22.2.1=py38hd77b12b_1
- send2trash=1.8.0=pyhd3eb1b0_1
- setuptools=58.0.4=py38haa95532_0
- six=1.16.0=pyhd3eb1b0_0
- sqlite=3.36.0=h2bbff1b_0
- terminado=0.9.4=py38haa95532_0
- testpath=0.5.0=pyhd3eb1b0_0
- tornado=6.1=py38h2bbff1b_0
- traitlets=5.1.0=pyhd3eb1b0_0
- vc=14.2=h21ff451_1
- vs2015_runtime=14.27.29016=h5e58377_2
- wcwidth=0.2.5=pyhd3eb1b0_0
- webencodings=0.5.1=py38_1
- wheel=0.37.0=pyhd3eb1b0_1
- wincertstore=0.2=py38haa95532_2
- winpty=0.4.3=4
- zipp=3.5.0=pyhd3eb1b0_0
- pip:
- cachetools==3.1.1
- charset-normalizer==2.0.6
- cloudpickle==2.0.0
- cycler==0.10.0
- cython==0.29.14
- fasttext==0.9.2
- filelock==3.3.0
- future==0.18.2
- google-auth==1.7.0
- hdbscan==0.8.27
- huggingface-hub==0.0.19
- idna==3.2
- importlib-metadata==3.10.1
- janome==0.4.1
- kiwisolver==1.3.2
- lxml==4.6.3
- matplotlib==3.4.3
- networkx==2.6.3
- numpy==1.21.2
- pandas==1.3.3
- patsy==0.5.2
- pillow==8.3.2
- plotly==5.3.1
- plotly-express==0.4.1
- protobuf==3.18.1
- pyasn1==0.4.8
- pyasn1-modules==0.2.8
- pybind11==2.8.0
- pysocks==1.7.1
- pytz==2021.3
- pyyaml==5.4.1
- regex==2021.9.30
- requests==2.26.0
- rsa==4.0
- sacremoses==0.0.46
- scikit-learn==1.0
- scipy==1.7.1
- sentencepiece==0.1.95
- sister==0.1.10
- smart-open==5.2.1
- statsmodels==0.13.0
- tenacity==8.0.1
- threadpoolctl==3.0.0
- tokenizers==0.10.3
- torch==1.9.1
- tqdm==4.62.3
- transformers==4.11.2
- umap==0.1.1
- urllib3==1.26.7
prefix: C:\Users\v-sacannon\Anaconda3\envs\ClusteringNotes

I followed the steps outlined in the documentation on how to upload a .yml file to a session and then clicked 'apply'. However, when I try to initiate the Spark session, I keep getting this error message:

LIBRARY_MANAGEMENT_FAILED: Livy session has failed. Session state: Error. Error code: LIBRARY_MANAGEMENT_FAILED. [plugins.ftml-synapse.NotesSparkPool.1b413213-b39b-41a4-bd31-3467dff21d9a WorkspaceType:<Synapse> CCID:<>] MaxClusterCreationAttempts=[3] Attempt=[0] ComputeNodeSize=[Medium] ClusterId=[ef827175-db51-46b1-8e18-0db706b9f1ae] AdlaResourceId=[] [Creation] -> [Cleanup]. The cluster creation has failed more than the [3] of times. IsTimeout=[False] IsTerminal=[True] IsRetryable=[False] ErrorType=[UserError] ErrorMessage=[LIBRARY_MANAGEMENT_FAILED] Source: User.

I would love to understand how I can quickly and efficiently upload env files to sessions so that I can actually complete my work. Any help is GREATLY appreciated.

Tags: azure-synapse-analytics, dotnet-ml-big-data

Hello @SamuelCannonAllegisGroupHoldingsIn-6271,

Welcome to the Microsoft Q&A platform.

Could you please share the logs of the Apache Spark application that was used to install the libraries?

(attachment: 140097-image.png)

Hello @SamuelCannonAllegisGroupHoldingsIn-6271,

Just checking in to see if you have had a chance to look at the previous response. We need the information requested above to understand and investigate this issue further.


1 Answer

SamuelCannonAllegisGroupHoldingsIn-6271 answered · SamuelCannonAllegisGroupHoldingsIn-6271 commented

Hey! Sorry for the late reply. I was able to solve this issue by cutting the .yml file down to only the packages that I wanted installed. That raises the question of why we suggest uploading full .yml clones of Anaconda envs instead of a simple requirements.txt file, the way you add packages to a Spark pool (not knocking it, just curious). Here is a picture of what I uploaded as a .yml file instead of the file above (a text sketch follows the screenshot):

(attachment: 142631-image.png — the trimmed .yml file)
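In text form, the trimmed file was along these lines. This is a minimal sketch rather than the exact contents of the screenshot: the env name is arbitrary, flair is left unpinned, and the pinned versions are copied from the pip section of my original file.

name: SessionPackages
channels:
- defaults
dependencies:
- pip:
  - flair
  - torch==1.9.1
  - transformers==4.11.2
  - sentencepiece==0.1.95
  - scikit-learn==1.0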

After I uploaded the .yml file this way it had no problems, but I essentially had to write a requirements.txt file in .yml format. I am adding packages one at a time as I need them, to see how many it will accept. I feel the process for uploading these files at the session level could be improved to accept requirements.txt files instead, and I also fear that if I ever do need to upload a fully cloned env from Anaconda, I won't be able to. Thank you so much for your response, by the way.

PRADEEPCHEEKATLA-MSFT replied to SamuelCannonAllegisGroupHoldingsIn-6271:

Hello @SamuelCannonAllegisGroupHoldingsIn-6271,

This is expected behaviour when you have a long list of packages.

Glad to know that your issue has been resolved. You can mark this as the accepted answer; this can be beneficial to other community members. Thank you.



Pradeep,

So now that I have been able to upload the packages at the session level, the notebook runs. But once I upload the EXACT SAME .yml file to the Spark pool, to add the packages to the pool itself, the notebook throws an error and tells me that I have the wrong packages installed. Why am I getting an error when uploading the same .yml file to the Spark pool, while I am not getting one when it is uploaded at the session level?
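For anyone comparing the two setups, a quick way to see what a given configuration actually installed is to print, inside the notebook, the versions the running session can see. This is just a diagnostic sketch; the package names are the ones from my .yml, and importlib.metadata is in the standard library on Python 3.8+:

# Print the versions the running session actually sees, so a
# session-level install can be compared against a pool-level one.
from importlib.metadata import version, PackageNotFoundError

for pkg in ["torch", "transformers", "numpy", "pandas", "flair"]:
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")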
