Install wheel files for an Azure Spark pool from Azure Storage in a specific order

Banaszczyk, Kamil 22 Reputation points
2022-02-05T18:32:39.523+00:00

I'm trying to use geopandas in Azure Spark pools, but I'm receiving the error LIBRARY_MANAGEMENT_FAILED. I'm installing the packages from a storage account as described here: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-python-packages#storage-account. The issue is probably that geopandas has dependencies that need to be installed before geopandas itself. Since Spark pools install libraries in order of their names, one of those dependencies, Shapely, is installed last. Is there any way to control the order in which dependencies are installed from Azure Storage?
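If the wheels have to be installed one at a time (for example with separate pip commands), the order that avoids this problem is a topological order of the dependency graph: every package comes after the packages it needs. Below is a minimal Python sketch using the standard-library graphlib module (Python 3.9+). The dependency map is a hypothetical, trimmed example, not the real requirement list of any particular geopandas release, so check each wheel's metadata for the versions you actually use.

```python
from graphlib import TopologicalSorter

# Hypothetical, trimmed dependency map for illustration only:
# package -> set of packages that must be installed before it.
DEPENDS_ON = {
    "geopandas": {"pandas", "shapely", "fiona", "pyproj"},
    "pandas": set(),
    "shapely": set(),
    "fiona": set(),
    "pyproj": set(),
}


def install_order(deps: dict[str, set[str]]) -> list[str]:
    """Return package names so that every dependency precedes its dependents."""
    # TopologicalSorter treats each value as the node's predecessors,
    # so static_order() yields dependencies before the packages needing them.
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for name in install_order(DEPENDS_ON):
        print(name)
```

The relative order of independent packages (here pandas, shapely, fiona and pyproj) is not significant; the only guarantee is that geopandas comes after all of them.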

Azure Synapse Analytics

1 answer

  1. ShaikMaheer-MSFT 38,546 Reputation points Microsoft Employee Moderator
    2022-02-16T12:49:14.913+00:00

    Hi @Anonymous,

    I got an update internally. Please check the following pointers, which should help in your case.

    1. Regarding the update failure: can you check whether a Spark job was triggered and, if so, check the output it generated? This will help you identify the root cause of the issue and confirm whether the order of the packages is in fact what breaks the update.
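When the output of the triggered Spark job is long, a quick way to find the relevant part is to filter it for error lines. A minimal sketch, assuming you have copied the job's log output into a local text file (the default file name `spark_job.log` is hypothetical) and that the failure is reported on lines containing `ERROR`:

```python
import sys
from pathlib import Path


def error_lines(log_text: str) -> list[str]:
    """Return every line that mentions ERROR, with surrounding whitespace removed."""
    return [line.strip() for line in log_text.splitlines() if "ERROR" in line]


if __name__ == "__main__":
    # Hypothetical default file name; pass the path of your saved log as the first argument.
    log_path = Path(sys.argv[1] if len(sys.argv) > 1 else "spark_job.log")
    if log_path.exists():
        for line in error_lines(log_path.read_text(errors="replace")):
            print(line)
```

The match is deliberately loose (a plain substring test) so that it also catches lines where a timestamp or logger name precedes the word ERROR.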

    2. Updating libraries through the storage account is a legacy feature, and there are no plans to make further changes to it. Is there a reason you are using this method? In fact, this process is not supported in Spark 3 and later.

    3. geopandas is available in Anaconda, so the best way to install it is through a YML file, letting conda resolve all the dependencies. You can also use the .whl file directly, but I believe a YML file is the most convenient way. You can find everything about the latest library management process here: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-python-packages#install-wheel-files
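As a sketch of the YML route, the snippet below builds a minimal conda environment file that lists only geopandas and leaves dependency resolution to conda. The environment name `geo` is a hypothetical placeholder, and listing the conda-forge channel is an assumption (geopandas is published there); check the linked page for the exact file format and upload steps your Spark pool expects.

```python
def conda_env_yaml(name: str, packages: list[str], channel: str = "conda-forge") -> str:
    """Build a minimal conda environment file as plain text (no PyYAML needed)."""
    lines = [f"name: {name}", "channels:", f"  - {channel}", "dependencies:"]
    # List only the top-level packages; conda works out Shapely, pyproj, etc.
    lines += [f"  - {pkg}" for pkg in packages]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical environment name; save the printed text as environment.yml.
    print(conda_env_yaml("geo", ["geopandas"]), end="")
```

Listing only geopandas, rather than each dependency in a chosen order, is the point of this approach: conda computes a consistent set of dependencies, so the installation order no longer has to be managed by hand.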

    Hope this will help. Thank you.

    ----------

    Please consider clicking Accept Answer. Accepted answers help the community as well.

