Workspace packages can be custom or private wheel (Python), jar (Scala/Java), or tar.gz (R) files. You can upload these packages to your workspace and later assign them to a specific Spark pool.
To add workspace packages:
- Navigate to the Manage > Workspace packages tab.
- Upload your wheel files by using the file selector.
- Once the files have been uploaded to the Azure Synapse workspace, you can add these packages to a given Apache Spark pool.
Within Azure Synapse, an Apache Spark pool can use custom libraries that are either uploaded as Workspace packages or placed in a well-known Azure Data Lake Storage path. However, these two options cannot be used together on the same Apache Spark pool. If packages are provided using both methods, only the wheel files specified in the Workspace packages list are installed.
Once Workspace packages have been used to install packages on a given Apache Spark pool, you can no longer specify packages using the Storage account path on that pool.
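The precedence rule described above can be modeled in a few lines of Python. This is only an illustrative sketch of the documented behavior, not a Synapse API: the function name effective_wheels and both parameter names are hypothetical.

```python
def effective_wheels(workspace_packages, storage_path_wheels):
    """Model which wheel files a pool would install under the documented rule.

    Hypothetical helper: if any Workspace packages are assigned to the pool,
    only those are installed and the storage-path wheels are ignored.
    """
    if workspace_packages:
        # Workspace packages take precedence over the storage account path.
        return list(workspace_packages)
    # No Workspace packages assigned: the storage-path wheels apply.
    return list(storage_path_wheels)


if __name__ == "__main__":
    print(effective_wheels(["a-1.0-py3-none-any.whl"], ["b-2.0-py3-none-any.whl"]))
    print(effective_wheels([], ["b-2.0-py3-none-any.whl"]))
```

A check like this can help when reviewing a pool configuration, since wheels left in the storage path are ignored without any error once Workspace packages are assigned.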
It's recommended that you don't have multiple wheel packages with the same name in a workspace. If you want to use a different version of the same wheel package, you have to delete the existing version and upload the new one.
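Before uploading, it can help to check a local folder of wheels for packages that appear more than once. The sketch below relies on the standard wheel filename layout (distribution, version, optional build tag, then Python, ABI, and platform tags, separated by hyphens). The function names are hypothetical, and the name normalization (lowercasing and collapsing runs of "-", "_", and "." into "-") is an assumption about what counts as "the same name".

```python
import re
from collections import defaultdict


def wheel_name_and_version(filename):
    """Split a wheel filename into (normalized distribution name, version)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # Standard layout: name-version[-build]-python-abi-platform
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    # Assumption: names differing only in case or -/_/. are the same package.
    name = re.sub(r"[-_.]+", "-", parts[0]).lower()
    return name, parts[1]


def find_duplicate_packages(filenames):
    """Return {package name: [versions]} for packages listed more than once."""
    versions = defaultdict(list)
    for filename in filenames:
        name, version = wheel_name_and_version(filename)
        versions[name].append(version)
    return {name: found for name, found in versions.items() if len(found) > 1}


if __name__ == "__main__":
    wheels = [
        "my_lib-1.0.0-py3-none-any.whl",
        "my_lib-1.1.0-py3-none-any.whl",
        "other-0.1-py3-none-any.whl",
    ]
    print(find_duplicate_packages(wheels))
```

Any package reported here has more than one version in the list; per the guidance above, delete the existing version from the workspace before uploading the new one.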
Custom-built wheel packages can be installed on the Apache Spark pool by uploading all the wheel files into the Azure Data Lake Storage (Gen2) account that is linked with the Synapse workspace.
The files should be uploaded to the following path in the storage account's default container:
- In some cases, you may need to create the file path based on the structure above if it does not already exist. For example, you may need to add the python folder within the libraries folder if it does not already exist.
- This method of managing custom wheel files will not be supported on the Azure Synapse Runtime for Apache Spark 3.0. Please refer to the Workspace packages feature to manage custom wheel files.
To install custom libraries using the Azure Data Lake Storage method, you must have the Storage Blob Data Contributor or Storage Blob Data Owner permissions on the primary Gen2 Storage account that is linked to the Azure Synapse Analytics workspace.
- View the default libraries: Apache Spark version support
- Troubleshoot library installation errors: Troubleshoot library errors
- Create a private Conda channel using your Azure Data Lake Storage Account: Conda private channels