Workspace libraries
Workspace libraries serve as a local repository from which you create cluster-installed libraries. A workspace library might be custom code created by your organization, or might be a particular version of an open-source library that your organization has standardized on.
You must install a workspace library on a cluster before it can be used in a notebook or job.
Workspace libraries in the Shared folder are available to all users in a workspace, while workspace libraries in a user folder are available only to that user.
Create a workspace library
Right-click the workspace folder where you want to store the library.
Select Create > Library.
The Create Library dialog appears.
Select the Library Source and follow the appropriate procedure:
Upload a Jar, Python egg, or Python wheel
Note
Installing Python eggs is deprecated and will be removed in a future Databricks Runtime release.
- In the Library Source button list, select Upload.
- Select Jar, Python Egg, or Python Whl.
- Optionally enter a library name.
- Drag your Jar, Egg, or Whl to the drop box or click the drop box and navigate to a file. The file is uploaded to dbfs:/FileStore/jars.
- Click Create. The library status screen displays.
- Optionally install the library on a cluster.
Reference an uploaded jar, Python egg, or Python wheel
If you’ve already uploaded a jar, egg, or wheel to object storage, you can reference it in a workspace library.
You can choose a library in DBFS or one stored in ADLS. ADLS is only supported through the encrypted abfss:// path.
- Select DBFS/ADLS in the Library Source button list.
- Select Jar, Python Egg, or Python Whl.
- Optionally enter a library name.
- Specify the DBFS or ADLS path to the library.
- Click Create. The library status screen displays.
- Optionally install the library on a cluster.
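As a quick pre-check before filling in the dialog, the path rules above can be sketched in Python. The helper name classify_library_path is hypothetical and is not part of any Databricks API. It only checks the scheme prefix and file extension described in this section, and it assumes that DBFS paths start with dbfs:/.

```python
def classify_library_path(path: str) -> str:
    """Return the library type for a DBFS or ADLS path, or raise ValueError.

    Hypothetical helper: it mirrors the rules in this section only and
    does not check that the file actually exists.
    """
    # Assumption: DBFS paths start with dbfs:/ and ADLS paths must use
    # the encrypted abfss:// scheme.
    if not (path.startswith("dbfs:/") or path.startswith("abfss://")):
        raise ValueError(f"unsupported storage scheme: {path}")

    # Map file extensions to the library types offered in the dialog.
    extensions = {".jar": "Jar", ".egg": "Python Egg", ".whl": "Python Whl"}
    for ext, kind in extensions.items():
        if path.lower().endswith(ext):
            return kind
    raise ValueError(f"unsupported file type: {path}")
```

For example, classify_library_path("dbfs:/FileStore/jars/mylib.jar") returns "Jar", while a path that uses the unencrypted abfs:// scheme raises ValueError.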
PyPI package
In the Library Source button list, select PyPI.
Enter a PyPI package name. To install a specific version of a library, use this format for the library: <library>==<version>. For example, scikit-learn==0.19.1.
Note
For jobs, Databricks recommends that you specify a library version to ensure a reproducible environment. If the library version is not fully specified, Databricks uses the latest matching version. This means that different runs of the same job might use different library versions as new versions are published. Specifying the library version prevents new, breaking changes in libraries from breaking your jobs.
In the Index URL field, optionally enter a PyPI index URL.
Click Create. The library status screen displays.
Optionally install the library on a cluster.
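To illustrate the version-pinning recommendation for jobs, here is a minimal Python sketch that checks whether a package specification follows the <library>==<version> format. The function name is_pinned is hypothetical, and the regular expression is a deliberate simplification rather than a complete parser for Python version specifiers. It treats wildcard versions such as 0.19.* as not fully specified.

```python
import re

# Simplified pattern: a package name, "==", then a version that starts
# with a digit and contains no wildcard. Not a full version-spec parser.
_PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[0-9][A-Za-z0-9.!+_-]*$")


def is_pinned(spec: str) -> bool:
    """Return True if spec looks like <library>==<version>."""
    return bool(_PINNED.match(spec.strip()))


# Flag job dependencies that could resolve to different versions over time.
specs = ["scikit-learn==0.19.1", "scikit-learn", "scikit-learn>=0.19"]
unpinned = [s for s in specs if not is_pinned(s)]
```

A check like this could run before a job definition is saved, so that unpinned entries are reviewed rather than silently picking up the latest matching version.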
Maven or Spark package
In the Library Source button list, select Maven.
Specify a Maven coordinate. Do one of the following:
- In the Coordinate field, enter the Maven coordinate of the library to install. Maven coordinates are in the form groupId:artifactId:version; for example, com.databricks:spark-avro_2.10:1.0.0.
- If you don’t know the exact coordinate, enter the library name and click Search Packages. A list of matching packages displays. To display details about a package, click its name. You can sort packages by name, organization, and rating. You can also filter the results by writing a query in the search bar. The results refresh automatically.
- Select Maven Central or Spark Packages in the drop-down list at the top left.
- Optionally select the package version in the Releases column.
- Click + Select next to a package. The Coordinate field is filled in with the selected package and version.
In the Repository field, optionally enter a Maven repository URL.
Note
Internal Maven repositories are not supported.
In the Exclusions field, optionally provide the groupId and the artifactId of the dependencies that you want to exclude; for example, log4j:log4j.
Click Create. The library status screen displays.
Optionally install the library on a cluster.
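The coordinate and exclusion formats used in this procedure can be sanity-checked with a short Python sketch before you paste values into the dialog. The names MavenCoordinate, parse_coordinate, and parse_exclusion are hypothetical helpers, not part of any Databricks or Maven tooling, and the sketch handles only the three-part groupId:artifactId:version form and the two-part groupId:artifactId exclusion form shown above.

```python
from typing import NamedTuple, Tuple


class MavenCoordinate(NamedTuple):
    group_id: str
    artifact_id: str
    version: str


def parse_coordinate(text: str) -> MavenCoordinate:
    """Split groupId:artifactId:version into its three parts."""
    parts = text.strip().split(":")
    # Exactly three non-empty parts are required in this simplified form.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected groupId:artifactId:version, got: {text!r}")
    return MavenCoordinate(*parts)


def parse_exclusion(text: str) -> Tuple[str, str]:
    """Split an exclusion of the form groupId:artifactId."""
    parts = text.strip().split(":")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected groupId:artifactId, got: {text!r}")
    return parts[0], parts[1]


coord = parse_coordinate("com.databricks:spark-avro_2.10:1.0.0")
excluded = parse_exclusion("log4j:log4j")
```

Here coord.version is "1.0.0" and excluded is the pair ("log4j", "log4j"). A bare library name such as spark-avro raises ValueError, which corresponds to the case where you would use Search Packages instead.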
CRAN package
- In the Library Source button list, select CRAN.
- In the Package field, enter the name of the package.
- In the Repository field, optionally enter the CRAN repository URL.
- Click Create. The library detail screen displays.
- Optionally install the library on a cluster.
Note
CRAN mirrors serve the latest version of a library. As a result, you may end up with different versions of an R package if you attach the library to different clusters at different times. To learn how to manage and fix R package versions on Databricks, see the Knowledge Base.
View workspace library details
- Go to the workspace folder containing the library.
- Click the library name.
The library details page shows the running clusters and the install status of the library. If the library is installed, the page contains a link to the package host. If the library was uploaded, the page displays a link to the uploaded package file.
Move a workspace library
- Go to the workspace folder containing the library.
- Click the drop-down arrow to the right of the library name and select Move. A folder browser displays.
- Click the destination folder.
- Click Select.
- Click Confirm and Move.
Delete a workspace library
Important
Before deleting a workspace library, you should uninstall it from all clusters.
To delete a workspace library:
- Move the library to the Trash folder.
- Either permanently delete the library in the Trash folder or empty the Trash folder.