Lakeflow Spark Declarative Pipelines supports external dependencies in your pipelines. Databricks recommends using one of two patterns to install Python packages:
- Use the Environment settings to add packages to the pipeline environment for all source files in a pipeline.
- Import modules or libraries from source code stored in workspace files. See Import Python modules from Git folders or workspace files.
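For example, a pipeline source file can import shared logic from a Python module stored as a workspace file in the same directory. The following is a minimal sketch using the dlt Python API; the helpers module and its clean_names function are hypothetical names standing in for your own code:

```python
import dlt

# helpers.py is a workspace file stored next to this pipeline source file
# (hypothetical module and function names).
from helpers import clean_names

@dlt.table(comment="Customers with normalized column names")
def customers_clean():
    return clean_names(spark.read.table("samples.tpch.customer"))
```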
Pipelines also support cluster-scoped init scripts. However, external dependencies, particularly init scripts, increase the risk of issues with runtime upgrades. To mitigate these risks, minimize the use of init scripts in your pipelines. If your processing requires init scripts, automate testing of your pipeline and increase your testing frequency to detect problems early.
Important
Because JVM libraries are not supported in pipelines, do not use an init script to install them. However, you can install other library types, such as Python libraries, with an init script.
Python libraries
To specify external Python libraries, edit the environment for your pipeline.
- From the pipeline editor, click Settings.
- Under Pipeline environment, select Edit environment.
- Click Add dependency.
- Type the name of the dependency. Databricks recommends pinning the version of the library. For example, to add a dependency on simplejson version 3.19, type simplejson==3.19.*.
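After the dependency is installed in the pipeline environment, any source file in the pipeline can import it. As a minimal sketch, assuming the simplejson dependency above and a hypothetical upstream table raw_events with a string payload column:

```python
import dlt
import simplejson  # installed through the pipeline's Environment settings
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Hypothetical UDF that re-serializes a JSON payload with sorted keys.
@udf(returnType=StringType())
def normalize_json(payload: str) -> str:
    return simplejson.dumps(simplejson.loads(payload), sort_keys=True)

@dlt.table(comment="Events with normalized JSON payloads")
def normalized_events():
    return spark.read.table("raw_events").withColumn(
        "payload", normalize_json("payload")
    )
```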
You can also install a Python wheel package from a Unity Catalog volume by specifying its path, for example /Volumes/my_catalog/my_schema/my_ldp_volume/ldpfns-1.0-py3-none-any.whl.
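You then import the package the wheel provides like any other library. A minimal sketch, assuming the ldpfns wheel above exposes an ldpfns module with a hypothetical add_audit_columns function:

```python
import dlt
import ldpfns  # provided by the wheel installed from the Unity Catalog volume

@dlt.table(comment="Orders with audit columns from the custom wheel")
def audited_orders():
    # add_audit_columns is a hypothetical function assumed to ship with the wheel.
    return ldpfns.add_audit_columns(spark.read.table("orders"))
```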
Environment version
By default, the Python language version and preinstalled library set available to your pipeline come from the current Databricks Runtime channel version. See Lakeflow Spark Declarative Pipelines release notes and the release upgrade process for the current versions and the per-runtime package lists.
Important
This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Azure Databricks previews.
To pin the Python language version and preinstalled library set independently of Databricks Runtime upgrades, configure an environment version on the pipeline. While an environment version is set, Databricks Runtime upgrades don't change your Python language version or preinstalled library versions, and any external dependencies you add through the Environment settings are layered on top of this base. See Configure environment versions for pipelines.
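If you want to verify which versions your pipeline actually runs with, you can log them from a pipeline source file. A minimal sketch (the output typically appears in the pipeline's driver logs; simplejson here stands in for any dependency you pinned):

```python
import sys
import importlib.metadata

# Record the interpreter version and a pinned package's version when this
# pipeline source file is evaluated.
print(f"Python version: {sys.version}")
print(f"simplejson version: {importlib.metadata.version('simplejson')}")
```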
Can I use Scala or Java libraries in pipelines?
No, pipelines support only SQL and Python. You cannot use JVM libraries in a pipeline. Installing JVM libraries causes unpredictable behavior and may break with future Lakeflow Spark Declarative Pipelines releases. If your pipeline uses an init script, you must also ensure that the script does not install JVM libraries.