June 2019
These features and Azure Databricks platform improvements were released in June 2019.
Note
Releases are staged. Your Azure Databricks account may not be updated until up to a week after the initial release date.
Lsv2 instance support is generally available
June 24 - 26, 2019: Version 2.100
Azure Databricks now provides full support for the Lsv2 VM series for high-throughput and high-IOPS workloads.
RStudio integration no longer limited to high concurrency clusters
June 6 - 11, 2019: Version 2.99
Now you can enable RStudio Server on standard clusters in Azure Databricks, in addition to the high-concurrency clusters that were already supported. Regardless of cluster mode, RStudio Server integration continues to require that you disable the automatic termination option for your cluster. See RStudio on Azure Databricks.
MLflow 1.0
June 3, 2019
MLflow is an open source platform to manage the complete machine learning lifecycle. With MLflow, data scientists can track and share experiments locally or in the cloud, package and share models across frameworks, and deploy models virtually anywhere.
We are excited to announce the release of MLflow 1.0 today. The 1.0 release not only marks the maturity and stability of the APIs, but also adds a number of frequently requested features and improvements:
- The CLI was reorganized and now has dedicated commands for artifacts, models, db (the tracking database), and server (the tracking server).
- Tracking server search supports a simplified version of the SQL WHERE clause. In addition to supporting run metrics and params, search has been enhanced to support some run attributes and user and system tags.
- Adds support for x coordinates in the Tracking API. The MLflow UI visualization components now also support plotting metrics against provided x-coordinate values.
- Adds a runs/log-batch REST API endpoint as well as Python, R, and Java methods for logging multiple metrics, parameters, and tags with a single API request.
- For tracking, the MLflow 1.0 client is now supported on Windows.
- Adds support for HDFS as an artifact store backend.
- Adds a command to build a Docker container whose default entry point serves the specified MLflow Python function model at port 8080 within the container.
- Adds an experimental ONNX model flavor.
You can view the full list of changes in the MLflow Change log.
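As a rough illustration of the batch logging, x-coordinate, and search items above, the following minimal sketch builds the JSON body that a single runs/log-batch request might carry, using only the Python standard library. The helper name `build_log_batch_payload`, the run ID, and the metric, parameter, and tag names are hypothetical. The field names (`run_id`, `metrics`, `params`, `tags`, `step`) reflect our reading of the MLflow REST API and should be checked against the MLflow documentation before use. The filter string shows the general shape of a simplified WHERE-style search expression and is likewise only an example.

```python
import json
import time


def build_log_batch_payload(run_id, metrics=(), params=None, tags=None, timestamp_ms=None):
    """Build the JSON body for one runs/log-batch request (hypothetical helper).

    metrics is an iterable of (key, value, step) tuples, where step is the
    x coordinate that the metric value is plotted against.
    """
    if timestamp_ms is None:
        # Use the current time in milliseconds when no timestamp is given.
        timestamp_ms = int(time.time() * 1000)
    return {
        "run_id": run_id,
        "metrics": [
            {"key": k, "value": float(v), "timestamp": timestamp_ms, "step": int(s)}
            for k, v, s in metrics
        ],
        # Parameter and tag values are sent as strings.
        "params": [{"key": k, "value": str(v)} for k, v in (params or {}).items()],
        "tags": [{"key": k, "value": str(v)} for k, v in (tags or {}).items()],
    }


# Three loss values at steps 0, 1, 2, plus one param and one tag, in one request body.
payload = build_log_batch_payload(
    run_id="hypothetical-run-id",  # assumption: replace with a real run ID
    metrics=[("loss", 0.9, 0), ("loss", 0.5, 1), ("loss", 0.3, 2)],
    params={"learning_rate": 0.01},
    tags={"stage": "demo"},
    timestamp_ms=1559520000000,
)
body = json.dumps(payload)

# Example of a simplified WHERE-style filter over metrics and params (illustrative only).
filter_string = "metrics.loss < 0.5 and params.learning_rate = '0.01'"
```

In practice the MLflow client libraries wrap this request, so a hand-built payload like this is mainly useful for understanding what a single batched call contains.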
Databricks Runtime 5.4 for Machine Learning
June 3, 2019
Databricks Runtime 5.4 ML is built on top of Databricks Runtime 5.4 (EoS). It contains many popular machine learning libraries, including TensorFlow, PyTorch, Keras, and XGBoost, and provides distributed TensorFlow training using Horovod.
It includes the following new features:
- MLlib integration with MLflow (Public Preview).
- Hyperopt with new SparkTrials class pre-installed (Public Preview).
- HorovodRunner output sent from Horovod to the Spark driver node is now visible in notebook cells.
- XGBoost Python package pre-installed.
For details, see Databricks Runtime 5.4 for ML (EoS).
Databricks Runtime 5.4
June 3, 2019
Databricks Runtime 5.4 is now available. Databricks Runtime 5.4 includes Apache Spark 2.4.2, upgraded Python, R, Java, and Scala libraries, and the following new features:
- Delta Lake on Databricks adds Auto Optimize (Public Preview)
- Use your favorite IDE and notebook server with Databricks Connect
- Library utilities generally available
- Binary file data source
For details, see Databricks Runtime 5.4 (EoS).