Databricks Runtime 7.0 (EoS)

Artikkeli
09/03/2024

Note

Support for this Databricks Runtime version has ended. For the end-of-support date, see End-of-support history. For all supported Databricks Runtime versions, see Databricks Runtime release notes versions and compatibility.

Databricks released this version in June 2020.

The following release notes provide information about Databricks Runtime 7.0, powered by Apache Spark 3.0.

New features

Databricks Runtime 7.0 includes the following new features:

Scala 2.12

Databricks Runtime 7.0 upgrades Scala from 2.11.12 to 2.12.10. The change list between Scala 2.12 and 2.11 is in the Scala 2.12.0 release notes.
Auto Loader (Public Preview), released in Databricks Runtime 6.4, has been improved in Databricks Runtime 7.0

Auto Loader gives you a more efficient way to process new data files incrementally as they arrive on a cloud blob store during ETL. This is an improvement over file-based structured streaming, which identifies new files by repeatedly listing the cloud directory and tracking the files that have been seen, and can be very inefficient as the directory grows. Auto Loader is also more convenient and effective than file-notification-based structured streaming, which requires that you manually configure file-notification services on the cloud and doesn’t let you backfill existing files. For details, see What is Auto Loader?.

On Databricks Runtime 7.0 you no longer need to request a custom Databricks Runtime image in order to use Auto Loader.
COPY INTO (Public Preview), which lets you load data into Delta Lake with idempotent retries, has been improved in Databricks Runtime 7.0

Released as a Public Preview in Databricks Runtime 6.4, the COPY INTO SQL command lets you load data into Delta Lake with idempotent retries. To load data into Delta Lake today you have to use Apache Spark DataFrame APIs. If there are failures during loads, you have to handle them effectively. The new COPY INTO command provides a familiar declarative interface to load data in SQL. The command keeps track of previously loaded files and you safely re-run it in case of failures. For details, see COPY INTO.

Improvements

Azure Synapse (formerly SQL Data Warehouse) connector supports the COPY statement.

The main benefit of COPY is that lower privileged users can write data to Azure Synapse without needing strict CONTROL permissions on Azure Synapse.
The %matplotlib inline magic command is no longer required to display Matplolib objects inline in notebook cells. They are always displayed inline by default.
Matplolib figures are now rendered with transparent=False, so that user-specified backgrounds are not lost. This behavior can be overridden by setting Spark configuration spark.databricks.workspace.matplotlib.transparent true.
When running Structured Streaming production jobs on High Concurrency mode clusters, restarts of a job would occasionally fail, because the previously running job wasn’t terminated properly. Databricks Runtime 6.3 introduced the ability to set the SQL configuration spark.sql.streaming.stopActiveRunOnRestart true on your cluster to ensure that the previous run stops. This configuration is set by default in Databricks Runtime 7.0.

Major library changes

Python packages

Major Python packages upgraded:

boto3 1.9.162 -> 1.12.0
matplotlib 3.0.3 -> 3.1.3
numpy 1.16.2 -> 1.18.1
pandas 0.24.2 -> 1.0.1
pip 19.0.3 -> 20.0.2
pyarrow 0.13.0 -> 0.15.1
psycopg2 2.7.6 -> 2.8.4
scikit-learn 0.20.3 -> 0.22.1
scipy 1.2.1 -> 1.4.1
seaborn 0.9.0 -> 0.10.0

Python packages removed:

boto (use boto3)
pycurl

Note

The Python environment in Databricks Runtime 7.0 uses Python 3.7, which is different from the installed Ubuntu system Python: /usr/bin/python and /usr/bin/python2 are linked to Python 2.7 and /usr/bin/python3 is linked to Python 3.6.

R packages

R packages added:

broom
highr
isoband
knitr
markdown
modelr
reprex
rmarkdown
rvest
selectr
tidyverse
tinytex
xfun

R packages removed:

abind
bitops
car
carData
doMC
gbm
h2o
littler
lme4
mapproj
maps
maptools
MatrixModels
minqa
mvtnorm
nloptr
openxlsx
pbkrtest
pkgKitten
quantreg
R.methodsS3
R.oo
R.utils
RcppEigen
RCurl
rio
sp
SparseM
statmod
zip

Java and Scala libraries

Apache Hive version used for handling Hive user-defined functions and Hive SerDes upgraded to 2.3.
Previously Azure Storage and Key Vault jars were packaged as part of Databricks Runtime, which would prevent you from using different versions of those libraries attached to clusters. Classes under com.microsoft.azure.storage and com.microsoft.azure.keyvault are no longer on the class path in Databricks Runtime. If you depend on either of those class paths, you must now attach Azure Storage SDK or Azure Key Vault SDK to your clusters.

Behavior changes

This section lists behavior changes from Databricks Runtime 6.6 to Databricks Runtime 7.0. You should be aware of these as you migrate workloads from lower Databricks Runtime releases to Databricks Runtime 7.0 and above.

Spark behavior changes

Because Databricks Runtime 7.0 is the first Databricks Runtime built on Spark 3.0, there are many changes that you should be aware of when you migrate workloads from Databricks Runtime 5.5 LTS or 6.x, which are built on Spark 2.4. These changes are listed in the “Behavior changes” section of each functional area in the Apache Spark section of this release notes article:

Behavior changes for Spark core, Spark SQL, and Structured Streaming
Behavior changes for MLlib
Behavior changes for SparkR

Other behavior changes

The upgrade to Scala 2.12 involves the following changes:
- Package cell serialization is handled differently. The following example illustrates the behavior change and how to handle it.
  
  Running foo.bar.MyObjectInPackageCell.run() as defined in the following package cell will trigger the error java.lang.NoClassDefFoundError: Could not initialize class foo.bar.MyObjectInPackageCell$
```
package foo.bar

case class MyIntStruct(int: Int)

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Column

object MyObjectInPackageCell extends Serializable {

  // Because SparkSession cannot be created in Spark executors,
  // the following line triggers the error
  // Could not initialize class foo.bar.MyObjectInPackageCell$
  val spark = SparkSession.builder.getOrCreate()

  def foo: Int => Option[MyIntStruct] = (x: Int) => Some(MyIntStruct(100))

  val theUDF = udf(foo)

  val df = {
    val myUDFInstance = theUDF(col("id"))
    spark.range(0, 1, 1, 1).withColumn("u", myUDFInstance)
  }

  def run(): Unit = {
    df.collect().foreach(println)
  }
}
```
  To work around this error, you can wrap MyObjectInPackageCell inside a serializable class.
- Certain cases using DataStreamWriter.foreachBatch will require a source code update. This change is due to the fact that Scala 2.12 has automatic conversion from lambda expressions to SAM types and can cause ambiguity.
  
  For example, the following Scala code can’t compile:
```
streams
  .writeStream
  .foreachBatch { (df, id) => myFunc(df, id) }
```
  To fix the compilation error, change foreachBatch { (df, id) => myFunc(df, id) } to foreachBatch(myFunc _) or use the Java API explicitly: foreachBatch(new VoidFunction2 ...).
Because the Apache Hive version used for handling Hive user-defined functions and Hive SerDes is upgraded to 2.3, two changes are required:
- Hive’s SerDe interface is replaced by an abstract class AbstractSerDe. For any custom Hive SerDe implementation, migrating to AbstractSerDe is required.
- Setting spark.sql.hive.metastore.jars to builtin means that the Hive 2.3 metastore client will be used to access metastores for Databricks Runtime 7.0. If you need to access Hive 1.2 based external metastores, set spark.sql.hive.metastore.jars to the folder that contains Hive 1.2 jars.

Deprecations and removals

Data skipping index was deprecated in Databricks Runtime 4.3 and removed in Databricks Runtime 7.0. We recommend that you use Delta tables instead, which offer improved data skipping capabilities.
In Databricks Runtime 7.0, the underlying version of Apache Spark uses Scala 2.12. Since libraries compiled against Scala 2.11 can disable Databricks Runtime 7.0 clusters in unexpected ways, clusters running Databricks Runtime 7.0 and above do not install libraries configured to be installed on all clusters. The cluster Libraries tab shows a status Skipped and a deprecation message that explains the changes in library handling. However, if you have a cluster that was created on an earlier version of Databricks Runtime before Azure Databricks platform version 3.20 was released to your workspace, and you now edit that cluster to use Databricks Runtime 7.0, any libraries that were configured to be installed on all clusters will be installed on that cluster. In this case, any incompatible JARs in the installed libraries can cause the cluster to be disabled. The workaround is either to clone the cluster or to create a new cluster.

Apache Spark

Databricks Runtime 7.0 includes Apache Spark 3.0.

Core, Spark SQL, Structured Streaming

Highlights

(Project Hydrogen) Accelerator-aware Scheduler (SPARK-24615)
Adaptive Query Execution (SPARK-31412)
Dynamic Partition Pruning (SPARK-11150)
Redesigned pandas UDF API with type hints (SPARK-28264)
Structured Streaming UI (SPARK-29543)
Catalog plugin API (SPARK-31121)
Better ANSI SQL compatibility

Performance enhancements

Adaptive Query Execution (SPARK-31412)
- Basic framework (SPARK-23128)
- Post shuffle partition number adjustment (SPARK-28177)
- Dynamic subquery reuse (SPARK-28753)
- Local shuffle reader (SPARK-28560)
- Skew join optimization (SPARK-29544)
- Optimize reading contiguous shuffle blocks (SPARK-9853)
Dynamic Partition Pruning (SPARK-11150)
Other optimizer rules
- Rule ReuseSubquery (SPARK-27279)
- Rule PushDownLeftSemiAntiJoin (SPARK-19712)
- Rule PushLeftSemiLeftAntiThroughJoin (SPARK-19712)
- Rule ReplaceNullWithFalse (SPARK-25860)
- Rule Eliminate sorts without limit in the subquery of Join/Aggregation (SPARK-29343)
- Rule PruneHiveTablePartitions (SPARK-15616)
- Pruning unnecessary nested fields from Generate (SPARK-27707)
- Rule RewriteNonCorrelatedExists (SPARK-29800)
Minimize table cache synchronization costs (SPARK-26917), (SPARK-26617), (SPARK-26548)
Split aggregation code into small functions (SPARK-21870)
Add batching in INSERT and ALTER TABLE ADD PARTITION command (SPARK-29938)

Extensibility enhancements

Catalog plugin API (SPARK-31121)
Data source V2 API refactoring (SPARK-25390)
Hive 3.0 and 3.1 metastore support (SPARK-27970),(SPARK-24360)
Extend Spark plugin interface to driver (SPARK-29396)
Extend Spark metrics system with user-defined metrics using executor plugins (SPARK-28091)
Developer APIs for extended Columnar Processing Support (SPARK-27396)
Built-in source migration using DSV2: parquet, ORC, CSV, JSON, Kafka, Text, Avro (SPARK-27589)
Allow FunctionInjection in SparkExtensions (SPARK-25560)
Allows Aggregator to be registered as a UDAF (SPARK-27296)

Connector enhancements

Column pruning through nondeterministic expressions (SPARK-29768)
Support spark.sql.statistics.fallBackToHdfs in data source tables (SPARK-25474)
Allow partition pruning with subquery filters on file source (SPARK-26893)
Avoid pushdown of subqueries in data source filters (SPARK-25482)
Recursive data loading from file sources (SPARK-27990)
Parquet/ORC
- Pushdown of disjunctive predicates (SPARK-27699)
- Generalize Nested Column Pruning (SPARK-25603) and turned on by default (SPARK-29805)
- Parquet only
  - Parquet predicate pushdown for nested fields (SPARK-17636)
- ORC only
  - Support merge schema for ORC (SPARK-11412)
  - Nested schema pruning for ORC (SPARK-27034)
  - Predicate conversion complexity reduction for ORC (SPARK-27105, SPARK-28108)
  - Upgrade Apache ORC to 1.5.9 (SPARK-30695)
CSV
- Support filters pushdown in CSV datasource (SPARK-30323)
Hive SerDe
- No schema inference when reading Hive serde table with native data source (SPARK-27119)
- Hive CTAS commands should use data source if it is convertible (SPARK-25271)
- Use native data source to optimize inserting partitioned Hive table (SPARK-28573)
Apache Kafka
- Add support for Kafka headers (SPARK-23539)
- Add Kafka delegation token support (SPARK-25501)
- Introduce new option to Kafka source: offset by timestamp (starting/ending) (SPARK-26848)
- Support the minPartitions option in Kafka batch source and streaming source v1 (SPARK-30656)
- Upgrade Kafka to 2.4.1 (SPARK-31126)
New built-in data sources
- New built-in binary file data sources (SPARK-25348)
- New no-op batch data sources (SPARK-26550) and no-op streaming sink (SPARK-26649)

Feature enhancements

[Hydrogen] Accelerator-aware Scheduler (SPARK-24615)
Introduce a complete set of Join Hints (SPARK-27225)
Add PARTITION BY hint for SQL queries (SPARK-28746)
Metadata Handling in Thrift Server (SPARK-28426)
Add higher order functions to scala API (SPARK-27297)
Support simple all gather in barrier task context (SPARK-30667)
Hive UDFs supports the UDT type (SPARK-28158)
Support DELETE/UPDATE/MERGE Operators in Catalyst (SPARK-28351, SPARK-28892, SPARK-28893)
Implement DataFrame.tail (SPARK-30185)
New built-in functions
- sinh, cosh, tanh, asinh, acosh, atanh (SPARK-28133)
- any, every, some (SPARK-19851)
- bit_and, bit_or (SPARK-27879)
- bit_count (SPARK-29491)
- bit_xor (SPARK-29545)
- bool_and, bool_or (SPARK-30184)
- count_if (SPARK-27425)
- date_part (SPARK-28690)
- extract (SPARK-23903)
- forall (SPARK-27905)
- from_csv (SPARK-25393)
- make_date (SPARK-28432)
- make_interval (SPARK-29393)
- make_timestamp (SPARK-28459)
- map_entries (SPARK-23935)
- map_filter (SPARK-23937)
- map_zip_with (SPARK-23938)
- max_by, min_by (SPARK-27653)
- schema_of_csv (SPARK-25672)
- to_csv (SPARK-25638)
- transform_keys (SPARK-23939)
- transform_values (SPARK-23940)
- typeof (SPARK-29961)
- version (SPARK-29554)
- xxhash64 (SPARK-27099)
Improvements on existing built-in functions
- Built-in date-time functions/operations improvement (SPARK-31415)
- Support FAILFAST mode for from_json (SPARK-25243)
- array_sort adds a new comparator parameter (SPARK-29020)
- Filter can now take the index as input as well as the element (SPARK-28962)

SQL compatibility enhancements

Switch to Proleptic Gregorian calendar (SPARK-26651)
Build Spark’s own datetime pattern definition (SPARK-31408)
Introduce ANSI store assignment policy for table insertion (SPARK-28495)
Follow ANSI store assignment rule in table insertion by default (SPARK-28885)
Add a SQLConf spark.sql.ansi.enabled (SPARK-28989)
Support ANSI SQL filter clause for aggregate expression (SPARK-27986)
Support ANSI SQL OVERLAY function (SPARK-28077)
Support ANSI nested bracketed comments (SPARK-28880)
Throw exception on overflow for integers (SPARK-26218)
Overflow check for interval arithmetic operations (SPARK-30341)
Throw Exception when invalid string is cast to numeric type (SPARK-30292)
Make interval multiply and divide’s overflow behavior consistent with other operations (SPARK-30919)
Add ANSI type aliases for char and decimal (SPARK-29941)
SQL Parser defines ANSI compliant reserved keywords (SPARK-26215)
Forbid reserved keywords as identifiers when ANSI mode is on (SPARK-26976)
Support ANSI SQL LIKE ... ESCAPE syntax (SPARK-28083)
Support ANSI SQL Boolean-Predicate syntax (SPARK-27924)
Better support for correlated subquery processing (SPARK-18455)

Monitoring and debugability enhancements

New Structured Streaming UI (SPARK-29543)
SHS: Allow event logs for running streaming apps to be rolled over (SPARK-28594)
Add an API that allows a user to define and observe arbitrary metrics on batch and streaming queries (SPARK-29345)
Instrumentation for tracking per-query planning time (SPARK-26129)
Put the basic shuffle metrics in the SQL exchange operator (SPARK-26139)
SQL statement is shown in SQL Tab instead of callsite (SPARK-27045)
Add tooltip to SparkUI (SPARK-29449)
Improve the concurrent performance of History Server (SPARK-29043)
EXPLAIN FORMATTED command (SPARK-27395)
Support Dumping truncated plans and generated code to a file (SPARK-26023)
Enhance describe framework to describe the output of a query (SPARK-26982)
Add SHOW VIEWS command (SPARK-31113)
Improve the error messages of SQL parser (SPARK-27901)
Support Prometheus monitoring natively (SPARK-29429)

PySpark enhancements

Redesigned pandas UDFs with type hints (SPARK-28264)
Pandas UDF pipeline (SPARK-26412)
Support StructType as arguments and return types for Scalar Pandas UDF (SPARK-27240 )
Support Dataframe Cogroup via Pandas UDFs (SPARK-27463)
Add mapInPandas to allow an iterator of DataFrames (SPARK-28198)
Certain SQL functions should take column names as well (SPARK-26979)
Make PySpark SQL exceptions more Pythonic (SPARK-31849)

Documentation and test coverage enhancements

Build a SQL Reference (SPARK-28588)
Build a user guide for WebUI (SPARK-28372)
Build a page for SQL configuration documentation (SPARK-30510)
Add version information for Spark configuration (SPARK-30839)
Port regression tests from PostgreSQL (SPARK-27763)
Thrift-server test coverage (SPARK-28608)
Test coverage of UDFs (python UDF, pandas UDF, scala UDF) (SPARK-27921)

Other notable changes

Built-in Hive execution upgrade from 1.2.1 to 2.3.6 (SPARK-23710, SPARK-28723, SPARK-31381)
Use Apache Hive 2.3 dependency by default (SPARK-30034)
GA Scala 2.12 and remove 2.11 (SPARK-26132)
Improve logic for timing out executors in dynamic allocation (SPARK-20286)
Disk-persisted RDD blocks served by shuffle service and ignored for Dynamic Allocation (SPARK-27677)
Acquire new executors to avoid hang because of blocklisting (SPARK-22148)
Allow sharing of Netty’s memory pool allocators (SPARK-24920)
Fix deadlock between TaskMemoryManager and UnsafeExternalSorter$SpillableIterator (SPARK-27338)
Introduce AdmissionControl APIs for StructuredStreaming (SPARK-30669)
Spark History Main page performance improvement (SPARK-25973)
Speed up and slim down metric aggregation in SQL listener (SPARK-29562)
Avoid the network when shuffle blocks are fetched from the same host (SPARK-27651)
Improve file listing for DistributedFileSystem (SPARK-27801)

Behavior changes for Spark core, Spark SQL, and Structured Streaming

The following migration guides list behavior changes between Apache Spark 2.4 and 3.0. These changes may require updates to jobs that you have been running on lower Databricks Runtime versions:

The following behavior changes are not covered in these migration guides:

In Spark 3.0, the deprecated class org.apache.spark.sql.streaming.ProcessingTime has been removed. Use org.apache.spark.sql.streaming.Trigger.ProcessingTime instead. Likewise, org.apache.spark.sql.execution.streaming.continuous.ContinuousTrigger has been removed in favor of Trigger.Continuous, and org.apache.spark.sql.execution.streaming.OneTimeTrigger has been hidden in favor of Trigger.Once. (SPARK-28199)
In Databricks Runtime 7.0, when reading a Hive SerDe table, by default Spark disallows reading files under a subdirectory that is not a table partition. To enable it, set the configuration spark.databricks.io.hive.scanNonpartitionedDirectory.enabled as true. This does not affect Spark native table readers and file readers.

MLlib

Highlights

Multiple columns support was added to Binarizer (SPARK-23578), StringIndexer (SPARK-11215), StopWordsRemover (SPARK-29808) and PySpark QuantileDiscretizer (SPARK-22796)
Support tree-based feature transformation(SPARK-13677)
Two new evaluators MultilabelClassificationEvaluator (SPARK-16692) and RankingEvaluator (SPARK-28045) were added
Sample weights support was added in DecisionTreeClassifier/Regressor (SPARK-19591), RandomForestClassifier/Regressor (SPARK-9478), GBTClassifier/Regressor (SPARK-9612), RegressionEvaluator (SPARK-24102), BinaryClassificationEvaluator (SPARK-24103), BisectingKMeans (SPARK-30351), KMeans (SPARK-29967) and GaussianMixture (SPARK-30102)
R API for PowerIterationClustering was added (SPARK-19827)
Added Spark ML listener for tracking ML pipeline status (SPARK-23674)
Fit with validation set was added to Gradient Boosted Trees in Python (SPARK-24333)
RobustScaler transformer was added (SPARK-28399)
Factorization Machines classifier and regressor were added (SPARK-29224)
Gaussian Naive Bayes (SPARK-16872) and Complement Naive Bayes (SPARK-29942) were added
ML function parity between Scala and Python (SPARK-28958)
predictRaw is made public in all the Classification models. predictProbability is made public in all of the Classification models except LinearSVCModel (SPARK-30358)

Behavior changes for MLlib

The following migration guide lists behavior changes between Apache Spark 2.4 and 3.0. These changes may require updates to jobs that you have been running on lower Databricks Runtime versions:

Migration Guide: MLlib (Machine Learning)

The following behavior changes are not covered in the migration guide:

In Spark 3.0, a multiclass logistic regression in Pyspark will now (correctly) return LogisticRegressionSummary, not the subclass BinaryLogisticRegressionSummary. The additional methods exposed by BinaryLogisticRegressionSummary would not work in this case anyway. (SPARK-31681)
In Spark 3.0, pyspark.ml.param.shared.Has* mixins do not provide any set*(self, value) setter methods anymore, use the respective self.set(self.*, value) instead. See SPARK-29093 for details. (SPARK-29093)

SparkR

Arrow optimization in SparkR’s interoperability (SPARK-26759)
Performance enhancement via vectorized R gapply(), dapply(), createDataFrame, collect()
“Eager execution” for R shell, IDE (SPARK-24572)
R API for Power Iteration Clustering (SPARK-19827)

Behavior changes for SparkR

The following migration guide lists behavior changes between Apache Spark 2.4 and 3.0. These changes may require updates to jobs that you have been running on lower Databricks Runtime versions:

Migration Guide: SparkR (R on Spark)

Deprecations

Deprecate Python 2 support (SPARK-27884)
Deprecate R < 3.4 support (SPARK-26014)

Known issues

Parsing day of year using pattern letter ‘D’ returns the wrong result if the year field is missing. This can happen in SQL functions like to_timestamp which parses datetime string to datetime values using a pattern string. (SPARK-31939)
Join/Window/Aggregate inside subqueries may lead to wrong results if the keys have values -0.0 and 0.0. (SPARK-31958)
A window query may fail with ambiguous self-join error unexpectedly. (SPARK-31956)
Streaming queries with dropDuplicates operator may not be able to restart with the checkpoint written by Spark 2.x. (SPARK-31990)

Maintenance updates

See Databricks Runtime 7.0 maintenance updates.

System environment

Operating System: Ubuntu 18.04.4 LTS
Java: 1.8.0_252
Scala: 2.12.10
Python: 3.7.5
R: R version 3.6.3 (2020-02-29)
Delta Lake 0.7.0

Installed Python libraries

Library	Version	Library	Version	Library	Version
asn1crypto	1.3.0	backcall	0.1.0	boto3	1.12.0
botocore	1.15.0	certifi	2020.4.5	cffi	1.14.0
chardet	3.0.4	cryptography	2.8	cycler	0.10.0
Cython	0.29.15	decorator	4.4.1	docutils	0.15.2
entrypoints	0.3	idna	2.8	ipykernel	5.1.4
ipython	7.12.0	ipython-genutils	0.2.0	jedi	0.14.1
jmespath	0.9.4	joblib	0.14.1	jupyter-client	5.3.4
jupyter-core	4.6.1	kiwisolver	1.1.0	matplotlib	3.1.3
numpy	1.18.1	pandas	1.0.1	parso	0.5.2
patsy	0.5.1	pexpect	4.8.0	pickleshare	0.7.5
pip	20.0.2	prompt-toolkit	3.0.3	psycopg2	2.8.4
ptyprocess	0.6.0	pyarrow	0.15.1	pycparser	2.19
Pygments	2.5.2	PyGObject	3.26.1	pyOpenSSL	19.1.0
pyparsing	2.4.6	PySocks	1.7.1	python-apt	1.6.5+ubuntu0.3
python-dateutil	2.8.1	pytz	2019.3	pyzmq	18.1.1
requests	2.22.0	s3transfer	0.3.3	scikit-learn	0.22.1
scipy	1.4.1	seaborn	0.10.0	setuptools	45.2.0
six	1.14.0	ssh-import-id	5.7	statsmodels	0.11.0
tornado	6.0.3	traitlets	4.3.3	unattended-upgrades	0.1
urllib3	1.25.8	virtualenv	16.7.10	wcwidth	0.1.8
wheel	0.34.2

Installed R libraries

R libraries are installed from Microsoft CRAN snapshot on 2020-04-22.

Library	Version	Library	Version	Library	Version
askpass	1.1	assertthat	0.2.1	backports	1.1.6
base	3.6.3	base64enc	0.1-3	BH	1.72.0-3
bit	1.1-15.2	bit64	0.9-7	blob	1.2.1
boot	1.3-25	brew	1.0-6	broom	0.5.6
callr	3.4.3	caret	6.0-86	cellranger	1.1.0
chron	2.3-55	class	7.3-17	cli	2.0.2
clipr	0.7.0	cluster	2.1.0	codetools	0.2-16
colorspace	1.4-1	commonmark	1.7	compiler	3.6.3
config	0.3	covr	3.5.0	crayon	1.3.4
crosstalk	1.1.0.1	curl	4.3	data.table	1.12.8
datasets	3.6.3	DBI	1.1.0	dbplyr	1.4.3
desc	1.2.0	devtools	2.3.0	digest	0.6.25
dplyr	0.8.5	DT	0.13	ellipsis	0.3.0
evaluate	0.14	fansi	0.4.1	farver	2.0.3
fastmap	1.0.1	forcats	0.5.0	foreach	1.5.0
foreign	0.8-76	forge	0.2.0	fs	1.4.1
generics	0.0.2	ggplot2	3.3.0	gh	1.1.0
git2r	0.26.1	glmnet	3.0-2	globals	0.12.5
glue	1.4.0	gower	0.2.1	graphics	3.6.3
grDevices	3.6.3	grid	3.6.3	gridExtra	2.3
gsubfn	0.7	gtable	0.3.0	haven	2.2.0
highr	0.8	hms	0.5.3	htmltools	0.4.0
htmlwidgets	1.5.1	httpuv	1.5.2	httr	1.4.1
hwriter	1.3.2	hwriterPlus	1.0-3	ini	0.3.1
ipred	0.9-9	isoband	0.2.1	iterators	1.0.12
jsonlite	1.6.1	KernSmooth	2.23-17	knitr	1.28
labeling	0.3	later	1.0.0	lattice	0.20-41
lava	1.6.7	lazyeval	0.2.2	lifecycle	0.2.0
lubridate	1.7.8	magrittr	1.5	markdown	1.1
MASS	7.3-51.6	Matrix	1.2-18	memoise	1.1.0
methods	3.6.3	mgcv	1.8-31	mime	0.9
ModelMetrics	1.2.2.2	modelr	0.1.6	munsell	0.5.0
nlme	3.1-147	nnet	7.3-14	numDeriv	2016.8-1.1
openssl	1.4.1	parallel	3.6.3	pillar	1.4.3
pkgbuild	1.0.6	pkgconfig	2.0.3	pkgload	1.0.2
plogr	0.2.0	plyr	1.8.6	praise	1.0.0
prettyunits	1.1.1	pROC	1.16.2	processx	3.4.2
prodlim	2019.11.13	progress	1.2.2	promises	1.1.0
proto	1.0.0	ps	1.3.2	purrr	0.3.4
r2d3	0.2.3	R6	2.4.1	randomForest	4.6-14
rappdirs	0.3.1	rcmdcheck	1.3.3	RColorBrewer	1.1-2
Rcpp	1.0.4.6	readr	1.3.1	readxl	1.3.1
recipes	0.1.10	rematch	1.0.1	rematch2	2.1.1
remotes	2.1.1	reprex	0.3.0	reshape2	1.4.4
rex	1.2.0	rjson	0.2.20	rlang	0.4.5
rmarkdown	2.1	RODBC	1.3-16	roxygen2	7.1.0
rpart	4.1-15	rprojroot	1.3-2	Rserve	1.8-6
RSQLite	2.2.0	rstudioapi	0.11	rversions	2.0.1
rvest	0.3.5	scales	1.1.0	selectr	0.4-2
sessioninfo	1.1.1	shape	1.4.4	shiny	1.4.0.2
sourcetools	0.1.7	sparklyr	1.2.0	SparkR	3.0.0
spatial	7.3-11	splines	3.6.3	sqldf	0.4-11
SQUAREM	2020.2	stats	3.6.3	stats4	3.6.3
stringi	1.4.6	stringr	1.4.0	survival	3.1-12
sys	3.3	tcltk	3.6.3	TeachingDemos	2.10
testthat	2.3.2	tibble	3.0.1	tidyr	1.0.2
tidyselect	1.0.0	tidyverse	1.3.0	timeDate	3043.102
tinytex	0.22	tools	3.6.3	usethis	1.6.0
utf8	1.1.4	utils	3.6.3	vctrs	0.2.4
viridisLite	0.3.0	whisker	0.4	withr	2.2.0
xfun	0.13	xml2	1.3.1	xopen	1.0.0
xtable	1.8-4	yaml	2.2.1

Installed Java and Scala libraries (Scala 2.12 cluster version)

Group ID	Artifact ID	Version
antlr	antlr	2.7.7
com.amazonaws	amazon-kinesis-client	1.12.0
com.amazonaws	aws-java-sdk-autoscaling	1.11.655
com.amazonaws	aws-java-sdk-cloudformation	1.11.655
com.amazonaws	aws-java-sdk-cloudfront	1.11.655
com.amazonaws	aws-java-sdk-cloudhsm	1.11.655
com.amazonaws	aws-java-sdk-cloudsearch	1.11.655
com.amazonaws	aws-java-sdk-cloudtrail	1.11.655
com.amazonaws	aws-java-sdk-cloudwatch	1.11.655
com.amazonaws	aws-java-sdk-cloudwatchmetrics	1.11.655
com.amazonaws	aws-java-sdk-codedeploy	1.11.655
com.amazonaws	aws-java-sdk-cognitoidentity	1.11.655
com.amazonaws	aws-java-sdk-cognitosync	1.11.655
com.amazonaws	aws-java-sdk-config	1.11.655
com.amazonaws	aws-java-sdk-core	1.11.655
com.amazonaws	aws-java-sdk-datapipeline	1.11.655
com.amazonaws	aws-java-sdk-directconnect	1.11.655
com.amazonaws	aws-java-sdk-directory	1.11.655
com.amazonaws	aws-java-sdk-dynamodb	1.11.655
com.amazonaws	aws-java-sdk-ec2	1.11.655
com.amazonaws	aws-java-sdk-ecs	1.11.655
com.amazonaws	aws-java-sdk-efs	1.11.655
com.amazonaws	aws-java-sdk-elasticache	1.11.655
com.amazonaws	aws-java-sdk-elasticbeanstalk	1.11.655
com.amazonaws	aws-java-sdk-elasticloadbalancing	1.11.655
com.amazonaws	aws-java-sdk-elastictranscoder	1.11.655
com.amazonaws	aws-java-sdk-emr	1.11.655
com.amazonaws	aws-java-sdk-glacier	1.11.655
com.amazonaws	aws-java-sdk-iam	1.11.655
com.amazonaws	aws-java-sdk-importexport	1.11.655
com.amazonaws	aws-java-sdk-kinesis	1.11.655
com.amazonaws	aws-java-sdk-kms	1.11.655
com.amazonaws	aws-java-sdk-lambda	1.11.655
com.amazonaws	aws-java-sdk-logs	1.11.655
com.amazonaws	aws-java-sdk-machinelearning	1.11.655
com.amazonaws	aws-java-sdk-opsworks	1.11.655
com.amazonaws	aws-java-sdk-rds	1.11.655
com.amazonaws	aws-java-sdk-redshift	1.11.655
com.amazonaws	aws-java-sdk-route53	1.11.655
com.amazonaws	aws-java-sdk-s3	1.11.655
com.amazonaws	aws-java-sdk-ses	1.11.655
com.amazonaws	aws-java-sdk-simpledb	1.11.655
com.amazonaws	aws-java-sdk-simpleworkflow	1.11.655
com.amazonaws	aws-java-sdk-sns	1.11.655
com.amazonaws	aws-java-sdk-sqs	1.11.655
com.amazonaws	aws-java-sdk-ssm	1.11.655
com.amazonaws	aws-java-sdk-storagegateway	1.11.655
com.amazonaws	aws-java-sdk-sts	1.11.655
com.amazonaws	aws-java-sdk-support	1.11.655
com.amazonaws	aws-java-sdk-swf-libraries	1.11.22
com.amazonaws	aws-java-sdk-workspaces	1.11.655
com.amazonaws	jmespath-java	1.11.655
com.chuusai	shapeless_2.12	2.3.3
com.clearspring.analytics	stream	2.9.6
com.databricks	Rserve	1.8-3
com.databricks	jets3t	0.7.1-0
com.databricks.scalapb	compilerplugin_2.12	0.4.15-10
com.databricks.scalapb	scalapb-runtime_2.12	0.4.15-10
com.esotericsoftware	kryo-shaded	4.0.2
com.esotericsoftware	minlog	1.3.0
com.fasterxml	classmate	1.3.4
com.fasterxml.jackson.core	jackson-annotations	2.10.0
com.fasterxml.jackson.core	jackson-core	2.10.0
com.fasterxml.jackson.core	jackson-databind	2.10.0
com.fasterxml.jackson.dataformat	jackson-dataformat-cbor	2.10.0
com.fasterxml.jackson.datatype	jackson-datatype-joda	2.10.0
com.fasterxml.jackson.module	jackson-module-paranamer	2.10.0
com.fasterxml.jackson.module	jackson-module-scala_2.12	2.10.0
com.github.ben-manes.caffeine	caffeine	2.3.4
com.github.fommil	jniloader	1.1
com.github.fommil.netlib	core	1.1.2
com.github.fommil.netlib	native_ref-java	1.1
com.github.fommil.netlib	native_ref-java-natives	1.1
com.github.fommil.netlib	native_system-java	1.1
com.github.fommil.netlib	native_system-java-natives	1.1
com.github.fommil.netlib	netlib-native_ref-linux-x86_64-natives	1.1
com.github.fommil.netlib	netlib-native_system-linux-x86_64-natives	1.1
com.github.joshelser	dropwizard-metrics-hadoop-metrics2-reporter	0.1.2
com.github.luben	zstd-jni	1.4.4-3
com.github.wendykierp	JTransforms	3.1
com.google.code.findbugs	jsr305	3.0.0
com.google.code.gson	gson	2.2.4
com.google.flatbuffers	flatbuffers-java	1.9.0
com.google.guava	guava	15.0
com.google.protobuf	protobuf-java	2.6.1
com.h2database	h2	1.4.195
com.helger	profiler	1.1.1
com.jcraft	jsch	0.1.50
com.jolbox	bonecp	0.8.0.RELEASE
com.microsoft.azure	azure-data-lake-store-sdk	2.2.8
com.microsoft.sqlserver	mssql-jdbc	8.2.1.jre8
com.ning	compress-lzf	1.0.3
com.sun.mail	javax.mail	1.5.2
com.tdunning	json	1.8
com.thoughtworks.paranamer	paranamer	2.8
com.trueaccord.lenses	lenses_2.12	0.4.12
com.twitter	chill-java	0.9.5
com.twitter	chill_2.12	0.9.5
com.twitter	util-app_2.12	7.1.0
com.twitter	util-core_2.12	7.1.0
com.twitter	util-function_2.12	7.1.0
com.twitter	util-jvm_2.12	7.1.0
com.twitter	util-lint_2.12	7.1.0
com.twitter	util-registry_2.12	7.1.0
com.twitter	util-stats_2.12	7.1.0
com.typesafe	config	1.2.1
com.typesafe.scala-logging	scala-logging_2.12	3.7.2
com.univocity	univocity-parsers	2.8.3
com.zaxxer	HikariCP	3.1.0
commons-beanutils	commons-beanutils	1.9.4
commons-cli	commons-cli	1.2
commons-codec	commons-codec	1.10
commons-collections	commons-collections	3.2.2
commons-configuration	commons-configuration	1.6
commons-dbcp	commons-dbcp	1.4
commons-digester	commons-digester	1.8
commons-fileupload	commons-fileupload	1.3.3
commons-httpclient	commons-httpclient	3.1
commons-io	commons-io	2.4
commons-lang	commons-lang	2.6
commons-logging	commons-logging	1.1.3
commons-net	commons-net	3.1
commons-pool	commons-pool	1.5.4
info.ganglia.gmetric4j	gmetric4j	1.0.10
io.airlift	aircompressor	0.10
io.dropwizard.metrics	metrics-core	4.1.1
io.dropwizard.metrics	metrics-graphite	4.1.1
io.dropwizard.metrics	metrics-healthchecks	4.1.1
io.dropwizard.metrics	metrics-jetty9	4.1.1
io.dropwizard.metrics	metrics-jmx	4.1.1
io.dropwizard.metrics	metrics-json	4.1.1
io.dropwizard.metrics	metrics-jvm	4.1.1
io.dropwizard.metrics	metrics-servlets	4.1.1
io.netty	netty-all	4.1.47.Final
jakarta.annotation	jakarta.annotation-api	1.3.5
jakarta.validation	jakarta.validation-api	2.0.2
jakarta.ws.rs	jakarta.ws.rs-api	2.1.6
javax.activation	activation	1.1.1
javax.el	javax.el-api	2.2.4
javax.jdo	jdo-api	3.0.1
javax.servlet	javax.servlet-api	3.1.0
javax.servlet.jsp	jsp-api	2.1
javax.transaction	jta	1.1
javax.transaction	transaction-api	1.1
javax.xml.bind	jaxb-api	2.2.2
javax.xml.stream	stax-api	1.0-2
javolution	javolution	5.5.1
jline	jline	2.14.6
joda-time	joda-time	2.10.5
log4j	apache-log4j-extras	1.2.17
log4j	log4j	1.2.17
net.razorvine	pyrolite	4.30
net.sf.jpam	jpam	1.1
net.sf.opencsv	opencsv	2.3
net.sf.supercsv	super-csv	2.2.0
net.snowflake	snowflake-ingest-sdk	0.9.6
net.snowflake	snowflake-jdbc	3.12.0
net.snowflake	spark-snowflake_2.12	2.5.9-spark_2.4
net.sourceforge.f2j	arpack_combined_all	0.1
org.acplt.remotetea	remotetea-oncrpc	1.1.2
org.antlr	ST4	4.0.4
org.antlr	antlr-runtime	3.5.2
org.antlr	antlr4-runtime	4.7.1
org.antlr	stringtemplate	3.2.1
org.apache.ant	ant	1.9.2
org.apache.ant	ant-jsch	1.9.2
org.apache.ant	ant-launcher	1.9.2
org.apache.arrow	arrow-format	0.15.1
org.apache.arrow	arrow-memory	0.15.1
org.apache.arrow	arrow-vector	0.15.1
org.apache.avro	avro	1.8.2
org.apache.avro	avro-ipc	1.8.2
org.apache.avro	avro-mapred-hadoop2	1.8.2
org.apache.commons	commons-compress	1.8.1
org.apache.commons	commons-crypto	1.0.0
org.apache.commons	commons-lang3	3.9
org.apache.commons	commons-math3	3.4.1
org.apache.commons	commons-text	1.6
org.apache.curator	curator-client	2.7.1
org.apache.curator	curator-framework	2.7.1
org.apache.curator	curator-recipes	2.7.1
org.apache.derby	derby	10.12.1.1
org.apache.directory.api	api-asn1-api	1.0.0-M20
org.apache.directory.api	api-util	1.0.0-M20
org.apache.directory.server	apacheds-i18n	2.0.0-M15
org.apache.directory.server	apacheds-kerberos-codec	2.0.0-M15
org.apache.hadoop	hadoop-annotations	2.7.4
org.apache.hadoop	hadoop-auth	2.7.4
org.apache.hadoop	hadoop-client	2.7.4
org.apache.hadoop	hadoop-common	2.7.4
org.apache.hadoop	hadoop-hdfs	2.7.4
org.apache.hadoop	hadoop-mapreduce-client-app	2.7.4
org.apache.hadoop	hadoop-mapreduce-client-common	2.7.4
org.apache.hadoop	hadoop-mapreduce-client-core	2.7.4
org.apache.hadoop	hadoop-mapreduce-client-jobclient	2.7.4
org.apache.hadoop	hadoop-mapreduce-client-shuffle	2.7.4
org.apache.hadoop	hadoop-yarn-api	2.7.4
org.apache.hadoop	hadoop-yarn-client	2.7.4
org.apache.hadoop	hadoop-yarn-common	2.7.4
org.apache.hadoop	hadoop-yarn-server-common	2.7.4
org.apache.hive	hive-beeline	2.3.7
org.apache.hive	hive-cli	2.3.7
org.apache.hive	hive-common	2.3.7
org.apache.hive	hive-exec-core	2.3.7
org.apache.hive	hive-jdbc	2.3.7
org.apache.hive	hive-llap-client	2.3.7
org.apache.hive	hive-llap-common	2.3.7
org.apache.hive	hive-metastore	2.3.7
org.apache.hive	hive-serde	2.3.7
org.apache.hive	hive-shims	2.3.7
org.apache.hive	hive-storage-api	2.7.1
org.apache.hive	hive-vector-code-gen	2.3.7
org.apache.hive.shims	hive-shims-0.23	2.3.7
org.apache.hive.shims	hive-shims-common	2.3.7
org.apache.hive.shims	hive-shims-scheduler	2.3.7
org.apache.htrace	htrace-core	3.1.0-incubating
org.apache.httpcomponents	httpclient	4.5.6
org.apache.httpcomponents	httpcore	4.4.12
org.apache.ivy	ivy	2.4.0
org.apache.orc	orc-core	1.5.10
org.apache.orc	orc-mapreduce	1.5.10
org.apache.orc	orc-shims	1.5.10
org.apache.parquet	parquet-column	1.10.1.2-databricks4
org.apache.parquet	parquet-common	1.10.1.2-databricks4
org.apache.parquet	parquet-encoding	1.10.1.2-databricks4
org.apache.parquet	parquet-format	2.4.0
org.apache.parquet	parquet-hadoop	1.10.1.2-databricks4
org.apache.parquet	parquet-jackson	1.10.1.2-databricks4
org.apache.thrift	libfb303	0.9.3
org.apache.thrift	libthrift	0.12.0
org.apache.velocity	velocity	1.5
org.apache.xbean	xbean-asm7-shaded	4.15
org.apache.yetus	audience-annotations	0.5.0
org.apache.zookeeper	zookeeper	3.4.14
org.codehaus.jackson	jackson-core-asl	1.9.13
org.codehaus.jackson	jackson-jaxrs	1.9.13
org.codehaus.jackson	jackson-mapper-asl	1.9.13
org.codehaus.jackson	jackson-xc	1.9.13
org.codehaus.janino	commons-compiler	3.0.16
org.codehaus.janino	janino	3.0.16
org.datanucleus	datanucleus-api-jdo	4.2.4
org.datanucleus	datanucleus-core	4.1.17
org.datanucleus	datanucleus-rdbms	4.1.19
org.datanucleus	javax.jdo	3.2.0-m3
org.eclipse.jetty	jetty-client	9.4.18.v20190429
org.eclipse.jetty	jetty-continuation	9.4.18.v20190429
org.eclipse.jetty	jetty-http	9.4.18.v20190429
org.eclipse.jetty	jetty-io	9.4.18.v20190429
org.eclipse.jetty	jetty-jndi	9.4.18.v20190429
org.eclipse.jetty	jetty-plus	9.4.18.v20190429
org.eclipse.jetty	jetty-proxy	9.4.18.v20190429
org.eclipse.jetty	jetty-security	9.4.18.v20190429
org.eclipse.jetty	jetty-server	9.4.18.v20190429
org.eclipse.jetty	jetty-servlet	9.4.18.v20190429
org.eclipse.jetty	jetty-servlets	9.4.18.v20190429
org.eclipse.jetty	jetty-util	9.4.18.v20190429
org.eclipse.jetty	jetty-webapp	9.4.18.v20190429
org.eclipse.jetty	jetty-xml	9.4.18.v20190429
org.fusesource.leveldbjni	leveldbjni-all	1.8
org.glassfish.hk2	hk2-api	2.6.1
org.glassfish.hk2	hk2-locator	2.6.1
org.glassfish.hk2	hk2-utils	2.6.1
org.glassfish.hk2	osgi-resource-locator	1.0.3
org.glassfish.hk2.external	aopalliance-repackaged	2.6.1
org.glassfish.hk2.external	jakarta.inject	2.6.1
org.glassfish.jersey.containers	jersey-container-servlet	2.30
org.glassfish.jersey.containers	jersey-container-servlet-core	2.30
org.glassfish.jersey.core	jersey-client	2.30
org.glassfish.jersey.core	jersey-common	2.30
org.glassfish.jersey.core	jersey-server	2.30
org.glassfish.jersey.inject	jersey-hk2	2.30
org.glassfish.jersey.media	jersey-media-jaxb	2.30
org.hibernate.validator	hibernate-validator	6.1.0.Final
org.javassist	javassist	3.25.0-GA
org.jboss.logging	jboss-logging	3.3.2.Final
org.jdbi	jdbi	2.63.1
org.joda	joda-convert	1.7
org.jodd	jodd-core	3.5.2
org.json4s	json4s-ast_2.12	3.6.6
org.json4s	json4s-core_2.12	3.6.6
org.json4s	json4s-jackson_2.12	3.6.6
org.json4s	json4s-scalap_2.12	3.6.6
org.lz4	lz4-java	1.7.1
org.mariadb.jdbc	mariadb-java-client	2.1.2
org.objenesis	objenesis	2.5.1
org.postgresql	postgresql	42.1.4
org.roaringbitmap	RoaringBitmap	0.7.45
org.roaringbitmap	shims	0.7.45
org.rocksdb	rocksdbjni	6.2.2
org.rosuda.REngine	REngine	2.1.0
org.scala-lang	scala-compiler_2.12	2.12.10
org.scala-lang	scala-library_2.12	2.12.10
org.scala-lang	scala-reflect_2.12	2.12.10
org.scala-lang.modules	scala-collection-compat_2.12	2.1.1
org.scala-lang.modules	scala-parser-combinators_2.12	1.1.2
org.scala-lang.modules	scala-xml_2.12	1.2.0
org.scala-sbt	test-interface	1.0
org.scalacheck	scalacheck_2.12	1.14.2
org.scalactic	scalactic_2.12	3.0.8
org.scalanlp	breeze-macros_2.12	1.0
org.scalanlp	breeze_2.12	1.0
org.scalatest	scalatest_2.12	3.0.8
org.slf4j	jcl-over-slf4j	1.7.30
org.slf4j	jul-to-slf4j	1.7.30
org.slf4j	slf4j-api	1.7.30
org.slf4j	slf4j-log4j12	1.7.30
org.spark-project.spark	unused	1.0.0
org.springframework	spring-core	4.1.4.RELEASE
org.springframework	spring-test	4.1.4.RELEASE
org.threeten	threeten-extra	1.5.0
org.tukaani	xz	1.5
org.typelevel	algebra_2.12	2.0.0-M2
org.typelevel	cats-kernel_2.12	2.0.0-M4
org.typelevel	machinist_2.12	0.6.8
org.typelevel	macro-compat_2.12	1.1.1
org.typelevel	spire-macros_2.12	0.17.0-M1
org.typelevel	spire-platform_2.12	0.17.0-M1
org.typelevel	spire-util_2.12	0.17.0-M1
org.typelevel	spire_2.12	0.17.0-M1
org.xerial	sqlite-jdbc	3.8.11.2
org.xerial.snappy	snappy-java	1.1.7.5
org.yaml	snakeyaml	1.24
oro	oro	2.0.8
pl.edu.icm	JLargeArrays	1.5
software.amazon.ion	ion-java	1.0.2
stax	stax-api	1.0.1
xmlenc	xmlenc	0.52

Jaa

Databricks Runtime 7.0 (EoS)

New features

Improvements

Major library changes

Python packages

R packages

Java and Scala libraries

Behavior changes

Spark behavior changes

Other behavior changes

Deprecations and removals

Apache Spark

In this section:

Core, Spark SQL, Structured Streaming

Highlights

Performance enhancements

Extensibility enhancements

Connector enhancements

Feature enhancements

SQL compatibility enhancements

Monitoring and debugability enhancements

PySpark enhancements

Documentation and test coverage enhancements

Other notable changes

Behavior changes for Spark core, Spark SQL, and Structured Streaming

MLlib

Highlights

Behavior changes for MLlib

SparkR

Behavior changes for SparkR

Deprecations

Known issues

Maintenance updates

System environment

Installed Python libraries

Installed R libraries

Installed Java and Scala libraries (Scala 2.12 cluster version)

Palaute

Lisäresursseja