Databricks Runtime 14.1 for Machine Learning

Databricks Runtime 14.1 for Machine Learning provides a ready-to-use environment for machine learning and data science, based on Databricks Runtime 14.1. Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost. It also includes AutoML, a tool that automatically trains machine learning pipelines, and supports distributed deep learning training using Horovod.

New features and improvements

Databricks Runtime 14.1 ML is built on top of Databricks Runtime 14.1. For information on what's new in Databricks Runtime 14.1, including Apache Spark MLlib and SparkR, see the Databricks Runtime 14.1 release notes.

Databricks AutoML improvements

Notebooks generated by Databricks AutoML are now saved as MLflow artifacts.
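Because the notebooks are stored as run artifacts, they can be found by walking the artifact tree of an AutoML run. The following is a minimal sketch, not the documented procedure: the helper name `list_artifact_paths` is hypothetical, and the artifact folder and file extension that AutoML uses are not stated here, so the sketch simply lists every file. It relies only on a client object that exposes `list_artifacts(run_id, path)`, as MLflow's `MlflowClient` does.

```python
def list_artifact_paths(client, run_id, path=None):
    """Recursively collect the paths of all file artifacts of a run.

    `client` is any object with MlflowClient-style
    list_artifacts(run_id, path), returning entries that carry
    `.path` and `.is_dir`.
    """
    paths = []
    for info in client.list_artifacts(run_id, path):
        if info.is_dir:
            # Descend into sub-directories of the artifact store
            paths.extend(list_artifact_paths(client, run_id, info.path))
        else:
            paths.append(info.path)
    return paths
```

On a cluster this could be called as `list_artifact_paths(MlflowClient(), run_id)` with `MlflowClient` imported from `mlflow.tracking`, where `run_id` is the ID of a run created by AutoML.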

Databricks Feature Store improvements

You can now automatically infer and log an input example when you log a model. To do this, set infer_model_example to True when you call log_model. The example is based on the training data specified in the training_set parameter.
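The call described above might look like the following sketch. The wrapper name `log_model_with_inferred_example` is hypothetical, and the keyword `infer_model_example` is copied verbatim from this note; it is worth checking against the `log_model` signature of the installed Feature Store client before relying on it. The client is passed in as an argument so the sketch carries no hard dependency.

```python
def log_model_with_inferred_example(fs_client, model, artifact_path, flavor, training_set):
    """Log a model and ask the client to infer an input example.

    `fs_client` is expected to be a Feature Store client exposing
    log_model; `flavor` is an MLflow flavor module such as mlflow.sklearn.
    """
    return fs_client.log_model(
        model=model,
        artifact_path=artifact_path,
        flavor=flavor,
        # The input example is derived from this training set
        training_set=training_set,
        # Keyword name taken from the release note (an assumption);
        # verify it against your installed client version.
        infer_model_example=True,
    )
```

On a Databricks Runtime ML cluster, `fs_client` would typically be a `FeatureStoreClient()` imported from `databricks.feature_store`, and `training_set` the object returned by its `create_training_set` method.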

For more information about Databricks Feature Store, see What is a feature store?

System environment

The system environment in Databricks Runtime 14.1 ML differs from Databricks Runtime 14.1 as follows:

Databricks Runtime 14.1 ML includes XGBoost 1.7.6, which does not support GPU clusters with compute capability 5.2 and below.

Libraries

The following sections list the libraries included in Databricks Runtime 14.1 ML that differ from those included in Databricks Runtime 14.1.

Top-tier libraries

Databricks Runtime 14.1 ML includes the following top-tier libraries:

Python libraries

Databricks Runtime 14.1 ML uses Virtualenv for Python package management and includes many popular ML packages.

In addition to the packages specified in the following sections, Databricks Runtime 14.1 ML also includes the following packages:

  • hyperopt 0.2.7+db4
  • sparkdl 3.0.0_db1
  • automl 1.22.0

To reproduce the Databricks Runtime ML Python environment in your local Python virtual environment, download the requirements-14.1.txt file and run pip install -r requirements-14.1.txt. This command installs all of the open source libraries that Databricks Runtime ML uses, but does not install libraries developed by Databricks, such as databricks-automl, databricks-feature-store, or the Databricks fork of hyperopt.
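After installing from requirements-14.1.txt, it can be useful to confirm that the local environment really matches the pinned versions. The sketch below uses only the Python standard library; the helper names `parse_pins` and `find_mismatches` are hypothetical, and it assumes the file uses plain `name==version` lines.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {name: version} for every 'name==version' line."""
    pins = {}
    for raw in requirements_text.splitlines():
        # Drop comments, environment markers, and surrounding whitespace
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # skip blank lines, options, and unpinned entries
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins):
    """Compare pins against the active environment.

    Returns {name: (pinned, installed)}; installed is None when the
    package is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    # Two pins copied from the CPU table as sample input
    sample = "numpy==1.23.5\npandas==1.5.3\n"
    for name, (pinned, installed) in find_mismatches(parse_pins(sample)).items():
        print(f"{name}: pinned {pinned}, installed {installed}")
```

In practice the text passed to `parse_pins` would be the contents of the downloaded requirements-14.1.txt file; an empty result from `find_mismatches` means every pinned package is installed at the listed version.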

Python libraries on CPU clusters

Library Version Library Version Library Version
absl-py 1.0.0 accelerate 0.21.0 aiohttp 3.8.5
aiosignal 1.3.1 anyio 3.5.0 appdirs 1.4.4
argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0 astor 0.8.1
asttokens 2.0.5 astunparse 1.6.3 async-timeout 4.0.3
attrs 22.1.0 audioread 3.0.0 azure-core 1.29.1
azure-cosmos 4.3.1 azure-storage-blob 12.18.1 azure-storage-file-datalake 12.13.1
backcall 0.2.0 bcrypt 3.2.0 beautifulsoup4 4.11.1
black 22.6.0 bleach 4.1.0 blinker 1.4
blis 0.7.10 boto3 1.24.28 botocore 1.27.96
cachetools 5.3.1 catalogue 2.0.9 category-encoders 2.6.2
certifi 2022.12.7 cffi 1.15.1 chardet 4.0.0
charset-normalizer 2.0.4 click 8.0.4 cloudpickle 2.0.0
cmdstanpy 1.1.0 comm 0.1.2 confection 0.1.3
configparser 5.2.0 contourpy 1.0.5 convertdate 2.4.0
cryptography 39.0.1 cycler 0.11.0 cymem 2.0.8
Cython 0.29.32 dacite 1.8.1 databricks-automl-runtime 0.2.19
databricks-cli 0.17.7 databricks-feature-store 0.15.1 databricks-sdk 0.1.6
dataclasses-json 0.5.14 datasets 2.14.4 dbl-tempo 0.1.23
dbus-python 1.2.18 debugpy 1.6.7 decorator 5.1.1
deepspeed 0.10.0 defusedxml 0.7.1 dill 0.3.6
diskcache 5.6.3 distlib 0.3.7 docstring-to-markdown 0.11
entrypoints 0.4 ephem 4.1.4 evaluate 0.4.0
executing 0.8.3 facet-overview 1.1.1 fastapi 0.98.0
fastjsonschema 2.18.0 fasttext 0.9.2 filelock 3.9.0
Flask 2.2.5 flatbuffers 23.5.26 fonttools 4.25.0
frozenlist 1.4.0 fsspec 2022.11.0 future 0.18.3
gast 0.4.0 Libreria di runtime GCC 1.10.0 gitdb 4.0.10
GitPython 3.1.27 google-api-core 2.11.1 google-auth 2.21.0
google-auth-oauthlib 1.0.0 google-cloud-core 2.3.3 google-cloud-storage 2.10.0
google-crc32c 1.5.0 google-pasta 0.2.0 google-resumable-media 2.6.0
googleapis-common-protos 1.60.0 greenlet 2.0.1 grpcio 1.48.2
grpcio-status 1.48.1 gunicorn 20.1.0 gviz-api 1.10.0
h11 0.14.0 h5py 3.7.0 hjson 3.1.0
holidays 0.30 horovod 0.28.1 htmlmin 0.1.12
httplib2 0.20.2 httptools 0.6.0 huggingface-hub 0.14.1
idna 3.4 ImageHash 4.3.1 imbalanced-learn 0.10.1
importlib-metadata 4.11.3 importlib-resources 6.0.1 ipykernel 6.25.0
ipython 8.14.0 ipython-genutils 0.2.0 ipywidgets 7.7.2
isodate 0.6.1 itsdangerous 2.0.1 jedi 0.18.1
jeepney 0.7.1 Jinja2 3.1.2 jmespath 0.10.0
joblib 1.2.0 joblibspark 0.5.1 jsonschema 4.17.3
jupyter-client 7.3.4 jupyter-server 1.23.4 jupyter_core 5.2.0
jupyterlab-pygments 0.1.2 jupyterlab-widgets 1.0.0 keras 2.13.1
keyring 23.5.0 kiwisolver 1.4.4 langchain 0.0.267
langcodes 3.3.0 langsmith 0.0.38 launchpadlib 1.10.16
lazr.restfulclient 0.14.4 lazr.uri 1.0.6 lazy_loader 0.3
libclang 15.0.6.1 librosa 0.10.1 lightgbm 4.0.0
llvmlite 0.39.1 LunarCalendar 0.0.9 lxml 4.9.1
Mako 1.2.0 Markdown 3.4.1 MarkupSafe 2.1.1
marshmallow 3.20.1 matplotlib 3.7.0 matplotlib-inline 0.1.6
mccabe 0.7.0 mistune 0.8.4 mlflow-skinny 2.7.1
more-itertools 8.10.0 mpmath 1.2.1 msgpack 1.0.5
multidict 6.0.4 multimethod 1.9.1 multiprocess 0.70.14
murmurhash 1.0.10 mypy-extensions 0.4.3 nbclassic 0.5.2
nbclient 0.5.13 nbconvert 6.5.4 nbformat 5.7.0
nest-asyncio 1.5.6 networkx 2.8.4 ninja 1.11.1
nltk 3.7 nodeenv 1.8.0 notebook 6.5.2
notebook_shim 0.2.2 numba 0.56.4 numexpr 2.8.4
numpy 1.23.5 oauthlib 3.2.0 openai 0.27.8
openapi-schema-pydantic 1.2.4 opt-einsum 3.3.0 packaging 22.0
pandas 1.5.3 pandocfilters 1.5.0 paramiko 2.9.2
parso 0.8.3 pathspec 0.10.3 pathy 0.10.2
patsy 0.5.3 petastorm 0.12.1 pexpect 4.8.0
phik 0.12.3 pickleshare 0.7.5 Pillow 9.4.0
pip 22.3.1 platformdirs 2.5.2 plotly 5.9.0
pluggy 1.0.0 pmdarima 2.0.3 pooch 1.4.0
preshed 3.0.9 prometheus-client 0.14.1 prompt-toolkit 3.0.36
prophet 1.1.4 protobuf 4.24.0 psutil 5.9.0
psycopg2 2.9.3 ptyprocess 0.7.0 pure-eval 0.2.2
py-cpuinfo 9.0.0 pyarrow 8.0.0 pyasn1 0.4.8
pyasn1-modules 0.2.8 pybind11 2.11.1 pycparser 2.21
pydantic 1.10.6 pyflakes 3.0.1 Pygments 2.11.2
PyGObject 3.42.1 PyJWT 2.3.0 PyMeeus 0.5.12
PyNaCl 1.5.0 pyodbc 4.0.32 pyparsing 3.0.9
pyright 1.1.294 pyrsistent 0.18.0 pytesseract 0.3.10
python-dateutil 2.8.2 python-dotenv 1.0.0 python-editor 1.0.4
python-lsp-jsonrpc 1.0.0 python-lsp-server 1.7.1 pytoolconfig 1.2.5
pytz 2022.7 PyWavelets 1.4.1 PyYAML 6.0
pyzmq 23.2.0 regex 2022.7.9 requests 2.28.1
requests-oauthlib 1.3.1 responses 0.18.0 rope 1.7.0
rsa 4.9 s3transfer 0.6.2 safetensors 0.3.3
scikit-learn 1.1.1 seaborn 0.12.2 SecretStorage 3.3.1
Send2Trash 1.8.0 sentence-transformers 2.2.2 sentencepiece 0.1.99
setuptools 65.6.3 shap 0.42.1 simplejson 3.17.6
six 1.16.0 slicer 0.0.7 smart-open 5.2.1
smmap 5.0.0 sniffio 1.2.0 soundfile 0.12.1
soupsieve 2.3.2.post1 soxr 0.3.6 spacy 3.6.1
spacy-legacy 3.0.12 spacy-loggers 1.0.5 spark-tensorflow-distributor 1.0.0
SQLAlchemy 1.4.39 sqlparse 0.4.2 srsly 2.4.7
ssh-import-id 5.11 stack-data 0.2.0 starlette 0.27.0
statsmodels 0.13.5 sympy 1.11.1 tabulate 0.8.10
tangled-up-in-unicode 0.2.0 tenacity 8.1.0 tensorboard 2.13.0
tensorboard-data-server 0.7.1 tensorboard-plugin-profile 2.13.1 tensorflow-cpu 2.13.0
tensorflow-estimator 2.13.0 tensorflow-io-gcs-filesystem 0.34.0 termcolor 2.3.0
terminado 0.17.1 thinc 8.1.12 threadpoolctl 2.2.0
tiktoken 0.4.0 tinycss2 1.2.1 tokenize-rt 4.2.1
tokenizers 0.13.3 tomli 2.0.1 torch 2.0.1+cpu
torchvision 0.15.2+cpu tornado 6.1 tqdm 4.64.1
traitlets 5.7.1 transformers 4.31.0 typeguard 2.13.3
typer 0.9.0 typing-inspect 0.9.0 typing_extensions 4.4.0
ujson 5.4.0 unattended-upgrades 0.1 urllib3 1.26.14
uvicorn 0.23.2 uvloop 0.17.0 virtualenv 20.16.7
visions 0.7.5 wadllib 1.3.6 wasabi 1.1.2
watchfiles 0.20.0 wcwidth 0.2.5 webencodings 0.5.1
websocket-client 0.58.0 websockets 11.0.3 Werkzeug 2.2.2
whatthepatch 1.0.2 wheel 0.38.4 widgetsnbextension 3.6.1
wordcloud 1.9.2 wrapt 1.14.1 xgboost 1.7.6
xxhash 3.3.0 yapf 0.31.0 yarl 1.9.2
ydata-profiling 4.2.0 zipp 3.11.0

Python libraries on GPU clusters

Library Version Library Version Library Version
absl-py 1.0.0 accelerate 0.21.0 aiohttp 3.8.5
aiosignal 1.3.1 anyio 3.5.0 appdirs 1.4.4
argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0 astor 0.8.1
asttokens 2.0.5 astunparse 1.6.3 async-timeout 4.0.3
attrs 22.1.0 audioread 3.0.0 azure-core 1.29.1
azure-cosmos 4.3.1 azure-storage-blob 12.18.1 azure-storage-file-datalake 12.13.1
backcall 0.2.0 bcrypt 3.2.0 beautifulsoup4 4.11.1
black 22.6.0 bleach 4.1.0 blinker 1.4
blis 0.7.10 boto3 1.24.28 botocore 1.27.96
cachetools 5.3.1 catalogue 2.0.9 category-encoders 2.6.2
certifi 2022.12.7 cffi 1.15.1 chardet 4.0.0
charset-normalizer 2.0.4 click 8.0.4 cloudpickle 2.0.0
cmake 3.27.5 cmdstanpy 1.1.0 comm 0.1.2
confection 0.1.3 configparser 5.2.0 contourpy 1.0.5
convertdate 2.4.0 cryptography 39.0.1 cycler 0.11.0
cymem 2.0.8 Cython 0.29.32 dacite 1.8.1
databricks-automl-runtime 0.2.19 databricks-cli 0.17.7 databricks-feature-store 0.15.1
databricks-sdk 0.1.6 dataclasses-json 0.5.14 datasets 2.14.4
dbl-tempo 0.1.23 dbus-python 1.2.18 debugpy 1.6.7
decorator 5.1.1 deepspeed 0.10.0 defusedxml 0.7.1
dill 0.3.6 diskcache 5.6.3 distlib 0.3.7
docstring-to-markdown 0.11 einops 0.6.1 entrypoints 0.4
ephem 4.1.4 evaluate 0.4.0 executing 0.8.3
facet-overview 1.1.1 fastapi 0.98.0 fastjsonschema 2.18.0
fasttext 0.9.2 filelock 3.9.0 flash-attn 2.0.8
Flask 2.2.5 flatbuffers 23.5.26 fonttools 4.25.0
frozenlist 1.4.0 fsspec 2022.11.0 future 0.18.3
gast 0.4.0 Libreria di runtime GCC 1.10.0 gitdb 4.0.10
GitPython 3.1.27 google-api-core 2.11.1 google-auth 2.21.0
google-auth-oauthlib 1.0.0 google-cloud-core 2.3.3 google-cloud-storage 2.10.0
google-crc32c 1.5.0 google-pasta 0.2.0 google-resumable-media 2.6.0
googleapis-common-protos 1.60.0 greenlet 2.0.1 grpcio 1.48.2
grpcio-status 1.48.1 gunicorn 20.1.0 gviz-api 1.10.0
h11 0.14.0 h5py 3.7.0 hjson 3.1.0
holidays 0.30 horovod 0.28.1 htmlmin 0.1.12
httplib2 0.20.2 httptools 0.6.0 huggingface-hub 0.14.1
idna 3.4 ImageHash 4.3.1 imbalanced-learn 0.10.1
importlib-metadata 4.11.3 importlib-resources 6.0.1 ipykernel 6.25.0
ipython 8.14.0 ipython-genutils 0.2.0 ipywidgets 7.7.2
isodate 0.6.1 itsdangerous 2.0.1 jedi 0.18.1
jeepney 0.7.1 Jinja2 3.1.2 jmespath 0.10.0
joblib 1.2.0 joblibspark 0.5.1 jsonschema 4.17.3
jupyter-client 7.3.4 jupyter-server 1.23.4 jupyter_core 5.2.0
jupyterlab-pygments 0.1.2 jupyterlab-widgets 1.0.0 keras 2.13.1
keyring 23.5.0 kiwisolver 1.4.4 langchain 0.0.267
langcodes 3.3.0 langsmith 0.0.38 launchpadlib 1.10.16
lazr.restfulclient 0.14.4 lazr.uri 1.0.6 lazy_loader 0.3
libclang 15.0.6.1 librosa 0.10.1 lightgbm 4.0.0
lit 16.0.6 llvmlite 0.39.1 LunarCalendar 0.0.9
lxml 4.9.1 Mako 1.2.0 Markdown 3.4.1
MarkupSafe 2.1.1 marshmallow 3.20.1 matplotlib 3.7.0
matplotlib-inline 0.1.6 mccabe 0.7.0 mistune 0.8.4
mlflow-skinny 2.7.1 more-itertools 8.10.0 mpmath 1.2.1
msgpack 1.0.5 multidict 6.0.4 multimethod 1.9.1
multiprocess 0.70.14 murmurhash 1.0.10 mypy-extensions 0.4.3
nbclassic 0.5.2 nbclient 0.5.13 nbconvert 6.5.4
nbformat 5.7.0 nest-asyncio 1.5.6 networkx 2.8.4
ninja 1.11.1 nltk 3.7 nodeenv 1.8.0
notebook 6.5.2 notebook_shim 0.2.2 numba 0.56.4
numexpr 2.8.4 numpy 1.23.5 oauthlib 3.2.0
openai 0.27.8 openapi-schema-pydantic 1.2.4 opt-einsum 3.3.0
packaging 22.0 pandas 1.5.3 pandocfilters 1.5.0
paramiko 2.9.2 parso 0.8.3 pathspec 0.10.3
pathy 0.10.2 patsy 0.5.3 petastorm 0.12.1
pexpect 4.8.0 phik 0.12.3 pickleshare 0.7.5
Pillow 9.4.0 pip 22.3.1 platformdirs 2.5.2
plotly 5.9.0 pluggy 1.0.0 pmdarima 2.0.3
pooch 1.4.0 preshed 3.0.9 prompt-toolkit 3.0.36
prophet 1.1.4 protobuf 4.24.0 psutil 5.9.0
psycopg2 2.9.3 ptyprocess 0.7.0 pure-eval 0.2.2
py-cpuinfo 9.0.0 pyarrow 8.0.0 pyasn1 0.4.8
pyasn1-modules 0.2.8 pybind11 2.11.1 pycparser 2.21
pydantic 1.10.6 pyflakes 3.0.1 Pygments 2.11.2
PyGObject 3.42.1 PyJWT 2.3.0 PyMeeus 0.5.12
PyNaCl 1.5.0 pyodbc 4.0.32 pyparsing 3.0.9
pyright 1.1.294 pyrsistent 0.18.0 pytesseract 0.3.10
python-dateutil 2.8.2 python-dotenv 1.0.0 python-editor 1.0.4
python-lsp-jsonrpc 1.0.0 python-lsp-server 1.7.1 pytoolconfig 1.2.5
pytz 2022.7 PyWavelets 1.4.1 PyYAML 6.0
pyzmq 23.2.0 regex 2022.7.9 requests 2.28.1
requests-oauthlib 1.3.1 responses 0.18.0 rope 1.7.0
rsa 4.9 s3transfer 0.6.2 safetensors 0.3.3
scikit-learn 1.1.1 seaborn 0.12.2 SecretStorage 3.3.1
Send2Trash 1.8.0 sentence-transformers 2.2.2 sentencepiece 0.1.99
setuptools 65.6.3 shap 0.42.1 simplejson 3.17.6
six 1.16.0 slicer 0.0.7 smart-open 5.2.1
smmap 5.0.0 sniffio 1.2.0 soundfile 0.12.1
soupsieve 2.3.2.post1 soxr 0.3.6 spacy 3.6.1
spacy-legacy 3.0.12 spacy-loggers 1.0.5 spark-tensorflow-distributor 1.0.0
SQLAlchemy 1.4.39 sqlparse 0.4.2 srsly 2.4.7
ssh-import-id 5.11 stack-data 0.2.0 starlette 0.27.0
statsmodels 0.13.5 sympy 1.11.1 tabulate 0.8.10
tangled-up-in-unicode 0.2.0 tenacity 8.1.0 tensorboard 2.13.0
tensorboard-data-server 0.7.1 tensorboard-plugin-profile 2.13.1 tensorflow 2.13.0
tensorflow-estimator 2.13.0 tensorflow-io-gcs-filesystem 0.34.0 termcolor 2.3.0
terminado 0.17.1 thinc 8.1.12 threadpoolctl 2.2.0
tiktoken 0.4.0 tinycss2 1.2.1 tokenize-rt 4.2.1
tokenizers 0.13.3 tomli 2.0.1 torch 2.0.1+cu118
torchvision 0.15.2+cu118 tornado 6.1 tqdm 4.64.1
traitlets 5.7.1 transformers 4.31.0 triton 2.0.0
typeguard 2.13.3 typer 0.9.0 typing-inspect 0.9.0
typing_extensions 4.4.0 ujson 5.4.0 unattended-upgrades 0.1
urllib3 1.26.14 uvicorn 0.23.2 uvloop 0.17.0
virtualenv 20.16.7 visions 0.7.5 wadllib 1.3.6
wasabi 1.1.2 watchfiles 0.20.0 wcwidth 0.2.5
webencodings 0.5.1 websocket-client 0.58.0 websockets 11.0.3
Werkzeug 2.2.2 whatthepatch 1.0.2 wheel 0.38.4
widgetsnbextension 3.6.1 wordcloud 1.9.2 wrapt 1.14.1
xgboost 1.7.6 xxhash 3.3.0 yapf 0.31.0
yarl 1.9.2 ydata-profiling 4.2.0 zipp 3.11.0

R libraries

The R libraries are identical to the R libraries in Databricks Runtime 14.1.

Java and Scala libraries (Scala 2.12 cluster)

In addition to the Java and Scala libraries in Databricks Runtime 14.1, Databricks Runtime 14.1 ML contains the following JARs:

CPU clusters

Group ID Artifact ID Version
com.typesafe.akka akka-actor_2.12 2.5.23
ml.dmlc xgboost4j-spark_2.12 1.7.3
ml.dmlc xgboost4j_2.12 1.7.3
org.graphframes graphframes_2.12 0.8.2-db2-spark3.4
org.mlflow mlflow-client 2.7.1
org.scala-lang.modules scala-java8-compat_2.12 0.8.0
org.tensorflow spark-tensorflow-connector_2.12 1.15.0

GPU clusters

Group ID Artifact ID Version
com.typesafe.akka akka-actor_2.12 2.5.23
ml.dmlc xgboost4j-gpu_2.12 1.7.3
ml.dmlc xgboost4j-spark-gpu_2.12 1.7.3
org.graphframes graphframes_2.12 0.8.2-db2-spark3.4
org.mlflow mlflow-client 2.7.1
org.scala-lang.modules scala-java8-compat_2.12 0.8.0
org.tensorflow spark-tensorflow-connector_2.12 1.15.0