Azure Synapse Analytics offers various machine learning capabilities. This article provides an overview of how you can apply machine learning in the context of Azure Synapse.
This overview describes the machine learning capabilities of the different Azure Synapse analytics engines from a data science process perspective.
You might be familiar with the typical data science process, a well-known process that most machine learning projects follow. At a high level, it covers acquiring and understanding the data, preparing and exploring it, and then modeling, which includes training models and using them to score data.
For each step in the data science process, this article summarizes the Azure Synapse capabilities that can help.
Most machine learning projects involve well-established steps, and one of these steps is to access and understand the data.
Thanks to Azure Data Factory, a natively integrated part of Azure Synapse, there's a powerful set of tools available for data ingestion and data orchestration pipelines. This allows you to easily build data pipelines to access and transform the data into a format that can be consumed for machine learning. Learn more about data pipelines in Synapse.
An important part of the machine learning process is to understand the data by exploration and visualizations.
Depending on where the data is stored, Synapse offers a set of different tools to explore and prepare it for analytics and machine learning. One of the quickest ways to get started with data exploration is using Apache Spark or serverless SQL pools directly over data in the data lake.
Apache Spark for Azure Synapse offers capabilities to transform, prepare, and explore your data at scale. These Spark pools offer tools like PySpark/Python, Scala, and .NET for data processing at scale. Visualization libraries can enhance the data exploration experience and help you understand the data better. Learn more about how to explore and visualize data in Synapse using Spark.
Serverless SQL pools offer a way to explore data using T-SQL directly over the data lake. They also offer some built-in visualizations in Synapse Studio. Learn more about how to explore data with serverless SQL pools.
In Azure Synapse, you can train machine learning models on Apache Spark pools with tools like PySpark/Python, Scala, or .NET.
Machine learning models can be trained with various algorithms and libraries. Spark MLlib offers scalable machine learning algorithms that can help solve most classical machine learning problems. For a tutorial on how to train a model using MLlib in Synapse, see Build a machine learning app with Apache Spark MLlib and Azure Synapse Analytics.
In addition to MLlib, you can use popular libraries such as scikit-learn to develop models. See Manage libraries for Apache Spark in Azure Synapse Analytics for details on how to install libraries on Synapse Spark pools.
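Unlike MLlib, a scikit-learn model trains in a single process (on the Spark driver in a notebook), so it suits data that fits in driver memory. The sketch below uses synthetic NumPy data as a stand-in; in a notebook you might instead get a pandas DataFrame from a small Spark DataFrame with `toPandas()`. It assumes scikit-learn is installed on the pool.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: label is 1 when x0 + x1 > 0
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out a quarter of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = LogisticRegression().fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"test accuracy: {acc:.2f}")
```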
Models trained either inside or outside Azure Synapse can be used for batch scoring. Synapse currently offers two ways to run batch scoring.
You can use the T-SQL PREDICT function in dedicated SQL pools to run your predictions right where your data lives. This scalable function lets you enrich your data without moving it out of your data warehouse. Synapse Studio also provides a guided machine learning model experience in which you can deploy an ONNX model from the Azure Machine Learning model registry to a dedicated SQL pool for batch scoring with PREDICT.
Another option for batch scoring machine learning models in Azure Synapse is to use Apache Spark pools. Depending on the libraries used to train the models, you can write code to run your batch scoring.
SynapseML (previously known as MMLSpark) is an open-source library that simplifies the creation of massively scalable machine learning (ML) pipelines. It's an ecosystem of tools used to expand the Apache Spark framework in several new directions. SynapseML unifies several existing machine learning frameworks and new Microsoft algorithms into a single, scalable API that’s usable across Python, R, Scala, .NET, and Java. To learn more, see the key features of SynapseML.