Train models with scikit-learn in Microsoft Fabric
This article describes how to train and track the iterations of a scikit-learn model. Scikit-learn is a popular open-source machine learning framework frequently used for supervised and unsupervised learning. The framework provides tools for model fitting, data preprocessing, model selection, model evaluation, and more.
Prerequisites
Install scikit-learn within your notebook. You can install or upgrade the version of scikit-learn on your environment by using the following command:
%pip install --upgrade scikit-learn
Set up the machine learning experiment
You can create a machine learning experiment by using the MLflow API. The MLflow set_experiment() function creates a new machine learning experiment named sample-sklearn if it doesn't already exist, and makes it the active experiment for subsequent runs.
Run the following code in your notebook to create the experiment:
import mlflow
mlflow.set_experiment("sample-sklearn")
Train a scikit-learn model
After you set up the experiment, create a sample dataset and a logistic regression model. The following code starts an MLflow run and tracks the metric, the parameter, and the final logistic regression model. It then registers the logged model under the name sample-sklearn so that you can version it and load it later.
Run the following code in your notebook to create the sample dataset and train the logistic regression model:
import mlflow.sklearn
import numpy as np
from sklearn.linear_model import LogisticRegression
from mlflow.models.signature import infer_signature

with mlflow.start_run() as run:
    # Fit a logistic regression model on a small sample dataset
    lr = LogisticRegression()
    X = np.array([-2, -1, 0, 1, 2, 1]).reshape(-1, 1)
    y = np.array([0, 0, 1, 1, 1, 0])
    lr.fit(X, y)
    score = lr.score(X, y)
    signature = infer_signature(X, y)

    print("log_metric.")
    mlflow.log_metric("score", score)

    # Log the regularization strength that the model actually used
    print("log_params.")
    mlflow.log_param("C", lr.C)

    print("log_model.")
    mlflow.sklearn.log_model(lr, "sklearn-model", signature=signature)
    print(f"Model saved in run_id={run.info.run_id}")

    # Register the logged model under the name "sample-sklearn"
    print("register_model.")
    mlflow.register_model(
        f"runs:/{run.info.run_id}/sklearn-model", "sample-sklearn"
    )
    print("All done")
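To make the tracked values more concrete, the following sketch refits the same model outside of any MLflow run. It shows that score() on a scikit-learn classifier is the mean accuracy on the given data, and that get_params() returns the model's actual hyperparameters, which could be passed to mlflow.log_params() inside a run. The names manual_accuracy and params are illustrative and don't appear in the code above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same sample dataset as in the training code
X = np.array([-2, -1, 0, 1, 2, 1]).reshape(-1, 1)
y = np.array([0, 0, 1, 1, 1, 0])

lr = LogisticRegression()
lr.fit(X, y)

# score() returns the mean accuracy of predict(X) against y
score = lr.score(X, y)
manual_accuracy = float(np.mean(lr.predict(X) == y))

# get_params() returns a dict of hyperparameters, such as C and max_iter
params = lr.get_params()

print("score:", score)
print("manual accuracy:", manual_accuracy)
print("C:", params["C"], "max_iter:", params["max_iter"])
```

Because the sample dataset contains the input value 1 with both labels, no classifier of this kind can score perfectly on it.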
Load and evaluate the model on a sample dataset
After you save the model, you can load it for inferencing.
Run the following code in your notebook to load the registered model and run inference on a sample dataset:
# Load the registered model and run batch inference with Spark
import numpy as np
from synapse.ml.predict import MLflowTransformer

# `spark` is the Spark session that the notebook provides
spark.conf.set("spark.synapse.ml.predict.enabled", "true")

model = MLflowTransformer(
    inputCols=["x"],
    outputCol="prediction",
    modelName="sample-sklearn",
    modelVersion=1,
)

# Build a single-column Spark DataFrame from the sample inputs
test_spark = spark.createDataFrame(
    data=np.array([-2, -1, 0, 1, 2, 1]).reshape(-1, 1).tolist(), schema=["x"]
)

batch_predictions = model.transform(test_spark)
batch_predictions.show()
Related content
- Explore machine learning models
- Create machine learning experiments