Run Python and R scripts with Machine Learning Services on SQL Server 2019 Big Data Clusters

Applies to: SQL Server 2019 (15.x)

Important

The Microsoft SQL Server 2019 Big Data Clusters add-on will be retired. Support for SQL Server 2019 Big Data Clusters will end on February 28, 2025. For more information, see Big data options on the Microsoft SQL Server platform.

You can run Python and R scripts on the master instance of SQL Server Big Data Clusters with Machine Learning Services.

Note

You can also run Java code on the master instance of SQL Server Big Data Clusters with the Java Language Extension. Following the steps below will also enable SQL Server Language Extensions.

Enable Machine Learning Services

Machine Learning Services is installed by default on SQL Server 2019 Big Data Clusters and does not require separate installation.

To enable Machine Learning Services, run this statement on the master instance:

EXEC sp_configure 'external scripts enabled', 1
RECONFIGURE WITH OVERRIDE
GO

You are now ready to run Python and R scripts on the master instance of Big Data Clusters. See the quickstarts under Next steps to run your first script.

Note

The configuration setting cannot be set on an availability group listener connection. If Big Data Clusters is deployed with high availability, the set external scripts enabled on each replica. See Enable on cluster with high availability.

Enable on cluster with high availability

When you Deploy SQL Server Big Data Cluster with high availability, the deployment creates an availability group for the master instance. To enable Machine Learning Services, set external scripts enabled on each instance of the availability group. For a Big Data Cluster, you need to run sp_configure on each replica of the SQL Server master instance

The following section describes how to enable external scripts on each instance.

Create an external load balancer for each instance

For each replica on the availability group, create a load balancer to allow you to connect to the instance.

kubectl expose pod <pod-name> --port=<connection port number> --name=<load-balancer-name> --type=LoadBalancer -n <kubernetes namespace>

The examples in this article use the following values:

  • <pod-name>: master-#
  • <connection port number>: 1533
  • <load-balancer-name>: mymaster-#
  • <kubernetes namespace>: mssql-cluster

Update the following script for your environment, and run the commands:

kubectl expose pod master-0 --port=1533 --name=mymaster-0 --type=LoadBalancer -n mssql-cluster
kubectl expose pod master-1 --port=1533 --name=mymaster-1 --type=LoadBalancer -n mssql-cluster
kubectl expose pod master-2 --port=1533 --name=mymaster-2 --type=LoadBalancer -n mssql-cluster

kubectl returns the following output.

service/mymaster-0 exposed
service/mymaster-1 exposed
service/mymaster-2 exposed

Each load balancer is a master replica endpoint.

Enable script execution on each replica

  1. Get the IP address for the master replica endpoint.

    The following command returns the external IP address for the replica endpoint.

    kubectl get services <load-balancer-name> -n <kubernetes namespace>

    To get the external IP address for each replica in this scenario, run the following commands:

    kubectl get services mymaster-0 -n mssql-cluster
    kubectl get services mymaster-1 -n mssql-cluster
    kubectl get services mymaster-2 -n mssql-cluster
    

    Note

    It may take a little time before the external IP address is available. Run the preceding script periodically until each endpoint returns an external IP address.

  2. Connect to the master replica endpoint and enable script execution.

    Run this statement:

    EXEC sp_configure 'external scripts enabled', 1
    RECONFIGURE WITH OVERRIDE
    GO
    

    For example, you can run the preceding command with sqlcmd. The following example connects to the master replica endpoint and enables script execution. Update the values in the script with for your environment.

    sqlcmd -S <IP address>,1533 -U <user name> -P <password> -Q "EXEC sp_configure 'external scripts enabled', 1; RECONFIGURE WITH OVERRIDE;"
    

    Repeat the step for each replica.

Demonstration

The following image demonstrates this process.

A screenshot of the command prompt providing a demo of the steps necessary to enable external scripts.

You are now ready to run Python and R scripts on the master instance of Big Data Clusters. See the quickstarts under Next steps to run your first script.

Delete the master replica endpoints

On the Kubernetes cluster, delete the endpoint for each replica. The endpoint is exposed in Kubernetes as a load-balancing service.

The following command deletes load-balancing service.

kubectl delete svc <load-balancer-name> -n mssql-cluster

For the examples in this article, run the following commands.

kubectl delete svc mymaster-0 -n mssql-cluster
kubectl delete svc mymaster-1 -n mssql-cluster
kubectl delete svc mymaster-2 -n mssql-cluster

SQL Server Big Data Clusters machine learning quickstarts

Python quickstarts

R quickstarts

SQL Server Big Data Clusters machine learning tutorials

Python tutorial

Ski rental (linear regression)

Categorize customers (k-means clustering)

NYC taxi tips (classification)

R tutorials

Ski rental (decision tree)

Categorize customers (k-means clustering)

NYC taxi tips (classification)

SQL Server Big Data Clusters machine learning how-to guides

Data exploration and modeling

Data type conversions

Deploy

Predictions

Package management

Install new Python packages

Install new R packages

Monitor

Security

Spark Machine Learning

Next steps