Deploy SQL Server Big Data Clusters on OpenShift on-premises and Azure Red Hat OpenShift
Applies to: SQL Server 2019 (15.x)
Important
The Microsoft SQL Server 2019 Big Data Clusters add-on will be retired. Support for SQL Server 2019 Big Data Clusters will end on February 28, 2025. All existing users of SQL Server 2019 with Software Assurance will be fully supported on the platform and the software will continue to be maintained through SQL Server cumulative updates until that time. For more information, see the announcement blog post and Big data options on the Microsoft SQL Server platform.
This article explains how to deploy a SQL Server Big Data Cluster on OpenShift environments, on-premises or on Azure Red Hat OpenShift (ARO).
Tip
For a quick way to bootstrap a sample ARO environment and then deploy BDC on that platform, you can use the Python script available here.
You can deploy big data clusters to on-premises OpenShift or on Azure Red Hat OpenShift (ARO). Validate the OpenShift CRI-O version against the tested configurations in the SQL Server Big Data Clusters release notes. While the deployment workflow is similar to deploying on other Kubernetes-based platforms (kubeadm and AKS), there are some differences. The differences mainly relate to running applications as a non-root user and to the security context used for the namespace BDC is deployed in.
For deploying the OpenShift cluster on-premises, see the Red Hat OpenShift documentation. For ARO deployments, see the Azure Red Hat OpenShift documentation.
This article outlines deployment steps that are specific to the OpenShift platform and points out the options you have for accessing the target environment and the namespace you use to deploy the big data cluster.
Pre-requisites
Important
The following prerequisites must be performed by an OpenShift cluster admin (cluster-admin cluster role) with sufficient permissions to create these cluster-level objects. For more information on cluster roles in OpenShift, see Using RBAC to define and apply permissions.
Ensure the pidsLimit setting on OpenShift is updated to accommodate SQL Server workloads. The default value in OpenShift is too low for production-like workloads. Start with at least 4096, but the optimal value depends on the max worker threads setting in SQL Server and the number of CPU processors on the OpenShift host node.

- To find out how to update pidsLimit for your OpenShift cluster, use these instructions. Note that OpenShift versions before 4.3.5 had a defect causing the updated value to not take effect. Make sure you upgrade OpenShift to the latest version.
- To help you compute the optimal value depending on your environment and planned SQL Server workloads, you can use the estimation and examples below:

  | Number of processors | Default max worker threads | Default workers per processor | Minimum pidsLimit value |
  |---|---|---|---|
  | 64  | 512 | 16 | 512 + (64 * 16) = 1536  |
  | 128 | 512 | 32 | 512 + (128 * 32) = 4608 |

  Note

  Other processes (for example backups, CLR, Fulltext, SQLAgent) also add some overhead, so add a buffer to the estimated value.
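The estimation in the table above can be scripted. The following is a minimal sketch that applies the same formula; the processor count and workers-per-processor values are inputs you would take from your own environment and the SQL Server defaults, as in the table:

```shell
# Estimate a minimum pidsLimit using the formula from the table above:
#   pidsLimit >= default max worker threads + (processors * workers per processor)
# Values below match the 64-processor row; adjust for your environment.
PROCESSORS=64
DEFAULT_MAX_WORKER_THREADS=512
WORKERS_PER_PROCESSOR=16

MIN_PIDS_LIMIT=$((DEFAULT_MAX_WORKER_THREADS + PROCESSORS * WORKERS_PER_PROCESSOR))
echo "minimum pidsLimit: ${MIN_PIDS_LIMIT}"   # 1536 for 64 processors
```

Remember to add a buffer on top of this value for other processes such as backups, CLR, Fulltext, and SQLAgent.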
Download the custom security context constraint (SCC) file bdc-scc.yaml:

curl https://raw.githubusercontent.com/microsoft/sql-server-samples/master/samples/features/sql-big-data-cluster/deployment/openshift/bdc-scc.yaml -o bdc-scc.yaml
Apply the SCC to the cluster.
oc apply -f bdc-scc.yaml
Note
The custom SCC for BDC is based on the built-in nonroot SCC in OpenShift, with additional permissions. To learn more about security context constraints in OpenShift, see Managing Security Context Constraints. For detailed information on which additional permissions are required for big data clusters on top of the nonroot SCC, download the whitepaper here.

Create a namespace/project:
oc new-project <namespaceName>
Bind the custom SCC with the service accounts in the namespace where BDC is deployed:
oc create clusterrole bdc-role --verb=use --resource=scc --resource-name=bdc-scc -n <namespaceName>
oc create rolebinding bdc-rbac --clusterrole=bdc-role --group=system:serviceaccounts:<namespaceName>
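If you prefer to manage RBAC declaratively, the rolebinding command above can also be expressed as a manifest. The following is a sketch of the RoleBinding the command creates; replace the namespace placeholder before applying it with oc apply:

```yaml
# Declarative equivalent of the "oc create rolebinding bdc-rbac" command above.
# <namespaceName> is a placeholder for the namespace where BDC is deployed.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: bdc-rbac
  namespace: <namespaceName>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: bdc-role
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts:<namespaceName>
```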
Assign appropriate permissions to the user deploying BDC. Do one of the following:
If the user deploying BDC has cluster-admin role, proceed to deploy big data cluster.
If the user deploying BDC is a namespace admin, assign the user the cluster-admin local role for the created namespace. This is the preferred option: the user deploying and managing the big data cluster has namespace-level admin permissions.
oc create rolebinding bdc-user-rbac --clusterrole=cluster-admin --user=<userName> -n <namespaceName>
The user deploying big data cluster must then log in to the OpenShift console:
oc login -u <deployingUser> -p <password>
Deploy big data cluster
Install the latest azdata.
Clone one of the built-in configuration files for OpenShift, depending on your target environment (OpenShift on-premises or ARO) and deployment scenario. See the OpenShift-specific settings in the deployment configuration files section below for settings that are specific to OpenShift in the built-in configuration files. For more details on the available configuration files, see the deployment guidance.
List all the available built-in configuration files.
azdata bdc config list
To clone one of the built-in configuration files, run the command below (optionally, you can replace the profile based on your targeted platform/scenario):
azdata bdc config init --source openshift-dev-test --target custom-openshift
For a deployment on ARO, start with one of the aro- profiles, which include default values for serviceType and storageClass appropriate for this environment. For example:

azdata bdc config init --source aro-dev-test --target custom-openshift
Customize the configuration files control.json and bdc.json. Here are some additional resources that guide you through the customizations for various use cases:
Note
Integration with Microsoft Entra ID (formerly Azure Active Directory) for BDC is not supported, hence you can't use this authentication method when deploying on ARO.
Deploy big data cluster
azdata bdc create --config custom-openshift --accept-eula yes
Upon successful deployment, you can log in and list the external cluster endpoints:
azdata login -n mssql-cluster
azdata bdc endpoint list
OpenShift specific settings in the deployment configuration files
SQL Server 2019 CU5 introduces two feature switches to control the collection of pod and node metrics. These parameters are set to false by default in the built-in profiles for OpenShift, since the monitoring containers require a privileged security context, which would relax some of the security constraints for the namespace BDC is deployed in.
"security": {
"allowNodeMetricsCollection": false,
"allowPodMetricsCollection": false
}
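If collecting pod and node metrics is more important to you than the tighter security posture, you would flip both switches in your cloned custom profile's bdc.json before deploying. This is a sketch of the same fragment with collection enabled; be aware that, as noted above, the monitoring containers then run with a privileged security context:

```json
"security": {
    "allowNodeMetricsCollection": true,
    "allowPodMetricsCollection": true
}
```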
The name of the default storage class in ARO is managed-premium (as opposed to AKS, where the default storage class is called default). You can find this in the control.json corresponding to the aro-dev-test and aro-dev-test-ha profiles:
"storage": {
    "data": {
        "className": "managed-premium",
        "accessMode": "ReadWriteOnce",
        "size": "15Gi"
    },
    "logs": {
        "className": "managed-premium",
        "accessMode": "ReadWriteOnce",
        "size": "10Gi"
    }
}
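For OpenShift on-premises there is no managed-premium class, so you would point className at a storage class that actually exists in your cluster. The following is a sketch only; local-storage is a hypothetical class name standing in for whatever your cluster provides:

```json
"storage": {
    "data": {
        "className": "local-storage",
        "accessMode": "ReadWriteOnce",
        "size": "15Gi"
    },
    "logs": {
        "className": "local-storage",
        "accessMode": "ReadWriteOnce",
        "size": "10Gi"
    }
}
```

You can list the storage classes available in your cluster with oc get storageclass.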
bdc-scc.yaml file
The SCC file for this deployment is:
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: true
allowPrivilegedContainer: false
allowedCapabilities:
- SETUID
- SETGID
- CHOWN
- SYS_PTRACE
apiVersion: security.openshift.io/v1
defaultAddCapabilities: null
fsGroup:
type: RunAsAny
kind: SecurityContextConstraints
metadata:
annotations:
kubernetes.io/description: SQL Server BDC custom scc is based on 'nonroot' scc plus additional capabilities required by BDC.
generation: 2
name: bdc-scc
readOnlyRootFilesystem: false
requiredDropCapabilities:
- KILL
- MKNOD
runAsUser:
type: MustRunAsNonRoot
seLinuxContext:
type: MustRunAs
supplementalGroups:
type: RunAsAny
volumes:
- configMap
- downwardAPI
- emptyDir
- persistentVolumeClaim
- projected
- secret
Next steps
Tutorial: Load sample data into a SQL Server big data cluster