Troubleshoot SQL Server Big Data Clusters Kubernetes

Applies to: SQL Server 2019 (15.x)

Important

The Microsoft SQL Server 2019 Big Data Clusters add-on will be retired. Support for SQL Server 2019 Big Data Clusters will end on February 28, 2025. All existing users of SQL Server 2019 with Software Assurance will be fully supported on the platform and the software will continue to be maintained through SQL Server cumulative updates until that time. For more information, see the announcement blog post and Big data options on the Microsoft SQL Server platform.

This article describes several useful Kubernetes commands that you can use to monitor and troubleshoot a SQL Server 2019 Big Data Clusters deployment. It shows how to view in-depth details of a pod or other Kubernetes artifacts that are located in the big data cluster. This article also covers common tasks, such as copying files to or from a container running one of the SQL Server big data cluster services.

Tip

To monitor the status of big data cluster components, you can use the azdata bdc status commands or the built-in troubleshooting notebooks provided with Azure Data Studio.
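For example, after logging in to the cluster with azdata login, you can get an overall health summary with the following command (a minimal sketch; the exact output depends on your azdata version):

azdata bdc status show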

Tip

Run the following kubectl commands on either a Windows (cmd or PS) or Linux (bash) client machine. They require prior authentication to the cluster and a cluster context to run against. For example, for a previously created AKS cluster you can run az aks get-credentials --name <aks_cluster_name> --resource-group <azure_resource_group_name> to download the Kubernetes cluster configuration file and set the cluster context.
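For example, to confirm which cluster context kubectl is currently using, or to switch to a different one, you can run the following commands (the context name is illustrative):

kubectl config current-context
kubectl config get-contexts
kubectl config use-context <context_name>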

Get status of pods

You can use the kubectl get pods command to get the status of pods in the cluster for either all namespaces or the big data cluster namespace. The following sections show examples of both.

Show status of all pods in the Kubernetes cluster

Run the following command to get all the pods and their statuses, including the pods that are part of the namespace that SQL Server big data cluster pods are created in:

kubectl get pods --all-namespaces
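If you only want to see pods that are not yet healthy, you can add a field selector to exclude pods in the Running phase (a standard kubectl option, not specific to big data clusters):

kubectl get pods --all-namespaces --field-selector=status.phase!=Running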

Show status of all pods in the SQL Server big data cluster

Use the -n parameter to specify a specific namespace. SQL Server big data cluster pods are created in a namespace that is created at cluster bootstrap time, based on the cluster name specified in the deployment configuration file. The default name, mssql-cluster, is used here.

kubectl get pods -n mssql-cluster

You should see output similar to the following list for a running big data cluster:

PS C:\> kubectl get pods -n mssql-cluster
NAME              READY   STATUS    RESTARTS   AGE
appproxy-f2qqt    2/2     Running   0          110m
compute-0-0       3/3     Running   0          110m
control-zlncl     4/4     Running   0          118m
data-0-0          3/3     Running   0          110m
data-0-1          3/3     Running   0          110m
gateway-0         2/2     Running   0          109m
logsdb-0          1/1     Running   0          112m
logsui-jtdnv      1/1     Running   0          112m
master-0          7/7     Running   0          110m
metricsdb-0       1/1     Running   0          112m
metricsdc-shv2f   1/1     Running   0          112m
metricsui-9bcj7   1/1     Running   0          112m
mgmtproxy-x6gcs   2/2     Running   0          112m
nmnode-0-0        1/1     Running   0          110m
storage-0-0       7/7     Running   0          110m
storage-0-1       7/7     Running   0          110m

Note

During deployment, pods with a STATUS of ContainerCreating are still coming up. If the deployment hangs for any reason, this column can give you an idea of where the problem might be. Also look at the READY column, which shows how many of the pod's containers are ready out of the total number of containers. Note that deployments can take 30 minutes or more, depending on your configuration and network. Much of this time is spent downloading the container images for the different components.
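During deployment, it can be convenient to watch the pods come up rather than re-running the command. For example, assuming the default mssql-cluster namespace:

kubectl get pods -n mssql-cluster --watch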

Get pod details

Run the following command to get a detailed, human-readable description of a specific pod. It includes details such as the Kubernetes node that the pod is currently placed on, the containers running within the pod, and the images used to bootstrap the containers. It also shows other details, such as labels, status, and the persistent volume claims that are associated with the pod.

kubectl describe pod <pod_name> -n <namespace_name>

The following example shows the details for the master-0 pod in a big data cluster named mssql-cluster:

kubectl describe pod master-0 -n mssql-cluster

If any errors have occurred, you can sometimes see the error in the recent events for the pod.
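You can also list the events for the namespace directly, which is sometimes easier to scan than the describe output. The following commands sort the events by creation time and narrow them to the master-0 pod, respectively (both use standard kubectl options):

kubectl get events -n mssql-cluster --sort-by=.metadata.creationTimestamp
kubectl get events -n mssql-cluster --field-selector involvedObject.name=master-0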

Get pod logs

You can retrieve the logs for the containers running in a pod. The following command retrieves the logs for all containers running in the pod named master-0 and outputs them to a file named master-0-pod-logs.txt:

kubectl logs master-0 --all-containers=true -n mssql-cluster > master-0-pod-logs.txt
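To narrow the output to a single container, or to follow the log in real time, you can use the -c, --tail, and -f options. The following examples use the mssql-server container in the master-0 pod; adjust the container name as needed:

kubectl logs master-0 -c mssql-server -n mssql-cluster --tail=100
kubectl logs master-0 -c mssql-server -n mssql-cluster -f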

Get status of services

Run the following command to get details for the big data cluster services. These details include their type, and the IP addresses and ports associated with each service. Note that SQL Server big data cluster services are created in a namespace that is created at cluster bootstrap time, based on the cluster name specified in the deployment configuration file.

kubectl get svc -n <namespace_name>

The following example shows the status for services in a big data cluster named mssql-cluster:

kubectl get svc -n mssql-cluster

The following services support external connections to the big data cluster:

Service                   Description
master-svc-external       Provides access to the master instance (connect using the EXTERNAL-IP, port 31433, and the SA user).
controller-svc-external   Supports tools and clients that manage the cluster.
gateway-svc-external      Provides access to the HDFS/Spark gateway (connect using the EXTERNAL-IP and the <AZDATA_USERNAME> user).1
appproxy-svc-external     Supports application deployment scenarios.

1 Beginning with SQL Server 2019 (15.x) CU 5, when you deploy a new cluster with basic authentication, all endpoints, including the gateway, use AZDATA_USERNAME and AZDATA_PASSWORD. Endpoints on clusters that are upgraded to CU 5 continue to use root as the username to connect to the gateway endpoint. This change does not apply to deployments that use Active Directory authentication. See Credentials for accessing services through gateway endpoint in the release notes.

Tip

This is one way of viewing the services with kubectl, but it is also possible to use the azdata bdc endpoint list command to view these endpoints. For more information, see Get big data cluster endpoints.
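For example, after logging in with azdata login, the following command lists the endpoints in a tabular layout (the -o table output option is shown here; other output formats are also available):

azdata bdc endpoint list -o table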

Get service details

Run this command to get a detailed, human-readable description of a service. It includes details such as labels, selector, IP, external IP (if the service is of the LoadBalancer type), and port.

kubectl describe service <service_name> -n <namespace_name>

The following example retrieves details for the master-svc-external service:

kubectl describe service master-svc-external -n mssql-cluster
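If you only need the external IP address, for example to script a connection test, you can extract it with a JSONPath expression instead of reading the full description (a minimal sketch that assumes the service is of the LoadBalancer type; the quoting shown is for bash):

kubectl get service master-svc-external -n mssql-cluster -o jsonpath='{.status.loadBalancer.ingress[0].ip}'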

Copy files

If you need to copy files from a container to your local machine, use the kubectl cp command with the following syntax:

kubectl cp <pod_name>:<source_file_path> -c <container_name> -n <namespace_name> <target_local_file_path>

Similarly, you can use kubectl cp to copy files from the local machine into a specific container.

kubectl cp <source_local_file_path> <pod_name>:<target_container_path> -c <container_name>  -n <namespace_name>

Copy files from a container

The following example copies the SQL Server log files from the container to the ~/tmp/sqlserverlogs path on the local machine (in this example, the local machine is a Linux client):

kubectl cp master-0:/var/opt/mssql/log -c mssql-server -n mssql-cluster ~/tmp/sqlserverlogs

Copy files into container

The following example copies the AdventureWorks2022 backup file from the local machine to the SQL Server master instance container (mssql-server) in the master-0 pod. The file is copied to the /tmp directory in the container.

kubectl cp ~/Downloads/AdventureWorks2022.bak master-0:/tmp -c mssql-server -n mssql-cluster
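To confirm that the copy succeeded, you can list the target directory inside the container with kubectl exec (the pod, container, and directory names match the example above):

kubectl exec master-0 -c mssql-server -n mssql-cluster -- ls -l /tmp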

Force delete a pod

Force-deleting a pod is not recommended. However, for testing availability, resiliency, or data persistence, you can delete a pod to simulate a pod failure with the kubectl delete pods command.

kubectl delete pods <pod_name> -n <namespace_name> --grace-period=0 --force

The following example deletes the storage pool pod, storage-0-0:

kubectl delete pods storage-0-0 -n mssql-cluster --grace-period=0 --force
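Because big data cluster pods are managed by Kubernetes controllers such as StatefulSets, the deleted pod is recreated automatically. You can watch the replacement pod come back up with:

kubectl get pods -n mssql-cluster --watch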

Get pod IP

For troubleshooting purposes, you might need to get the IP of the node a pod is currently running on. To get the IP address, use the kubectl get pods command with the following syntax:

kubectl get pods <pod_name> -o yaml -n <namespace_name> | grep hostIP

The following example gets the IP address of the node that the master-0 pod is running on:

kubectl get pods master-0 -o yaml -n mssql-cluster | grep hostIP
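Alternatively, the -o wide output format shows the pod IP and the node name on a single line, and a JSONPath expression returns just the host IP (both are standard kubectl output options; the quoting shown is for bash):

kubectl get pods master-0 -n mssql-cluster -o wide
kubectl get pods master-0 -n mssql-cluster -o jsonpath='{.status.hostIP}'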

Kubernetes dashboard

You can launch the Kubernetes dashboard for additional information about the cluster. The following sections explain how to launch the dashboard for Kubernetes on AKS and for Kubernetes bootstrapped using kubeadm.

Start dashboard when cluster is running in AKS

To launch the Kubernetes dashboard run:

az aks browse --resource-group <azure_resource_group> --name <aks_cluster_name>

Note

If you get the following error: Unable to listen on port 8001: All listeners failed to create with the following errors: Unable to create listener: Error listen tcp4 127.0.0.1:8001: bind: Only one usage of each socket address (protocol/network address/port) is normally permitted. Unable to create listener: Error listen tcp6: address [[::1]]:8001: missing port in address error: Unable to listen on any of the requested ports: [{8001 9090}], make sure you did not start the dashboard already from another window.

When you launch the dashboard in your browser, you might see permission warnings because RBAC is enabled by default in AKS clusters and the service account used by the dashboard does not have enough permissions to access all resources (for example, pods is forbidden: User "system:serviceaccount:kube-system:kubernetes-dashboard" cannot list pods in the namespace "default"). Run the following command to give the necessary permissions to kubernetes-dashboard, and then restart the dashboard:

kubectl create clusterrolebinding kubernetes-dashboard -n kube-system --clusterrole=cluster-admin --serviceaccount=kube-system:kubernetes-dashboard

Start dashboard when Kubernetes cluster is bootstrapped using kubeadm

For detailed instructions on how to deploy and configure the dashboard in your Kubernetes cluster, see the Kubernetes documentation. To launch the Kubernetes dashboard, run this command:

kubectl proxy
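With the proxy running, the dashboard is typically reachable through a URL of the following form. The exact service name and namespace depend on the dashboard version you deployed; the path below assumes the dashboard was installed into the kubernetes-dashboard namespace:

http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/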

Next steps

For more information about big data clusters, see What are SQL Server Big Data Clusters.