Resources deployed with SQL Server Big Data Clusters

Applies to: SQL Server 2019 (15.x)

Important

The Microsoft SQL Server 2019 Big Data Clusters add-on will be retired. Support for SQL Server 2019 Big Data Clusters will end on February 28, 2025. All existing users of SQL Server 2019 with Software Assurance will be fully supported on the platform and the software will continue to be maintained through SQL Server cumulative updates until that time. For more information, see the announcement blog post and Big data options on the Microsoft SQL Server platform.

This article describes the resources a SQL Server Big Data Cluster deploys.

A big data cluster deploys pods based on the deployment profile. For details see Default configurations.

This article describes the pods deployed with aks-dev-test-ha profile and includes a Spark pool. Query Kubernetes to see the pods deployed in your cluster. The following example returns a list of pods under a specific namespace.

kubectl get pods -n <namespace>

Replace <namespace> with the name of your big data cluster.

For more information, see How to deploy SQL Server Big Data Clusters on Kubernetes.

The following diagram displays the components deployed in a Big Data Cluster:

big-data-cluster-diagram

For information about the architecture, see Introducing SQL Server Big Data Clusters.

Deployed pods

The following table lists pods deployed in a Big Data Cluster.

Name Area
control-<nnnn> Control
controldb-<#> Control
controlwd-<nnnn> Control
logsdb-<#> Control
logsui-<nnnn> Control
metricsdb-<#> Control
metricsdc-<nnnn> Control
metricsui-<nnnn> Control
mgmtproxy-<nnnn> Control
zookeeper-<#> Control
dns-<nnnn> Control
master-<#n> Master instance
operator-<nnnn> Master instance
compute-<#n>-<#m> Compute pool
data-<#>-<#> Data pool
storage-<#>-<#> Storage pool
nmnode-<#>-<#> Storage pool
sparkhead-<#> Storage pool
appproxy-<#m> Application pool
gateway-<#> Gateway service

Not all pods are included in every big data cluster. Deployments with high availability, or active directory integration include specific pods.

High availability specific pods:

  • operator-<nnnn>
  • zookeeper-<#>

Active directory specific pods:

  • dns-<nnnn>

The following sections describe the pods and list the containers in each pod.

Control

Control pods provide the control service.

Pod name Count Kubernetes controller type Containers
control-# 1 ReplicaSet - controller

- security-support

- fluentbit
controldb 1 StatefulSet - mssql-server

- fluentbit
controlwd 1 ReplicaSet - controlwatchdog
logsdb-# 1 StatefulSet - elasticsearch
logsui 1 ReplicaSet - kibana
metricsdb-# 1 StatefulSet - influxdb
metricsdc 1 per Kubernetes node. DaemonSet - telegraf
metricsui-nnnn 1 ReplicaSet - grafana
mgmtproxy-nnnn 1 ReplicaSet - service-proxy

- fluentbit
dns-nnnn 0 or 1 for Active Directory integration ReplicaSet - dns

- fluentbit

Master instance

master-<#n> is the SQL Server master instance.

  • Manages the data pool via DDL
  • Manipulates data in the data pool via DML
  • Off-loads analytic query execution to the data pool
Pod name Count Kubernetes controller type Containers
master-<#n> 1 or more for high availability. StatefulSet - mssql-server

- fluentbit

- collectd

- mssql-ha-supervisor *
operator* 0 or 1 for high availability ReplicaSet - mssql-ha-operator

* Only high availability deployments. The operator implements and registers the custom resource definition for SQL Server and the Availability Group resources. When the operator is deployed, it registers itself as a listener for notifications about SQL Server resources being deployed in the Kubernetes cluster. mssql-ha-supervisor supports the availability group.

Each master pod contains one instance of SQL Server. A high-availability deployment includes 3 pods. Each pod includes a SQL Server instance with databases in a SQL Server Always On Availability Group.

Include additional pods at deployment time, depending on your workload.

Compute pool

Compute pool provides a SQL Server instance for computation.

Pod name Count Kubernetes controller type Containers
compute-<#n>-<#m> 1 or more. StatefulSet - mssql-server

- fluentbit

- collectd
  • #n identifies the compute pool.
  • #m identifies the instance ID within the pool.

The compute pool SQL Server instances are stateless. They only require storage for tempdb.

Include additional pods at deployment time, depending on your workload.

Data pool

The data pool provides SQL Server instances for storage and compute.

Pod name Count Kubernetes controller type Containers
data-<#n>-<#m> 0 or more StatefulSet - mssql-server

- fluentbit

- collectd
  • #n identifies the data pool.
  • #m identifies the instance ID within the pool.

Include additional pods at deployment time, depending on workload.

Storage pool

Storage pool provides data ingestion through Spark, storage in HDFS, data access through HDFS and SQL Server endpoints.

Pod name Count Kubernetes controller type Containers
storage-0-# 1 or more. Include additional pods at deployment time, depending on workload. StatefulSet - hadoop

- mssql-server

- fluentbit

nmnode-0-# 1 or more for high availability StatefulSet - hadoop

- fluentbit
sparkehead-# 1 or more for high availability StatefulSet - hadoop-yarn-jobhistory

- hadoop-livy-sparkhistory

- hadoop-hivemetastore

-- fluentbit
zookeeper 0 or 3 for high availability. StatefulSet - zookeeper

- fluentbit

Application pool

The application pool is included in some of the test configuration profiles. The application pool hosts application service proxies that you define when you deploy your applications for Big Data Clusters.

appproxy is a web API that sits in front of the application pool applications. It authenticates users and then routes the requests through to the applications.

Pod name Kubernetes controller type Containers
appproxy ReplicaSet - app-service-proxy

- fluentbit

For more information, see Introducing Application Deployment on a Big Data Cluster.

Include additional pods at deployment time, depending on workload.

Gateway service

Gateway services provides the Knox gateway to Spark, HDFS, Yarn, Yarn UI, and Spark UI.

Pod name Kubernetes controller type Containers
gateway-<#> StatefulSet - knox

- fluentbit

Only one gateway is supported.

Open-source container references

For for specific open-source projects and versions, see Open-source software reference.

Next steps

To learn more about the SQL Server Big Data Clusters, see the following resources: