Management operations in Azure Managed Instance for Apache Cassandra
Azure Managed Instance for Apache Cassandra provides automated deployment and scaling operations for managed open-source Apache Cassandra data centers. This article defines the management operations and features provided by the service. It also explains the separation of responsibilities between the Azure support team and customers when maintaining standalone and hybrid clusters.
Compaction
- There are different types of compaction. We currently perform a minor compaction via repair (see Maintenance). This performs a Merkle tree compaction, which is a special kind of compaction.
- Depending on the compaction strategy that was set on the table using CQL (for example, WITH compaction = { 'class' : 'LeveledCompactionStrategy' }), Cassandra automatically compacts when the table reaches a specific size. We recommend that you carefully select a compaction strategy for your workload, and don't do any manual compactions outside the strategy.
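As a minimal sketch of how a compaction strategy might be set on an existing table, the following Python helper builds the corresponding CQL statement as a string. The keyspace and table names (demo_ks, events) are hypothetical and used only for illustration; the statement would still need to be run through cqlsh or a Cassandra driver.

```python
def compaction_cql(keyspace: str, table: str, strategy: str) -> str:
    """Build an ALTER TABLE statement that sets the compaction strategy class."""
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH compaction = {{ 'class' : '{strategy}' }};"
    )


# Hypothetical keyspace and table names, for illustration only.
statement = compaction_cql("demo_ks", "events", "LeveledCompactionStrategy")
print(statement)
```

The helper only formats text. It doesn't validate that the strategy name is one Cassandra recognizes, so that check is left to the database.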
Patching
Operating system-level patches are applied automatically on an approximately two-week cadence.
Apache Cassandra software-level patches are done when security vulnerabilities are identified. The patching cadence may vary.
During patching, machines are rebooted one rack at a time. You shouldn't experience any degradation on the application side as long as the quorum ALL setting (consistency level ALL) isn't being used and the replication factor is 3 or higher.
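The reasoning behind this guidance can be sketched with some simple arithmetic. The Python example below assumes that each rack holds one replica of a given partition, so that rebooting one rack makes exactly one replica unavailable; that placement is an assumption for illustration, and the function names are hypothetical.

```python
def replicas_required(consistency: str, replication_factor: int) -> int:
    """Replicas that must respond for a request at the given consistency level."""
    level = consistency.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency}")


def survives_rack_reboot(consistency: str, replication_factor: int,
                         replicas_down: int = 1) -> bool:
    """True if requests can still succeed while `replicas_down` replicas are offline.

    Assumes one replica per rack, so one rebooting rack removes one replica.
    """
    available = replication_factor - replicas_down
    return available >= replicas_required(consistency, replication_factor)


for level, rf in [("QUORUM", 3), ("ALL", 3), ("QUORUM", 2)]:
    print(level, rf, survives_rack_reboot(level, rf))
```

Under these assumptions, QUORUM with a replication factor of 3 needs two replicas and still has two available, while ALL needs all three and fails. With a replication factor of 2, a quorum is also two replicas, so losing one is enough to fail requests.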
The version in Apache Cassandra is in the format X.Y.Z. You can control the deployment of major (X) and minor (Y) versions manually via service tools, whereas the Cassandra patches (Z) that may be required for that major/minor version combination are applied automatically.
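To make the split between manual and automatic upgrades concrete, here is a small Python sketch that parses an X.Y.Z version string and checks whether a change touches only the patch (Z) component. The version numbers and helper names are hypothetical examples, not values or APIs published by the service.

```python
from typing import NamedTuple


class CassandraVersion(NamedTuple):
    major: int  # X: deployed manually via service tools
    minor: int  # Y: deployed manually via service tools
    patch: int  # Z: applied automatically by the service


def parse_version(text: str) -> CassandraVersion:
    """Parse a version string in the form X.Y.Z."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected X.Y.Z, got {text!r}")
    major, minor, patch = (int(part) for part in parts)
    return CassandraVersion(major, minor, patch)


def is_patch_only_change(current: CassandraVersion, target: CassandraVersion) -> bool:
    """True when major and minor match, so only the patch level differs."""
    return (current.major, current.minor) == (target.major, target.minor)


# Hypothetical version numbers, for illustration only.
current = parse_version("3.11.10")
print(is_patch_only_change(current, parse_version("3.11.12")))
print(is_patch_only_change(current, parse_version("4.0.0")))
```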
Note
The service currently supports Cassandra versions 3.11 and 4.0. By default, version 3.11 is deployed, as version 4.0 is currently in public preview. See our Azure CLI Quickstart (step 5) for specifying Cassandra version during cluster deployment.
Maintenance
Nodetool repair is run automatically by the service using Reaper, once every week. You may wish to disable it if you're using your own service for a hybrid deployment.
Node health monitoring consists of:
- Actively monitoring each node's membership in the Cassandra ring.
- Auto-detecting, and auto-mitigating infrastructure issues like virtual machine, network, storage, Linux, and support software failures.
- Proactively monitoring CPU, disk, quorum loss, and other resource issues.
- Automatically bringing up failed nodes where possible, and manually bringing up nodes in response to auto-generated warnings.
Support
Azure Managed Instance for Apache Cassandra provides an SLA for the availability of data centers in a managed cluster. If you encounter any issues with using the service, file a support request in the Azure portal.
Our support benefits include:
- Single point of contact for Cassandra infrastructure issues - no need to raise support cases with IaaS teams (disk, compute, networking) separately.
- Proactive advice via email on performance bottlenecks, sizing, and other resource constraint issues.
- 24x7 support coverage, including auto-generated incidents for any severe outage issues.
- Community approved patch support (see Patching).
- In-house Java JDK/JVM engineering team support.
- Linux Operating System support with software supply chain security.
Important
We will investigate and diagnose any issues reported via support case, and resolve or mitigate them where possible. However, you are ultimately responsible for any Apache Cassandra configuration-level usage that causes CPU, disk, or network problems.
Examples of such issues include:
- Inefficient query operations.
- Throughput that exceeds capacity.
- Ingesting data that exceeds storage capacity.
- Incorrect keyspace configuration settings.
- Poor data model or partition key strategy.
In the event that we investigate a support case and discover that the root cause of the issue is at the Apache Cassandra configuration level (and not any underlying platform level aspects we maintain), we will still provide recommendations and guidance on remediation, or mitigation (when possible), before closing the case.
We recommend that you enable metrics and/or become familiar with our Azure Monitor integration in order to prevent common application-level and configuration-level issues in Apache Cassandra, such as those listed above.
Warning
Azure Managed Instance for Apache Cassandra also lets you run nodetool and sstable commands for routine database administration - see article here. Some of these commands can destabilize the Cassandra cluster and should only be run carefully and after being tested in non-production environments. Where possible, a --dry-run option should be used first. Microsoft cannot offer any SLA or support on issues with running commands that alter the default database configuration and/or tables.
Backup and restore
Snapshot backups are enabled by default and taken every 24 hours. Backups are stored in an internal Azure Blob Storage account and are retained for up to 2 days (48 hours). There's no cost for the initial 2 backups. Additional backups are charged; see pricing. To change the backup interval or retention period, or to restore from an existing backup, file a support request in the Azure portal.
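As a rough sketch of how the interval and retention period interact, the Python example below estimates how many snapshots exist at any one time and how many of those fall beyond the 2 backups that are free of charge. The function names are hypothetical, and the assumption that charges apply to every retained backup beyond the first two is a simplified reading of the text above; the pricing page remains the authority.

```python
FREE_BACKUPS = 2  # "no cost for the initial 2 backups", per the text above


def snapshots_retained(interval_hours: int, retention_hours: int) -> int:
    """Estimate how many snapshots are held when one is taken every interval."""
    if interval_hours <= 0:
        raise ValueError("interval_hours must be positive")
    return retention_hours // interval_hours


def snapshots_beyond_free(interval_hours: int, retention_hours: int) -> int:
    """Retained snapshots in excess of the free allowance (simplified assumption)."""
    return max(0, snapshots_retained(interval_hours, retention_hours) - FREE_BACKUPS)


# Defaults from the text: a snapshot every 24 hours, kept for 48 hours.
print(snapshots_retained(24, 48), snapshots_beyond_free(24, 48))
# A hypothetical custom setting: a snapshot every 12 hours, kept for 48 hours.
print(snapshots_retained(12, 48), snapshots_beyond_free(12, 48))
```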
Warning
Backups can be restored to the same VNet/subnet as your existing cluster, but they cannot be restored to the same cluster. Backups can only be restored to new clusters. Backups are intended for accidental deletion scenarios, and are not geo-redundant. They are therefore not recommended for use as a disaster recovery (DR) strategy in case of a total regional outage. To safeguard against region-wide outages, we recommend a multi-region deployment. Take a look at our quickstart for multi-region deployments.
Security
Azure Managed Instance for Apache Cassandra provides many built-in explicit security controls and features:
- Hardened Linux Virtual Machine images with a controlled supply chain.
- Common Vulnerability & Exposure (CVE) monitoring at the Operating System level.
- Certificate rotation for both Apache Cassandra and Prometheus software hosted on the managed Virtual Machines.
- Active vulnerability scanning.
- Active virus scanning.
- Secure coding practices.
For more information on security features, see our article here.
Hybrid support
When a hybrid cluster is configured, automated reaper operations running in the service will benefit the whole cluster. This includes data centers that aren't provisioned by the service. Outside this, it is your responsibility to maintain your on-premises or externally hosted data center.
Next steps
Get started with one of our quickstarts: