Thanks for reaching out to Microsoft Q&A.
To assess the performance of an Azure Kubernetes Service (AKS) cluster, you can combine several monitoring tools and techniques. Here's a structured approach to checking whether your AKS cluster is performing well:
Monitoring Tools and Techniques
Azure Monitor and Container Insights:
- Enable Container Insights: This feature collects CPU and memory metrics, inventory data, and logs from your nodes and containers. You can enable it when you create the cluster or afterward through the Azure portal or the Azure CLI.
- Access Metrics: Navigate to your AKS cluster in the Azure portal, go to Monitoring, and select Insights. Here, you can view CPU and memory usage for nodes and containers. Adjust the time range to analyze trends.
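If you prefer the Azure CLI to the portal, the following is a minimal sketch of enabling Container Insights on an existing cluster. It wraps `az aks enable-addons --addons monitoring`; the function name `enable_container_insights` and the resource group, cluster, and workspace values are hypothetical placeholders, and it assumes you are already signed in with `az login`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable the Container Insights ("monitoring") add-on
# on an existing AKS cluster.
# Arguments: resource group, cluster name, and (optionally) the full resource ID
# of an existing Log Analytics workspace. Without a workspace ID, the CLI uses
# a default workspace (creating one if needed).
enable_container_insights() {
  local resource_group="$1" cluster_name="$2" workspace_id="${3:-}"
  if [[ -z "$resource_group" || -z "$cluster_name" ]]; then
    echo "usage: enable_container_insights <resource-group> <cluster-name> [workspace-resource-id]" >&2
    return 2
  fi
  local args=(aks enable-addons --addons monitoring
              --resource-group "$resource_group" --name "$cluster_name")
  # Only pass the workspace flag when a workspace was supplied.
  if [[ -n "$workspace_id" ]]; then
    args+=(--workspace-resource-id "$workspace_id")
  fi
  az "${args[@]}"
}
```

For example: `enable_container_insights myResourceGroup myAKSCluster`.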
Use kubectl Commands:
- Check Node and Pod Performance: Use the following commands to see current CPU and memory usage, as reported by the Metrics Server (deployed by default in AKS):

```shell
kubectl top nodes
kubectl top pods --all-namespaces
```

These commands display CPU and memory usage for each node and pod, which helps you identify resource-intensive components.
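As a rough sketch, the hypothetical shell function below filters `kubectl top nodes` output and prints nodes whose CPU or memory percentage exceeds a threshold. It assumes the default column order (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%), and its 80% default threshold is an arbitrary starting point rather than an official recommendation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl top nodes` output on stdin and print nodes
# whose CPU% (column 3) or MEMORY% (column 5) is above the threshold.
# Nodes reported as <unknown> (no metrics yet) are skipped.
flag_busy_nodes() {
  local threshold="${1:-80}"
  awk -v t="$threshold" '
    $1 == "NAME" { next }                 # skip the header row, if present
    $3 !~ /%$/ || $5 !~ /%$/ { next }     # skip rows without percentages
    {
      cpu = $3; mem = $5
      sub(/%$/, "", cpu); sub(/%$/, "", mem)
      if (cpu + 0 > t || mem + 0 > t) print $1, "cpu=" $3, "mem=" $5
    }'
}
# Example:
#   kubectl top nodes | flag_busy_nodes 80
#   kubectl top pods --all-namespaces --sort-by=cpu | head -n 10
```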
Analyze Logs with Log Analytics
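- Log Analytics Workspace: Send your AKS monitoring data to a Log Analytics workspace (Container Insights does this for container metrics and logs; use diagnostic settings for control plane logs). You can then run Kusto Query Language (KQL) queries for deeper insight into performance issues.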
- Predefined Queries: Use built-in queries to assess node readiness, pod status, and other critical metrics.
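The same kind of checks can also be run from the command line with `az monitor log-analytics query`. This sketch assumes Container Insights is writing to the workspace (so the `KubePodInventory` and `KubeNodeInventory` tables exist), that `--workspace` receives the workspace's customer ID (a GUID), and that node `Status` values are `Ready`/`NotReady`; `run_kql` and the query variable names are hypothetical. Depending on your Azure CLI version, the command may require the `log-analytics` extension.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a KQL query against a Log Analytics workspace.
# $1 = workspace customer ID (GUID, not the workspace resource ID), $2 = query.
run_kql() {
  local workspace_id="$1" query="$2"
  az monitor log-analytics query --workspace "$workspace_id" \
    --analytics-query "$query" --output table
}

# Latest status of each pod over the last hour, keeping pods that are
# neither Running nor Succeeded (assumes the Container Insights schema).
PODS_NOT_RUNNING_KQL='KubePodInventory
| where TimeGenerated > ago(1h)
| summarize arg_max(TimeGenerated, PodStatus) by Namespace, Name
| where PodStatus !in ("Running", "Succeeded")'

# Latest status of each node over the last 15 minutes, keeping nodes not Ready.
NODES_NOT_READY_KQL='KubeNodeInventory
| where TimeGenerated > ago(15m)
| summarize arg_max(TimeGenerated, Status) by Computer
| where Status != "Ready"'
```

For example: `run_kql <workspace-guid> "$PODS_NOT_RUNNING_KQL"`.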
Resource Health Monitoring
- Check Resource Health: In the Azure portal, use the Resource Health feature to monitor the overall health of your AKS resources. This tool provides status reports indicating whether your resources are available, degraded, or unavailable.
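Resource Health is also exposed through the Azure Resource Manager REST API. The sketch below reads the cluster's current availability state with `az rest`; the function name is hypothetical, and the `api-version` value is an assumption, so check the Resource Health REST API reference for the versions available to you. It should print a state such as `Available`, `Degraded`, `Unavailable`, or `Unknown`.

```shell
#!/usr/bin/env bash
# Assumed api-version for Microsoft.ResourceHealth; verify before relying on it.
RESOURCE_HEALTH_API_VERSION="2022-10-01"

# Hypothetical helper: print the current Resource Health availability state
# of an AKS cluster.
aks_availability_state() {
  local resource_group="$1" cluster_name="$2" cluster_id
  # Resource ID of the cluster, e.g. /subscriptions/.../managedClusters/<name>
  cluster_id="$(az aks show --resource-group "$resource_group" \
    --name "$cluster_name" --query id --output tsv)" || return
  # A URL without a host is treated by `az rest` as an Azure Resource Manager path.
  az rest --method get \
    --url "${cluster_id}/providers/Microsoft.ResourceHealth/availabilityStatuses/current?api-version=${RESOURCE_HEALTH_API_VERSION}" \
    --query properties.availabilityState --output tsv
}
```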
Alerts and Notifications
- Set Up Alerts: Configure alerts based on specific metrics (e.g., CPU usage thresholds). Alerting proactively helps you detect performance degradation before it affects your applications.
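As an illustration, this sketch creates a metric alert on the AKS platform metric `node_cpu_usage_percentage`. The 80% default threshold, 5-minute window, alert name pattern, and the hypothetical `create_node_cpu_alert` function are placeholders; it assumes an existing action group whose resource ID you pass in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: alert when average node CPU usage stays above a threshold.
# Arguments: resource group, cluster name, action group resource ID, threshold (%).
create_node_cpu_alert() {
  local resource_group="$1" cluster_name="$2" action_group_id="$3" threshold="${4:-80}"
  local cluster_id
  cluster_id="$(az aks show --resource-group "$resource_group" \
    --name "$cluster_name" --query id --output tsv)" || return
  # Evaluate every minute over a 5-minute window on the cluster resource.
  az monitor metrics alert create \
    --name "${cluster_name}-node-cpu-high" \
    --resource-group "$resource_group" \
    --scopes "$cluster_id" \
    --condition "avg node_cpu_usage_percentage > ${threshold}" \
    --window-size 5m \
    --evaluation-frequency 1m \
    --action "$action_group_id" \
    --description "Average node CPU above ${threshold}% for 5 minutes"
}
```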
Key Performance Indicators (KPIs) to Monitor
CPU Usage: Check regularly for sustained high CPU usage, which can indicate saturation.
Memory Usage: Monitor memory utilization; containers that exceed their memory limits are terminated with out-of-memory (OOMKilled) errors.
Pod Status: Ensure that pods are in the Running state; investigate any that are Pending or Failed, or that restart repeatedly (for example, CrashLoopBackOff).
Node Availability: Check whether any nodes are in the NotReady state, which can affect application availability and performance.
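The pod and node checks in this list can be approximated with plain `kubectl`. In this sketch, `list_unhealthy_pods` and `filter_not_ready_nodes` are hypothetical helper names; the node filter assumes the default `kubectl get nodes` column layout, where STATUS is the second column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list pods whose phase is neither Running nor Succeeded
# (e.g., Pending, Failed). CrashLoopBackOff pods usually still report phase
# Running, so also check the RESTARTS column of `kubectl get pods`.
list_unhealthy_pods() {
  kubectl get pods --all-namespaces \
    --field-selector=status.phase!=Running,status.phase!=Succeeded
}

# Hypothetical helper: read `kubectl get nodes` output on stdin and print nodes
# whose STATUS does not start with "Ready". A cordoned node shows
# "Ready,SchedulingDisabled" and is treated as Ready here.
filter_not_ready_nodes() {
  awk '$1 == "NAME" { next }
       { split($2, s, ","); if (s[1] != "Ready") print $1, $2 }'
}
# Example: kubectl get nodes | filter_not_ready_nodes
```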
Best Practices:
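Resource Requests and Limits: Set CPU and memory requests and limits for your containers. Requests help the scheduler place pods on nodes with enough capacity, and limits prevent a single container from starving others of resources.
Horizontal Pod Autoscaler (HPA): Use the HPA to adjust the number of pod replicas automatically based on observed metrics such as CPU utilization. CPU utilization targets are measured against the pods' CPU requests, so set requests first.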
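A minimal sketch of these two practices, applied to a hypothetical deployment: the request, limit, and autoscaling values are placeholders to tune for your workload, and `kubectl autoscale` flag names can vary slightly between kubectl versions (check `kubectl autoscale --help`). The hypothetical `hpa_desired_replicas` helper only illustrates the core scaling rule from the Kubernetes HPA documentation; the real controller also applies a tolerance and stabilization windows.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set requests/limits on a deployment, then create a
# CPU-based Horizontal Pod Autoscaler for it. All values are placeholders.
configure_scaling() {
  local deployment="$1" namespace="${2:-default}"
  kubectl set resources deployment "$deployment" --namespace "$namespace" \
    --requests=cpu=250m,memory=256Mi --limits=cpu=500m,memory=512Mi || return
  # Target 70% of the requested CPU (about 175m per pod on average), 2 to 10 replicas.
  kubectl autoscale deployment "$deployment" --namespace "$namespace" \
    --cpu-percent=70 --min=2 --max=10
}

# Core HPA rule:
#   desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
# Tolerance (10% by default) and stabilization windows are ignored here.
hpa_desired_replicas() {
  awk -v r="$1" -v cur="$2" -v tgt="$3" 'BEGIN {
    x = r * cur / tgt
    n = int(x); if (n < x) n++     # ceiling for non-negative values
    print n
  }'
}
# Example: 4 replicas at 90% average CPU with a 70% target -> 6 replicas.
#   hpa_desired_replicas 4 90 70
```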
For further reading:
https://learn.microsoft.com/vi-vn/azure/aks/monitor-aks
https://learn.microsoft.com/en-us/azure/architecture/operator-guides/aks/aks-triage-cluster-health
https://learn.microsoft.com/en-us/azure/aks/aks-diagnostics