AKS API server sometimes returns 404, 500, 503, 504 for SLI requests in Grafana dashboard

Sunil Bhimanapalli 0 Reputation points
2023-08-02T03:14:52.44+00:00

Hi there!

My AKS cluster control plane sometimes returns 404, 500, 503, and 504 responses for SLI requests, as shown in a Grafana dashboard. What could be the reason, and where can I check the control plane logs?

Azure Kubernetes Service

2 answers

  1. vipullag-MSFT 26,487 Reputation points Moderator
    2023-08-02T04:25:38.7166667+00:00

    Hello Sunil Bhimanapalli

    Welcome to Microsoft Q&A Platform, thanks for posting your query here.

    Potential reasons for these errors include:

    - The control plane may be temporarily overloaded or facing issues, causing it to be unavailable to handle requests.

    - Networking problems between the Grafana Dashboard and the control plane can lead to errors in SLI requests.

    - Insufficient resources in the control plane nodes, such as CPU, memory, or disk space, might cause these errors.

    - Incorrect configurations in the AKS cluster or related components could result in such errors.

    To investigate the cause of these errors, you can check the logs of the control plane components.

    kube-apiserver: The Kubernetes API server, which handles API requests to the control plane.

    kube-controller-manager: Runs the controllers that manage various control plane processes.

    kube-scheduler: Schedules pods onto nodes in the cluster.

    On a self-managed Kubernetes cluster, you can read these logs with kubectl from your local machine, or by connecting to the control plane nodes over SSH and viewing the respective log directories. For example, to view the kube-apiserver logs there:

    kubectl logs -n kube-system <kube-apiserver-pod-name>

    Note that on AKS the control plane is managed by Azure, so its nodes and pods are generally not reachable this way. Control plane logs are instead exposed as resource logs once diagnostic settings are enabled on the cluster.

    Inspecting these logs should provide more insights into the issues the control plane is facing, and it may help you diagnose the root cause of the 404, 500, 503, or 504 errors in the Grafana Dashboard.
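Before digging into logs, it may help to separate the 404s from the 5xx responses, since a 404 is a client-class status (for example, a request for an object that does not exist) while 500, 503, and 504 are server-class. A minimal Python sketch of that split follows. The `summarize` helper and the counts are hypothetical, made up for illustration, and are not taken from any Grafana dashboard or AKS API:

```python
def summarize(status_counts):
    """Split request counts by HTTP status class and compute the 5xx ratio.

    status_counts maps an HTTP status code (int) to a request count (int).
    """
    total = sum(status_counts.values())
    client = sum(n for code, n in status_counts.items() if 400 <= code < 500)
    server = sum(n for code, n in status_counts.items() if 500 <= code < 600)
    return {
        "total": total,
        "client_errors": client,   # 4xx, e.g. 404
        "server_errors": server,   # 5xx, e.g. 500, 503, 504
        "server_error_ratio": server / total if total else 0.0,
    }


# Hypothetical counts for one time window (illustrative only).
example_counts = {200: 9900, 404: 60, 500: 10, 503: 20, 504: 10}
summary = summarize(example_counts)
```

If the server-class share is small and short-lived, that would be consistent with brief control plane pressure. A steady 404 count more likely points at a caller requesting something that is not there.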

    I used AI provided by ChatGPT to formulate this response. Hope this resolves your query!


  2. shiva patpi 13,366 Reputation points Microsoft Employee Moderator
    2023-08-02T05:48:34.61+00:00

    @Sunil Bhimanapalli,

    There can be multiple reasons for those errors. To see the control plane logs, you will have to enable diagnostic settings on the cluster.

    https://learn.microsoft.com/en-us/azure/aks/monitor-aks

    Try enabling the kube-apiserver and kube-controller-manager categories.

    After you enable them, please wait a couple of minutes for the logs to be generated. You can then query them using Kusto queries.

    Sample Kusto queries: https://learn.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-log-query#resource-logs

    You will be looking for API server logs:

    AzureDiagnostics | where Category == "kube-apiserver"
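Once rows are exported from the workspace, a small script can tally which requests returned errors. The Python sketch below rests on assumptions: that the kube-audit category is also enabled, and that each exported line is a Kubernetes audit event in JSON (with `verb`, `requestURI`, and `responseStatus.code`). The `tally_error_codes` helper and the sample lines are hypothetical:

```python
import json
from collections import Counter


def tally_error_codes(log_lines):
    """Count audit events with status >= 400, keyed by (code, verb, requestURI)."""
    counts = Counter()
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(event, dict):
            continue
        status = event.get("responseStatus")
        if not isinstance(status, dict):
            continue  # event carries no response status
        code = status.get("code")
        if isinstance(code, int) and code >= 400:
            counts[(code, event.get("verb"), event.get("requestURI"))] += 1
    return counts


# Hypothetical exported lines (illustrative only, not real cluster output).
SAMPLE_LINES = [
    '{"verb": "get", "requestURI": "/api/v1/namespaces/default/pods/missing", "responseStatus": {"code": 404}}',
    '{"verb": "list", "requestURI": "/api/v1/pods", "responseStatus": {"code": 504}}',
    '{"verb": "list", "requestURI": "/api/v1/pods", "responseStatus": {"code": 504}}',
    '{"verb": "get", "requestURI": "/healthz", "responseStatus": {"code": 200}}',
    "not json",
]

error_counts = tally_error_codes(SAMPLE_LINES)
```

Grouping by verb and URI is a design choice here: it makes it easier to see whether the errors cluster around one caller or endpoint rather than being spread across the API server.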

    ////////////////

    Also check which pricing tier your cluster is using: https://learn.microsoft.com/en-us/azure/aks/free-standard-pricing-tiers

    Depending on the tier, the resources allocated to your API server on the control plane side (CPU, memory, and number of replicas) vary.

    Regards,

    Shiva.

