Based on the error messages provided, the log suggests that GPU libraries required by TensorFlow (cuFFT, cuDNN, cuBLAS) are missing or incompatible. Please try the following steps:
Verify that all required GPU libraries are installed and compatible with CUDA 12.4 and TensorFlow 2.19.0. Follow the TensorFlow GPU Setup Guide: https://www.tensorflow.org/install/pip
Rebuild the TensorFlow Docker image so that only the necessary GPU libraries are linked, and each of them only once. This avoids the duplicate library registrations.
Check the LD_LIBRARY_PATH environment variable to confirm it includes paths to the necessary GPU libraries (/usr/local/cuda/lib64).
For additional GPU troubleshooting on AKS, refer to the Azure AKS GPU Guide: https://learn.microsoft.com/en-us/azure/aks/gpu-cluster?tabs=add-azure-linux-gpu-node-pool
These steps should address the missing library issue and prevent duplicate registrations.
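To make the library and LD_LIBRARY_PATH checks above easier to run inside the container, here is a minimal Python sketch. It is only an illustration: the function names (cuda_dir_on_path, find_gpu_libs, tensorflow_gpus) are made up for this example, and it assumes a Linux container where the CUDA libraries are installed system-wide. If TensorFlow was installed with pip together with the NVIDIA wheels, the libraries may live under site-packages instead, and the linker lookup below may report None even though TensorFlow can load them.

```python
import ctypes.util
import os

# Short linker names for the libraries mentioned in the log.
GPU_LIBS = ("cufft", "cudnn", "cublas")


def cuda_dir_on_path(env=None, needle="/usr/local/cuda/lib64"):
    """Return True if `needle` is one of the LD_LIBRARY_PATH entries."""
    env = os.environ if env is None else env
    raw = env.get("LD_LIBRARY_PATH", "")
    # Split on ':' and ignore empty entries and trailing slashes.
    entries = [p.rstrip("/") for p in raw.split(":") if p]
    return needle.rstrip("/") in entries


def find_gpu_libs(names=GPU_LIBS):
    """Map each library name to what the system linker resolves, or None."""
    return {name: ctypes.util.find_library(name) for name in names}


def tensorflow_gpus():
    """List the GPUs TensorFlow can see, or None if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    print("CUDA dir on LD_LIBRARY_PATH:", cuda_dir_on_path())
    for name, resolved in find_gpu_libs().items():
        print(f"{name}: {resolved or 'not found by the linker'}")
```

Run it in the same container image as the workload. Calling tensorflow_gpus() afterwards shows whether TensorFlow itself detects a GPU; an empty list usually means the libraries are present but not usable by this TensorFlow build.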
If you have any further queries, please let us know; we are glad to help.
If it was helpful, please click "Upvote" on this post to let us know.
Thank You.