Visualize TensorBoard logs for a job running on Azure ML

Ahsan Iqbal 0 Reputation points
2024-01-09T09:54:16.2066667+00:00

I have a training job running on Azure ML. The job is submitted using the Azure CLI. In job.yaml I configured the TensorBoard service as specified here:

services:
  my_tensor_board:
    type: tensor_board
    log_dir: "outputs"
    nodes: all

To get the TensorBoard link, I run the following command:

az ml job show-services --name my_job_name --resource-group my_resource_grp --workspace-name my_workspace_name

It returns a JSON response with a link to TensorBoard. However, when I follow the link, no logs are shown.

Am I doing something wrong?

Azure Machine Learning

1 answer

  1. dupammi 8,615 Reputation points Microsoft External Staff
    2024-01-10T06:59:27.2733333+00:00

    Hi @Ahsan Iqbal,

    Thank you for using the Microsoft Q&A forum.

    To debug the issue, I suggest first running the train.py script on its own to confirm that it runs without errors.

    Once the script runs successfully, make sure the directory it writes its TensorBoard logs to is the same directory configured as log_dir in the job file ("outputs" in your case). That way you can confirm that the logs are being generated and saved in the ML environment.
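One way to confirm that logs are actually being generated is to look for TensorBoard event files in the log directory. The sketch below is only an illustration: the helper names `find_event_files` and `summarize` are hypothetical, and it assumes the training script uses a standard TensorBoard writer, which names its files starting with `events.out.tfevents`. It could be run locally after train.py, or called at the end of the script so the result shows up in the job logs.

```python
from pathlib import Path


def find_event_files(log_dir):
    """Return TensorBoard event files found anywhere under log_dir, sorted by path."""
    root = Path(log_dir)
    if not root.is_dir():
        # A missing directory means nothing was written there at all.
        return []
    # Assumption: standard writers name files "events.out.tfevents.<timestamp>.<host>...".
    return sorted(p for p in root.rglob("*tfevents*") if p.is_file())


def summarize(log_dir):
    """Build a one-line report of how many event files exist and their total size."""
    files = find_event_files(log_dir)
    total_bytes = sum(p.stat().st_size for p in files)
    return f"{log_dir}: {len(files)} event file(s), {total_bytes} bytes"


if __name__ == "__main__":
    # "outputs" matches the log_dir value in the job.yaml from the question.
    print(summarize("outputs"))
```

If the report shows zero event files, the script is probably writing its logs somewhere other than the configured log_dir, or not writing them at all.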

    When you create a job with the Azure Machine Learning CLI, it runs on the compute target specified in the compute field of the job.yml file. To check whether the job compute can access the script without issues, open the job and check its logs.

    By following these steps, you can determine whether the issue lies in the job.yml file or in the train.py script.

    I hope this helps! Thank you.
