Missing logfiles in resumed job
Hello,
I am running a DL job in Azure ML Studio on low-priority nodes. To resume training after an interruption caused by node preemption, I have adapted my code so that it automatically continues with the next epoch as soon as a new node becomes available. In a first test run, this produced multiple directories in the "Outputs + logs" section of my job, each containing the "std_log.txt" logfile of a single retry (see the attached screenshot).
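For context, the resume logic follows the usual checkpoint pattern. The following is a simplified sketch, not my actual training code: the checkpoint path and the JSON format are assumptions for illustration, `train_one_epoch` is a hypothetical placeholder, and a real checkpoint would also store model and optimizer state. It also assumes the checkpoint location survives preemption (e.g. a mounted datastore rather than the node's local disk).

```python
import json
import os

# Hypothetical checkpoint location; it must survive node preemption for resuming to work.
CHECKPOINT_PATH = os.path.join("outputs", "checkpoint.json")


def load_start_epoch(path: str = CHECKPOINT_PATH) -> int:
    """Return 0 on a fresh run, or the epoch after the last completed one on a retry."""
    if not os.path.exists(path):
        return 0
    with open(path, encoding="utf-8") as f:
        state = json.load(f)
    return state["last_completed_epoch"] + 1


def save_checkpoint(epoch: int, path: str = CHECKPOINT_PATH) -> None:
    """Record that `epoch` finished. Writing to a temp file and renaming means a
    preemption during the write leaves the previous checkpoint intact."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump({"last_completed_epoch": epoch}, f)
    os.replace(tmp_path, path)


def train(num_epochs: int, path: str = CHECKPOINT_PATH) -> list:
    """Run the remaining epochs and return the epoch numbers executed in this attempt."""
    executed = []
    for epoch in range(load_start_epoch(path), num_epochs):
        # train_one_epoch(model, data)  # hypothetical training step
        executed.append(epoch)
        save_checkpoint(epoch, path)
    return executed
```

If a node is preempted after epoch 1 has been checkpointed, the next attempt calls `train` again and starts directly at epoch 2.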
However, I started another job based on exactly the same implementation, and there only a single logfile, from the last run, is shown. Azure ML Studio seems to overwrite the previous "std_log.txt" instead of creating a new directory with a separate logfile for each retry. What could cause this behavior, and how can I ensure that all logfiles are always saved?
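As a possible workaround, independent of how Azure ML handles "std_log.txt" itself, each attempt could also write its own logfile with a name that no other attempt can reuse. This is only a sketch under assumptions: it assumes that files written under `outputs/` are uploaded with the job's artifacts (worth verifying for the SDK version in use), the `attempt_logs` directory and the logger name are hypothetical, and it only captures messages sent through Python's `logging` module, not raw stdout. The name combines a UTC timestamp, the hostname and a random suffix, so a retry on a fresh node with an empty local directory still gets a new name.

```python
import logging
import os
import socket
import uuid
from datetime import datetime, timezone


def open_attempt_log(log_dir: str = os.path.join("outputs", "attempt_logs"),
                     logger_name: str = "training"):
    """Attach a FileHandler that writes to a logfile unique to this attempt.

    Returns the logger, the handler (so it can be closed) and the logfile path.
    """
    os.makedirs(log_dir, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # Timestamp + hostname + random suffix: earlier attempts' files are never reused.
    name = f"attempt_{stamp}_{socket.gethostname()}_{uuid.uuid4().hex[:8]}.txt"
    path = os.path.join(log_dir, name)

    handler = logging.FileHandler(path, mode="a", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger, handler, path
```

Calling `open_attempt_log()` once at the start of the script, and logging the epoch the attempt resumes from, would leave one file per retry even if only the last "std_log.txt" is kept.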
Best regards!