@schwarze Thanks for the details. We recommend raising an Azure support ticket from the Help + Support blade in the Azure portal for your resource. This lets you share the details securely and work with an engineer who can check whether the issue can be replicated and provide more insight.
UserScriptFilledDisk - what disk?
I am using Azure Machine Learning and have been running Python scripts on VMs to train on MNIST and output some summary statistics on the trained networks. It worked fine for the first few jobs, but when I submitted a few more, all of them failed with a UserScriptFilledDisk error:
"UserError: AzureMLCompute job failed. UserScriptFilledDisk: User script filled the disk. Consider using VM SKU with larger disk size. If the issue persists contact Azure Support."
I am using nodes with only 7 GB of disk space, but it still does not make sense to me that I should have exceeded that just by mounting MNIST and writing less than 1 MB of numpy arrays to './outputs/'. The problem does not seem to be specific to any one node or small group of nodes on my cluster. I created a new cluster and tried running my scripts on it, and it still throws the same error. So how can I find out which disk I have filled up, how do I fix it, and how do I keep it from happening again?
Thanks in advance!
More details:
I created an Azure machine learning compute cluster
import os
from azureml.core.compute import AmlCompute, ComputeTarget

compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "cpu-main1")
# Environment variables are returned as strings, so cast the node counts to int
compute_min_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", 0))
compute_max_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", 100))
vm_size = os.environ.get("AML_COMPUTE_CLUSTER_SKU", "STANDARD_DS1_V2")

if compute_name in ws.compute_targets:
    # Reuse the existing cluster
    compute_target = ws.compute_targets[compute_name]
else:
    # Otherwise provision a new cluster and wait until it is ready
    provisioning_config = AmlCompute.provisioning_configuration(vm_size=vm_size,
                                                                min_nodes=compute_min_nodes,
                                                                max_nodes=compute_max_nodes)
    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
I added a data set
workspace = Workspace(subscription_id, resource_group, workspace_name)
dataset = Dataset.get_by_name(workspace, name='mnist_zip')
dataset.download(target_path='.', overwrite=True)
dataset = dataset.register(workspace=ws,
                           name='mnist_zip',
                           description='zip file with preprocesses mnist data set',
                           create_new_version=False)
I submitted jobs to the cluster
runs = [0 for _ in range(30)]
for i in range(30):
    args = ['--dataset', dataset.as_mount(), '--id', i]
    # also tried '.as_download()' - did not seem to make a difference
    src = ScriptRunConfig(source_directory=script_folder,
                          script='script.py',
                          arguments=args,
                          compute_target=compute_target,
                          environment=env)
    runs[i] = exp.submit(config=src)
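One way to narrow down which disk is filling up is to log disk usage from inside script.py at the start and end of each run, so the numbers appear in the job logs. The following is a minimal sketch using only the Python standard library. The helper names (`disk_usage_report`, `directory_size_bytes`) and the list of candidate paths are assumptions made for illustration. They are not part of the Azure ML SDK, and the paths a compute node actually uses may differ.

```python
import os
import shutil


def disk_usage_report(paths):
    """Return (path, total_gib, used_gib, free_gib) for each path that exists."""
    gib = 1024 ** 3
    report = []
    for path in paths:
        if not os.path.exists(path):
            continue  # skip locations that are not present on this node
        usage = shutil.disk_usage(path)
        report.append((path, usage.total / gib, usage.used / gib, usage.free / gib))
    return report


def directory_size_bytes(root):
    """Sum the sizes of regular files under root, ignoring unreadable entries."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is not readable
    return total


if __name__ == "__main__":
    # Candidate locations are guesses; adjust them to the paths your job uses.
    candidates = [".", "/tmp", "/mnt", os.path.expanduser("~"), "./outputs"]
    for path, total, used, free in disk_usage_report(candidates):
        print(f"{path}: total={total:.2f} GiB used={used:.2f} GiB free={free:.2f} GiB")
    print("outputs size (bytes):", directory_size_bytes("./outputs"))
```

Comparing the output across the first successful jobs and the later failing ones may show whether the space is consumed by the working directory, a temporary directory, or something left behind by earlier runs on the same node.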
Azure Machine Learning
2 answers
SJ 1 Reputation point
2022-05-25T09:24:54.727+00:00 Is there any update on this question? I am experiencing the same issue: when I try to register the dataset, the UserScriptFilledDisk error occurs during the operation for no apparent reason.