Azure ML workspace blob structure / Can I safely delete these blobs?

ThierryL 146 Reputation points
2022-03-17T07:08:45.61+00:00

Hello,

I am trying to figure out the folder structure of an Azure ML workspace in my storage account.
I want to be able to delete old pipeline runs and experiments that have piled up in my workspace directly from Azure Storage Explorer without breaking the system.
My datastores and folder structure are as follows:

Datastore: workspaceartifactstore
Blob container: azureml
Folder structure:
├─ ComputeRecord
├─ Dataset
├─ ExperimentRun
├─ LocalUpload

Datastore: workspaceblobstore (Default)
Blob container: azureml-blobstore-(a series of numbers)
Folder structure:
├─ azureml
│ ├── (a series of numbers)-setup
│ │ ├── _tracer.py
│ │ ├── azureml_globals.py
│ │ ├── context_managers.py
│ │ ├── job_prep.py
│ │ ├── log_history_status.py
│ │ ├── request_utilities.py
│ │ ├── run_token_provider.py
│ │ ├── utility_context_managers.py
│ ├── (another series of numbers)-setup
│ │ ├── same files as above
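
To see which top-level "folders" each container holds (for example, to compare the `azureml` container with the `azureml-blobstore-(a series of numbers)` one), a small read-only Python sketch could help. It assumes the `azure-storage-blob` package and a storage-account connection string; `top_level_folders` and `list_blob_names` are hypothetical helper names, not part of any Azure SDK. Blob storage has no real directories, so these "folders" are just blob-name prefixes split on `/`.

```python
from collections import Counter
from typing import Iterable, List


def top_level_folders(blob_names: Iterable[str], depth: int = 1) -> Counter:
    """Count blobs under each virtual-folder prefix, up to `depth` levels.

    Blobs sitting directly at the container root are counted under "".
    """
    counts: Counter = Counter()
    for name in blob_names:
        parts = name.split("/")
        # The last part is the blob's own name, so never include it in the prefix.
        prefix = "/".join(parts[: min(depth, len(parts) - 1)])
        counts[prefix] += 1
    return counts


def list_blob_names(conn_str: str, container: str) -> List[str]:
    """Return every blob name in a container (needs azure-storage-blob)."""
    # Imported here so top_level_folders works without the SDK installed.
    from azure.storage.blob import ContainerClient

    client = ContainerClient.from_connection_string(conn_str, container_name=container)
    return [blob.name for blob in client.list_blobs()]
```

For example, `top_level_folders(list_blob_names(conn_str, "azureml"))` would count blobs under `ComputeRecord`, `Dataset`, `ExperimentRun` and `LocalUpload`, and `depth=2` on the blobstore container would break `azureml/` down by its `-setup` folders. Nothing here deletes data.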

It would help if I understood what each of these containers actually stores.
I already tried to delete all blobs stored in 'workspaceblobstore', but it didn't remove any pipeline or experiment from ML Studio.
I have a few datasets registered in my workspace, and I don't want to delete them (nor unregister them).

Can I set a data retention policy on both containers in order to delete old blobs?
Can I safely delete the blobs (folders) stored in 'workspaceartifactstore' too? Will they be recreated automatically when I run a new experiment?
Why are there two separate 'azureml' and 'azureml-blobstore-(a series of numbers)' containers? Is it possible to merge them?

Thanks.

Azure Machine Learning

Accepted answer
    GiftA-MSFT 11,151 Reputation points
    2022-03-17T14:52:56.467+00:00

    Hi, thanks for reaching out. I've worked on a similar inquiry, and the advice is not to delete data stored in the default datastore, to avoid weird errors. The option to easily delete experiment runs is on the roadmap. Here's a similar thread. Feel free to raise and track a feature request on the ideas portal.

    According to the documentation, when you create a workspace, an Azure blob container and an Azure file share are automatically registered as datastores to the workspace. They're named workspaceblobstore and workspacefilestore, respectively. The workspaceblobstore is used to store workspace artifacts and your machine learning experiment logs. It's also set as the default datastore and can't be deleted from the workspace. The workspacefilestore is used to store notebooks and R scripts authored via compute instance.
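
On the retention-policy question: Azure Storage lifecycle management can delete blobs by age and name prefix. Given the advice not to delete data in the default datastore, and the observation in the question that deleting blobs did not remove runs from ML Studio, treat this Python sketch only as an illustration of the rule format, not as a recommended cleanup. The helper names, the example prefix `azureml/ExperimentRun/` and the 90-day threshold are assumptions; each `prefixMatch` entry has to begin with the container name.

```python
import json
from typing import Dict, List


def lifecycle_delete_rule(name: str, prefixes: List[str], days: int) -> Dict:
    """Build one lifecycle-management rule that deletes block blobs whose
    last modification is more than `days` days old.

    Hypothetical helper; each prefix must start with the container name.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    if not prefixes:
        # A rule without prefixMatch would apply to every blob in the account.
        raise ValueError("at least one prefix is required")
    return {
        "enabled": True,
        "name": name,
        "type": "Lifecycle",
        "definition": {
            "filters": {"blobTypes": ["blockBlob"], "prefixMatch": list(prefixes)},
            "actions": {
                "baseBlob": {"delete": {"daysAfterModificationGreaterThan": days}}
            },
        },
    }


def lifecycle_policy(rules: List[Dict]) -> str:
    """Wrap rules in the top-level {"rules": [...]} policy document."""
    return json.dumps({"rules": rules}, indent=2)


# Assumed example: old run artifacts in the "azureml" container only.
policy_json = lifecycle_policy(
    [lifecycle_delete_rule("deleteOldRunArtifacts", ["azureml/ExperimentRun/"], 90)]
)
```

The resulting JSON could be saved as `policy.json` and applied with, e.g., `az storage account management-policy create --account-name <account> --resource-group <rg> --policy @policy.json`. Since a storage account has a single management policy, check for existing rules before replacing it.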

