Data Labeling project fails with “Dataset refresh has failed” despite valid PNGs and credentials
Description:
I'm stuck trying to create a Data Labeling project in Azure Machine Learning Studio, and I keep getting this error:
"The dataset refresh has failed. Verify the project's datastore credentials are correct and the dataset contains datapoints."
I've spent hours troubleshooting this. Here's what I've confirmed:
✅ My Blob Storage container has valid .png images (no nested folders, standard size, visible in portal preview).
✅ I registered the dataset using a script, and ml_client.data.get() confirms the path is correct.
✅ My workspace managed identity has these roles on the storage account: Reader and Storage Blob Data Reader.
✅ My own user identity also has those same roles.
✅ I enabled “Use workspace managed identity for data preview and profiling” during datastore credential setup.
✅ I'm using identity-based access, not shared keys.
✅ I even deleted and recreated the ML workspace, datastore, and dataset from scratch. Same issue.
✅ I tried flattening the folder structure (no recursion), renaming the dataset, and re-saving it with different paths. Still nothing.
🧠 I suspect the Data Labeling UI is failing to preview the dataset properly even though it’s 100% accessible from code and the images are valid. Maybe a regression? It was working before.
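In case it helps anyone reproduce the checks above, here is a minimal sketch of how the "flat folder, image files only" claims could be verified from a list of blob names. It is plain Python and does not call Azure. The function name check_blob_names, the IMAGE_EXTENSIONS set, and the sample names are hypothetical and not part of any SDK, and the extension list is an assumption to adjust as needed.

```python
from pathlib import PurePosixPath

# Assumption: extensions treated as images for this check; adjust as needed.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def check_blob_names(blob_names):
    """Sort blob names into flat images, nested blobs, and non-image blobs."""
    report = {"ok": [], "nested": [], "not_image": []}
    for name in blob_names:
        path = PurePosixPath(name)
        if len(path.parts) > 1:
            # The name contains a "/" and so sits in a virtual subfolder.
            report["nested"].append(name)
        elif path.suffix.lower() not in IMAGE_EXTENSIONS:
            report["not_image"].append(name)
        else:
            report["ok"].append(name)
    return report


if __name__ == "__main__":
    # Hypothetical names; in practice, paste the container's blob listing here.
    sample = ["img_001.png", "img_002.PNG", "raw/img_003.png", "readme.txt"]
    for bucket, names in check_blob_names(sample).items():
        print(bucket, names)
```

If "nested" and "not_image" both come back empty for the real listing, the container layout is probably not the cause, which points back at access or the labeling service itself.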