Datasets currently do not have snapshot capabilities. However, you can work out your own approach by snapshotting the underlying data at the storage level, for example with blob snapshots if the data lives in Azure Blob Storage. With the new dataset API, you can version and track datasets. A version holds a reference to your data but does not create a point-in-time snapshot. We therefore recommend organizing your data in folders, so that each batch of new data goes into its own folder; a new version can then refer to the old data (old folder) plus the new data (new folder). Please check out this document on how to version and track Azure Machine Learning datasets for reproducibility.
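As a rough illustration of the folder-per-batch layout recommended above, here is a minimal sketch that assumes the `azureml-core` (SDK v1) package and a workspace `config.json` on disk. The dataset name `sales-data`, the folder names, and the helpers `folder_globs`, `register_version`, and `example_usage` are hypothetical and only there for the example. Keep in mind that a version stores references only, so it stays reproducible only as long as nobody modifies or deletes the files in the older folders.

```python
from typing import List, Sequence


def folder_globs(folders: Sequence[str]) -> List[str]:
    """Turn folder names into recursive glob patterns, e.g. 'a/b' -> 'a/b/**'."""
    return [f"{folder.strip('/')}/**" for folder in folders]


def register_version(workspace, datastore, dataset_name: str, folders: Sequence[str]):
    """Register a new dataset version that references the given folders.

    Hypothetical helper: it stores references to the folders and does not
    copy or snapshot the underlying files.
    """
    # Imported lazily so folder_globs() can be used without the SDK installed.
    from azureml.core import Dataset

    paths = [(datastore, pattern) for pattern in folder_globs(folders)]
    dataset = Dataset.File.from_files(path=paths)
    return dataset.register(
        workspace=workspace,
        name=dataset_name,
        create_new_version=True,  # add a version instead of failing on a name clash
    )


def example_usage():
    """Assumed workflow; needs a real workspace and the folders to exist."""
    from azureml.core import Dataset, Workspace

    ws = Workspace.from_config()
    store = ws.get_default_datastore()

    # First version: only the first batch (hypothetical folder name).
    register_version(ws, store, "sales-data", ["sales/2023-01"])
    # Later version: old folder plus the newly added folder.
    register_version(ws, store, "sales-data", ["sales/2023-01", "sales/2023-02"])

    # Reproduce an earlier experiment by asking for a specific version.
    return Dataset.get_by_name(ws, name="sales-data", version=1)
```

Calling `example_usage()` against your own workspace would register two versions and then fetch the first one again. If you need a true point-in-time copy, that still has to happen at the storage level, as described above.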
Azure ML Dataset and Snapshot
keonabut (Microsoft Employee, 11 reputation points)
Hi experts,
My customer wants to snapshot datasets for reproducibility. I found the method "create_snapshot", but it is deprecated. Is there an alternative way to snapshot a dataset?
Thanks,
Keita