Using the Team Data Science Process (TDSP) in Azure Machine Learning
This post is authored by Wei Guo, Data Scientist, Hai Ning, Principal PM Manager, Xibin Gao, Data Scientist, Brad Severtson, Senior Content Developer, and Debraj GuhaThakurta, Senior Data Scientist Lead, at Microsoft.
In this post, we describe how the Team Data Science Process (TDSP) project structure and documentation templates can be instantiated and used in Azure Machine Learning.
What is the TDSP?
TDSP is an agile, iterative data science process for executing and delivering machine learning and advanced analytics solutions. It is designed to improve collaboration and efficiency in enterprise data science teams. TDSP has four components:
- A standard data science lifecycle definition.
- A standardized project structure, including project documentation and reporting templates.
- Infrastructure for project execution, e.g. compute and storage infrastructure, code repositories, etc.
- Tools for data science project tasks, e.g. collaborative version control and code review, data exploration and modeling, work planning, etc.
Why Use the TDSP in Azure ML?
Enterprise data science teams are generally quite diverse, comprising individuals with varied backgrounds and training who are often located in different geographies. Standardizing data science projects and project artifacts can therefore be an important way to improve collaboration, consistency and efficiency across such teams. That is precisely the value proposition of TDSP. To that end, we previously released a GitHub repository for the TDSP project structure and templates, in support of the standard data science lifecycle. Until now, however, it has not been possible to instantiate the TDSP structure and templates within a data science tool. We have now addressed that limitation by enabling TDSP instantiation within Azure ML, which brings the benefits of standardization to data science teams using Azure ML.
Using the TDSP Structure and Artifact Templates in Azure ML
While you can obtain detailed instructions on this topic as part of the Azure ML documentation, here is a short summary.
When creating a new project, the user can search for and select the TDSP template in the Azure ML Gallery (Figure 1 below). A new project is then created in the user's workspace. The TDSP template selected in the Azure ML gallery is backed by a TDSP GitHub repository.
Figure 1: Creating a new data science project in the Azure ML workspace by using the TDSP template. Entering "TDSP" in the search box above gets you the TDSP template, which can then be used to create a new project.
The TDSP structure (Figure 2) can then be used to execute and deliver data science projects. The TDSP project structure should be populated with project-specific documentation, code and artifacts.
Figure 2: TDSP structure and documentation templates in a newly created project in Azure ML. The TDSP project structure and templates are shown in the left panel of the figure (red box).
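To make the idea of a standardized project layout concrete, here is a minimal sketch that scaffolds a TDSP-style folder tree on local disk. It is not part of Azure ML and is not how the gallery template is instantiated. The folder names in `TDSP_LAYOUT` are assumptions modeled loosely on the public TDSP GitHub template, and the function `scaffold_tdsp_project` and the placeholder README are hypothetical.

```python
import tempfile
from pathlib import Path

# Hypothetical TDSP-style layout. Folder names are illustrative assumptions
# modeled loosely on the public TDSP template, not the exact contents of the
# Azure ML gallery template.
TDSP_LAYOUT = {
    "Code": ["Data_Acquisition_and_Understanding", "Modeling", "Deployment"],
    "Docs": ["Project", "Data_Report", "Model", "Data_Dictionaries"],
    "Sample_Data": ["Raw", "Processed", "For_Modeling"],
}


def scaffold_tdsp_project(root: Path) -> list[Path]:
    """Create the placeholder folders under `root` and return the leaf dirs.

    Safe to call repeatedly: existing folders and an existing README are kept.
    """
    created = []
    for top, subdirs in TDSP_LAYOUT.items():
        for sub in subdirs:
            path = root / top / sub
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    # Hypothetical top-level README where project-specific documentation starts.
    readme = root / "README.md"
    if not readme.exists():
        readme.write_text(f"# {root.name}\n\nProject-specific documentation goes here.\n")
    return created


if __name__ == "__main__":
    # Scaffold into a throwaway directory and list the resulting folders.
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "income_classification"
        for d in scaffold_tdsp_project(project):
            print(d.relative_to(project).as_posix())
```

Keeping the layout in a single dictionary means a team can agree on one structure and reuse it for every new project, which is the kind of consistency the TDSP template provides inside Azure ML.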
Worked-Out Azure ML Sample Projects
In addition to the Azure ML gallery template for creating a new project, we have provided worked-out Azure ML sample projects using the TDSP template.
Here is one sample, for US income classification using census data, that illustrates what an actual project that uses this template might look like.
In this post, we summarized how the TDSP template can be instantiated in Azure ML. Creating new projects with the TDSP template provides an easy way for Azure ML users to standardize on project structures and artifacts across their data science teams. This can help them realize the benefits of improved consistency and collaboration.
For next steps and how to get started, please refer to the resources below:
- How to create new projects in Azure ML using the TDSP template.
- TDSP template for Azure ML in the GitHub repository.
- General information about TDSP.
Share your feedback and thoughts by posting a comment below.
Wei, Hai, Xibin, Brad & Debraj