Team Data Science Process for data scientists

This article provides guidance for objectives that you set when you implement comprehensive data science solutions with Azure technologies. You're guided through:

  • Understanding an analytics workload.
  • Using the Team Data Science Process.
  • Using Azure Machine Learning.
  • Understanding the foundations of data transfer and storage.
  • Providing data source documentation.
  • Using tools for analytics processing.

These training materials are related to the Team Data Science Process (TDSP) and Microsoft open-source software and toolkits, which are helpful for envisioning, executing, and delivering data science solutions.

Lesson path

You can use the items in the following table to guide your own self-study. Read the Description to follow the path, select the Topic to see study references, and check your skills by using the Knowledge check.

| Objective | Topic | Description | Knowledge check |
| --- | --- | --- | --- |
| Understand the processes for developing analytic projects | An introduction to the Team Data Science Process | We begin by covering an overview of the TDSP. This process guides you through each step of an analytics project. Read through each of these sections to learn more about the process and how you can implement it. | Review and download the TDSP project structure artifacts to your local machine for your project. |
| | Agile development | The TDSP works well with many different programming methodologies. In this Learning Path, we use Agile software development. Read through the "What is Agile Development?" and "Building Agile Culture" articles, which cover the basics of working with Agile. There are also other references at this site where you can learn more. | Explain Continuous Integration and Continuous Delivery to a colleague. |
| | DevOps for data science | Developer operations (DevOps) involves people, processes, and platforms you can use to work through a project and integrate your solution into an organization's standard IT. This integration is essential for adoption, safety, and security. In this online course, you learn about DevOps practices and understand some of the toolchain options you have. | Prepare a 30-minute presentation to a technical audience on how DevOps is essential for analytics projects. |
| Understand the technologies for data storage and processing | Microsoft business analytics and AI | We focus on a few technologies in this Learning Path that you can use to create an analytics solution, but Microsoft has many more. To understand the options you have, it's important to review the platforms and features available in Microsoft Azure, the Azure Stack, and on-premises options. Review this resource to learn the various tools you have available to answer analytics questions. | Download and review the presentation materials from this workshop. |
| Set up and configure your training, development, and production environments | Microsoft Azure | Now let's create an account in Microsoft Azure for training and learn how to create development and test environments. These free training resources get you started. Complete the Beginner and Intermediate paths. | If you don't have an Azure account, create one. Sign in to the Azure portal and create one resource group for training. |
| | The Azure command-line interface (CLI) | You can work with Azure in multiple ways: through graphical tools like Visual Studio Code and Visual Studio, through web interfaces such as the Azure portal, and from the command line with tools such as Azure PowerShell commands and functions. In this article, we cover the CLI, which you can use locally on your workstation, in Windows and other operating systems, and in the Azure portal. | Set your default subscription with the Azure CLI. |
| | Azure Storage | You need a place to store your data. In this article, you learn about Azure storage options, how to create a storage account, and how to copy or move data to the cloud. Read through this introduction to learn more. | Create a Storage account in your training resource group, create a container for a blob object, and upload and download data. |
| | Microsoft Entra ID | Microsoft Entra ID forms the basis of securing your application. In this article, you learn more about accounts, rights, and permissions. Active Directory and security are complex topics, so read through this resource to understand the fundamentals. | Add one user to Microsoft Entra ID. NOTE: You might not have permissions for this action if you aren't the administrator for the subscription. If that's the case, review this tutorial to learn more. |
| | The Azure Data Science Virtual Machine for PyTorch | You can install the tools for working with data science locally on multiple operating systems. But the Data Science Virtual Machine for PyTorch contains all of the tools you need and plenty of project samples to work with. In this article, you learn more about the Data Science Virtual Machine for PyTorch and how to work through its examples. This resource explains the Data Science Virtual Machine for PyTorch, how you can create one, and a few options for developing code with it. It also contains all the software you need to complete this learning path, so make sure you complete the knowledge path for this topic. | Create a Data Science Virtual Machine for PyTorch and work through at least one lab. |
| Install and understand the tools and technologies for working with data science solutions | Working with Git | To follow our DevOps process with the TDSP, we need to have a version-control system. Machine Learning uses Git, a popular open-source distributed repository system. In this article, you learn more about how to install, configure, and work with Git and a central repository, GitHub. | Clone this GitHub project for your learning path project structure. |
| | Visual Studio Code | Visual Studio Code is a cross-platform integrated development environment (IDE) that you can use with multiple languages and Azure tools. You can use this single environment to create your entire solution. Watch these introductory videos to get started. | Install Visual Studio Code, and work through the Visual Studio Code features in the interactive editor playground. |
| | Programming with Python | In this solution, we use Python, one of the most popular languages in data science. This article covers the basics of writing analytic code with Python, and resources to learn more. Work through sections 1-9 of this reference, then check your knowledge. | Add one entity to an Azure table using Python. |
| | Working with Jupyter Notebook | Notebooks are a way of introducing text and code in the same document. Machine Learning works with notebooks, so it's beneficial to understand how to use them. Read through this tutorial and give it a try in the knowledge check section. | Open the Jupyter webpage, and select Welcome to Python.ipynb. Work through the examples on that page. |
| | Machine learning | Creating advanced analytic solutions involves working with data by using machine learning, which also forms the basis of working with AI and deep learning. This course teaches you more about machine learning. For a comprehensive course on data science, see this certification. | Locate a resource on machine learning algorithms. (Hint: Search "azure machine learning algorithm cheat sheet") |
| | scikit-learn | The scikit-learn set of tools allows you to perform data science tasks in Python. We use this framework in our solution. This article covers the basics and explains where you can learn more. | Using the Iris dataset, persist an SVM model using Pickle. |
| | Working with Docker | Docker is a distributed platform used to build, ship, and run applications, and is used frequently in machine learning. This article covers the basics of this technology and explains where you can go to learn more. | Open Visual Studio Code, and install the Docker extension. Create a simple Node Docker container. |
| | Azure HDInsight | HDInsight is a Hadoop open-source infrastructure, available as a service in Azure. Your machine learning algorithms might involve large sets of data, and you can use HDInsight to store, transfer, and process large-scale data. This article covers working with HDInsight. | Create a small HDInsight cluster. Use HiveQL statements to project columns onto an /example/data/sample.log file. Alternatively, you can complete this knowledge check on your local system. |
| Create a data processing flow from business requirements | Determining the question following the TDSP | With the development environment installed and configured, and the understanding of the technologies and processes in place, it's time to put everything together using the TDSP to perform an analysis. We need to start by defining the question, selecting the data sources, and the rest of the steps in the TDSP. Keep in mind the DevOps process as we work through this process. In this article, you learn how to take the requirements from your organization and create a data flow map through your application to define your solution using the TDSP. | Locate a resource on "The 5 data science questions", and describe one question your organization might have in these areas. Which algorithms should you focus on for that question? |
| Use Machine Learning to create a predictive solution | Machine Learning | Machine Learning uses AI for data wrangling and feature engineering, manages experiments, and tracks model runs. It uses a single environment, and most functions can run locally or in Azure. You can use the PyTorch framework, the TensorFlow framework, or other frameworks to create your experiments. In this article, we focus on a complete example of this process, using everything you learned so far. | |
| Use Power BI to visualize results | Power BI | Power BI is a data visualization tool. It's available on multiple platforms, like web devices, mobile devices, and desktop computers. In this article, you learn how to work with the output of the solution you created by accessing the results from Azure Storage and creating visualizations using Power BI. | Complete this tutorial on Power BI. Then connect Power BI to the blob CSV created in an experiment run. |
| Monitor your solution | Application Insights | There are multiple tools you can use to monitor your end solution. Application Insights makes it easy to integrate built-in monitoring into your solution. | Set up Application Insights to monitor an application. |
| | Azure Monitor Logs | Another method to monitor your application is to integrate it into your DevOps process. Azure Monitor Logs provides a rich set of features to help you monitor your analytic solutions after you deploy them. | Complete this tutorial on using Azure Monitor Logs. |
| Complete this learning path | | Congratulations! You completed this learning path. | |
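
As one way to approach the scikit-learn knowledge check in the table above ("Using the Iris dataset, persist an SVM model using Pickle"), the following is a minimal sketch, not a reference solution. It assumes scikit-learn is installed in your environment. The default `SVC` hyperparameters and the variable names `blob` and `restored` are illustrative choices, not something this learning path prescribes.

```python
import pickle

from sklearn import datasets, svm

# Load the Iris dataset that ships with scikit-learn (150 samples, 4 features).
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Train a support vector classifier. Default hyperparameters are an
# arbitrary choice for this sketch.
clf = svm.SVC()
clf.fit(X, y)

# Persist the fitted model by serializing it to bytes with pickle,
# then restore it from those bytes.
blob = pickle.dumps(clf)
restored = pickle.loads(blob)

# The restored model should make the same predictions as the original.
original_pred = clf.predict(X)
restored_pred = restored.predict(X)
print("Predictions match:", bool((original_pred == restored_pred).all()))
```

To keep the model between sessions, you can write the bytes to a file with `pickle.dump` and read them back with `pickle.load`. Only unpickle data you trust, because loading a pickle can execute arbitrary code, and a model pickled with one scikit-learn version isn't guaranteed to load correctly in another.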

Contributors

This article is maintained by Microsoft. It was originally written by the following contributors.

Principal author:

Next steps

Continue your AI journey in the AI learning hub.