Team Data Science Process for Developer Operations
This article explores the Developer Operations (DevOps) functions that are specific to an Advanced Analytics and Cognitive Services solution implementation. These training materials implement the Team Data Science Process (TDSP) and use Microsoft and open-source software and toolkits, which are helpful for envisioning, executing, and delivering data science solutions. The article references topics that cover the DevOps toolchain specific to data science and AI projects and solutions.
Lesson Path
The following table provides level-based guidance to help you complete the DevOps objectives for implementing data science solutions on Azure.
| Objective | Topic | Resource | Technologies | Level | Prerequisites |
|---|---|---|---|---|---|
| Understand Advanced Analytics | The Team Data Science Process Lifecycle | This technical walkthrough describes the Team Data Science Process | Data Science | Intermediate | General technology background, familiarity with data solutions, familiarity with IT projects and solution implementation |
| Understand the Microsoft Azure Platform for Advanced Analytics | Information Management | This reference gives an overview of using Azure Data Factory to build pipelines for analytics data solutions | Microsoft Azure Data Factory | Experienced | General technology background, familiarity with data solutions, familiarity with IT projects and solution implementation |
| | | This reference gives an overview of Azure Data Catalog, which you can use to document and manage metadata on your data sources | Microsoft Azure Data Catalog | Intermediate | General technology background, familiarity with data solutions, familiarity with Relational Database Management Systems (RDBMS) and NoSQL data sources |
| | | This reference gives an overview of the Azure Event Hubs system and how you can use it to ingest data into your solution | Azure Event Hubs | Intermediate | General technology background, familiarity with data solutions, familiarity with Relational Database Management Systems (RDBMS) and NoSQL data sources, familiarity with Internet of Things (IoT) terminology and use |
| | Big Data Stores | This reference gives an overview of using Azure Synapse Analytics to store and process large amounts of data | Azure Synapse Analytics | Experienced | General technology background, familiarity with data solutions, familiarity with Relational Database Management Systems (RDBMS) and NoSQL data sources, familiarity with HDFS terminology and use |
| | | This reference gives an overview of using Azure Data Lake to capture data of any size, type, and ingestion speed in a single place for operational and exploratory analytics | Azure Data Lake Store | Intermediate | General technology background, familiarity with data solutions, familiarity with NoSQL data sources, familiarity with HDFS |
| | Machine learning and analytics | This reference covers an introduction to machine learning, predictive analytics, and artificial intelligence systems | Azure Machine Learning | Intermediate | General technology background, familiarity with data solutions, familiarity with Data Science terms, familiarity with machine learning and artificial intelligence terms |
| | | This article provides an introduction to Azure HDInsight, a cloud distribution of the Hadoop technology stack. It also covers what a Hadoop cluster is and when you would use it | Azure HDInsight | Intermediate | General technology background, familiarity with data solutions, familiarity with NoSQL data sources |
| | | This reference gives an overview of the Azure Data Lake Analytics job service | Azure Data Lake Analytics | Intermediate | General technology background, familiarity with data solutions, familiarity with NoSQL data sources |
| | | This overview covers using Azure Stream Analytics as a fully managed event-processing engine to set up real-time analytic computations on streaming data | Azure Stream Analytics | Intermediate | General technology background, familiarity with data solutions, familiarity with structured and unstructured data concepts |
| | Intelligence | This reference gives an overview of the available Cognitive Services (such as vision, text, and search) and how to get started using them | Cognitive Services | Experienced | General technology background, familiarity with data solutions, software development |
| | | This reference covers an introduction to the Microsoft Bot Framework and how to get started using it | Bot Framework | Experienced | General technology background, familiarity with data solutions |
| | Visualization | This self-paced, online course covers the Power BI system, and how to create and publish reports | Microsoft Power BI | Beginner | General technology background, familiarity with data solutions |
| | Solutions | This resource page covers multiple applications you can review, test, and implement to see a complete solution from start to finish | Microsoft Azure, Azure Machine Learning, Cognitive Services, Microsoft R, Azure Cognitive Search, Python, Azure Data Factory, Power BI, Azure Document DB, Application Insights, Azure SQL DB, Azure Synapse Analytics, Microsoft SQL Server, Azure Data Lake, Bot Framework, Azure Batch | Intermediate | General technology background, familiarity with data solutions |
| Understand and Implement DevOps processes | What is DevOps? | This article explains the fundamentals of DevOps and how they map to DevOps practices | DevOps, Microsoft Azure Platform, Azure DevOps | Intermediate | Familiarity with Agile and other development frameworks, familiarity with IT operations |
| Use the DevOps Toolchain for Data Science | Configure | This reference covers the basics of choosing the proper visualization in Visio to communicate your project design | Visio | Intermediate | General technology background, familiarity with data solutions |
| | | This reference describes Azure Resource Manager and its terminology, and serves as the primary root source for samples, getting-started material, and other references | Azure Resource Manager, Azure PowerShell, Azure CLI | Intermediate | General technology background, familiarity with data solutions |
| | | This reference explains the Azure Data Science Virtual Machines for Linux and Windows | Data Science Virtual Machine | Experienced | Familiarity with Data Science workloads, Linux |
| | | This walkthrough explains configuring Azure cloud service roles with Visual Studio. Pay close attention to the connection strings, specifically those for storage accounts | Visual Studio | Intermediate | Software Development |
| | | This series teaches you how to use Microsoft Project to schedule time, resources, and goals for an Advanced Analytics project | Microsoft Project | Intermediate | Understand Project Management Fundamentals |
| | | This Microsoft Project template provides time, resource, and goal tracking for an Advanced Analytics project | Microsoft Project | Intermediate | Understand Project Management Fundamentals |
| | | This Azure Data Catalog tutorial describes a system of registration and discovery for enterprise data assets | Azure Data Catalog | Beginner | Familiarity with Data Sources and Structures |
| | | This Microsoft Virtual Academy course explains how to set up Dev-Test with Visual Studio Codespace and Microsoft Azure | Visual Studio Codespace | Experienced | Software Development, familiarity with Dev/Test environments |
| | | This Management Pack download for Microsoft System Center contains a Guidelines Document to assist in working with Azure assets | System Center | Intermediate | Experience with System Center for IT Management |
| | | This document is intended for developer and operations teams to understand the benefits of PowerShell Desired State Configuration | PowerShell DSC | Intermediate | Experience with PowerShell coding, enterprise architectures, scripting |
| | Code | This download also contains documentation on using Visual Studio Codespace Code for creating Data Science and AI applications | Visual Studio Codespace | Intermediate | Software Development |
| | | This getting started site teaches you about DevOps and Visual Studio | Visual Studio | Beginner | Software Development |
| | | You can write code directly from the Azure portal using the App Service Editor. Learn more about continuous integration with this tool at this resource | Azure portal | Highly Experienced | Data Science background - but read this anyway |
| | | This resource explains how to get started with Azure Machine Learning | Azure Machine Learning | Intermediate | Software Development |
| | | This reference contains a list of, and a study link to, all of the development tools on the Data Science Virtual Machine in Azure | Data Science Virtual Machine | Experienced | Software Development, Data Science |
| | | Read and understand each of the references in this Azure Security Trust Center for Security, Privacy, and Compliance - VERY important | Azure Security | Intermediate | System Architecture experience, Security Development experience |
| | Build | This course teaches you about enabling DevOps Practices with Visual Studio Codespace Build | Visual Studio Codespace | Experienced | Software Development, familiarity with an SDLC |
| | | This reference explains compiling and building using Visual Studio | Visual Studio | Intermediate | Software Development, familiarity with an SDLC |
| | | This reference explains how to orchestrate processes such as software builds with Runbooks | System Center | Experienced | Experience with System Center Orchestrator |
| | Test | Use this reference to understand how to use Visual Studio Codespace for Test Case Management | Visual Studio Codespace | Experienced | Software Development, familiarity with an SDLC |
| | | Use this previous reference for Runbooks to automate tests using System Center | System Center | Experienced | Experience with System Center Orchestrator |
| | | You should build in security as part of development, not only testing. The Microsoft SDL Threat Modeling Tool can help in all phases. Learn more and download it here | Threat Modeling Tool | Experienced | Familiarity with security concepts, software development |
| | | This article explains how to use the Microsoft Attack Surface Analyzer to test your Advanced Analytics solution | Attack Surface Analyzer | Experienced | Familiarity with security concepts, software development |
| | Package | This reference explains the concepts of working with Packages in TFS and Visual Studio Codespace | Visual Studio Codespace | Experienced | Software development, familiarity with an SDLC |
| | | Use this previous reference for Runbooks to automate packaging using System Center | System Center | Experienced | Experience with System Center Orchestrator |
| | | This reference explains how to create a data pipeline for your solution, which you can save as a JSON template that serves as a "package" | Azure Data Factory | Intermediate | General computing background, data project experience |
| | | This topic describes the structure of an Azure Resource Manager template | Azure Resource Manager | Intermediate | Familiarity with the Microsoft Azure Platform |
| | | DSC is a management platform in PowerShell that enables you to manage your IT and development infrastructure with configuration as code, saved as a package. This reference is an overview of that topic | PowerShell Desired State Configuration | Intermediate | PowerShell coding, familiarity with enterprise architectures, scripting |
| | Release | This head-reference article contains concepts for build, test, and release for CI/CD environments | Visual Studio Codespace | Experienced | Software development, familiarity with CI/CD environments, familiarity with an SDLC |
| | | Use this previous reference for Runbooks to automate release management using System Center | System Center | Experienced | Experience with System Center Orchestrator |
| | | This article helps you determine the best option to deploy the files for your web app, mobile app backend, or API app to Azure App Service, and then guides you to appropriate resources with instructions specific to your preferred option | Microsoft Azure Deployment | Intermediate | Software development, experience with the Microsoft Azure platform |
| | Monitor | This reference explains Application Insights and how you can add it to your Advanced Analytics Solutions | Application Insights | Intermediate | Software Development, familiarity with the Microsoft Azure platform |
| | | This topic explains basic concepts about Operations Manager for the administrator who manages the Operations Manager infrastructure and the operator who monitors and supports the Advanced Analytics Solution | System Center | Experienced | Familiarity with enterprise monitoring, System Center Operations Manager |
| | | This blog entry explains how to use Azure Data Factory to monitor and manage the Advanced Analytics pipeline | Azure Data Factory | Intermediate | Familiarity with Azure Data Factory |
| Understand how to use Open Source Tools with DevOps on Azure | Open Source DevOps Tools and Azure | This reference page contains two videos and a whitepaper on using Chef with Azure deployments | Chef | Experienced | Familiarity with the Azure Platform, familiarity with DevOps |
| | | This site has a toolchain selection path | DevOps, Microsoft Azure Platform, Azure DevOps, Open Source Software | Experienced | Used an SDLC, familiarity with Agile and other development frameworks, familiarity with IT operations |
| | | This tutorial automates the build and test phase of application development using a continuous integration and deployment (CI/CD) pipeline | Jenkins | Experienced | Familiarity with the Azure Platform, familiarity with DevOps, familiarity with Jenkins |
| | | This reference contains an overview of working with Docker and Azure, as well as additional references for implementing Data Science applications | Docker | Intermediate | Familiarity with the Azure Platform, familiarity with Server Operating Systems |
| | | This installation guide explains how to use Visual Studio Code with Azure assets | Visual Studio Code | Intermediate | Software Development, familiarity with the Microsoft Azure Platform |
| | | This blog entry explains how to use R Studio with Microsoft R | R Studio | Intermediate | R Language experience |
| | | This blog entry shows how to use continuous integration with Azure and GitHub | Git, GitHub | Intermediate | Software Development |
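
The Package rows in the table refer to the structure of an Azure Resource Manager template and to saving infrastructure definitions as JSON "packages". As a rough illustration only, the following Python sketch builds an empty template skeleton and checks it for the required top-level elements (`$schema`, `contentVersion`, and `resources`). The helper names `make_empty_template` and `missing_keys` are hypothetical and are not part of any Azure SDK, and the schema URL is assumed to be a commonly used resource-group deployment schema version that you should verify against the current Azure documentation.

```python
import json

# Top-level elements that an ARM template is required to define.
REQUIRED_KEYS = ("$schema", "contentVersion", "resources")


def make_empty_template() -> dict:
    """Return a minimal ARM template skeleton with no resources (illustrative helper)."""
    return {
        # Assumed schema version; confirm against the current Azure documentation.
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {},   # values supplied at deployment time
        "variables": {},    # values reused inside the template
        "resources": [],    # the Azure resources to deploy
        "outputs": {},      # values returned after deployment
    }


def missing_keys(template: dict) -> list:
    """List the required top-level elements that the template lacks (illustrative helper)."""
    return [key for key in REQUIRED_KEYS if key not in template]


if __name__ == "__main__":
    template = make_empty_template()
    # Serialize the skeleton so it can be saved as a JSON "package".
    print(json.dumps(template, indent=2))
    print("Missing required elements:", missing_keys(template))
```

A check like `missing_keys` could run in a build or test stage before a template is handed to a deployment tool. It only inspects the top-level shape and does not replace validation by Azure itself.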
Contributors
This article is maintained by Microsoft. It was originally written by the following contributors.
Principal author:
- Mark Tabladillo | Senior Cloud Solution Architect
Next steps
See Team Data Science Process for data scientists. This article provides guidance for implementing data science solutions with Azure.