Identify job roles


Large data projects can be complex. The projects often involve hundreds of decisions. Multiple people are typically involved, and each person helps take the project from design to production.

Roles such as business stakeholders, business analysts, and business intelligence developers are well known and still valuable. As data processing techniques change with technology, new roles are starting to appear. These roles provide specialized skills to help streamline the data engineering process.

In particular, three roles are starting to become common in modern data projects:

  • Data engineer
  • Data scientist
  • Artificial intelligence (AI) engineer

Data engineer

Data engineers provision and set up data platform technologies that are on-premises and in the cloud. They manage and secure the flow of structured and unstructured data from multiple sources. The data platforms they use can include relational databases, nonrelational databases, data streams, and file stores. Data engineers also ensure that data services securely and seamlessly integrate with other data platform technologies or Azure AI services such as Azure Cognitive Search and the Bot Framework.

The Azure data engineer focuses on data-related tasks in Azure. Primary responsibilities include using services and tools to ingest, egress, and transform data from multiple sources. Azure data engineers collaborate with business stakeholders to identify and meet data requirements. They design and implement solutions. They also manage, monitor, and ensure the security and privacy of data to satisfy business needs.

The role of a data engineer is different from the role of a database administrator. A data engineer's scope of work goes well beyond looking after a database and the server where it's hosted. Data engineers must also acquire, ingest, transform, validate, and clean up data to meet business requirements. This process is called data wrangling.
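As a rough illustration of the wrangling steps named above, the following minimal Python sketch ingests a small CSV extract, transforms and validates each row, and drops duplicates. The sample data, the column names, and the `wrangle` function are hypothetical and exist only for this example. Real pipelines typically rely on dedicated data platform tools rather than hand-written scripts.

```python
import csv
import io

# Hypothetical raw extract; the columns and values are made up for illustration.
RAW_CSV = """order_id,customer,amount
1001, Alice ,19.99
1002,bob,
1003,Carol,abc
1001, Alice ,19.99
1004,dave,42.50
"""


def wrangle(raw_text):
    """Ingest CSV text, then transform, validate, and clean up each row."""
    cleaned = []
    seen_ids = set()
    for row in csv.DictReader(io.StringIO(raw_text)):
        # Transform: trim whitespace and normalize the customer name.
        order_id = row["order_id"].strip()
        customer = row["customer"].strip().title()
        # Validate: skip rows whose amount is missing or not numeric.
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue
        # Clean up: drop orders that were already seen.
        if order_id in seen_ids:
            continue
        seen_ids.add(order_id)
        cleaned.append(
            {"order_id": int(order_id), "customer": customer, "amount": amount}
        )
    return cleaned


clean_rows = wrangle(RAW_CSV)
for record in clean_rows:
    print(record)
```

In this sketch, the rows with a missing or non-numeric amount are rejected and the repeated order is dropped, so only two clean records remain.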

A data engineer adds tremendous value to both business intelligence and data science projects. Data wrangling can be time-consuming. When the data engineer wrangles data, projects move more quickly because data scientists can focus on their own areas of work.

Both database administrators and business intelligence professionals can easily transition to a data engineer role. They just need to learn the tools and technology that are used to process large amounts of data.

Data scientist

Data scientists perform advanced analytics to extract value from data. Their work can vary from descriptive analytics to predictive analytics. Descriptive analytics evaluate data through a process known as exploratory data analysis (EDA). Predictive analytics are used in machine learning to apply modeling techniques that can detect anomalies or patterns, which are an important part of forecast models.
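To make the distinction concrete, here is a minimal Python sketch. The sales figures are made up for illustration. The first part is a descriptive summary of the kind produced during exploratory data analysis. The second part flags anomalies with a simple two-standard-deviation rule, which is an assumed stand-in for the modeling techniques a data scientist would actually apply, not a machine learning model.

```python
import statistics

# Hypothetical daily sales figures, made up for illustration.
daily_sales = [102, 98, 105, 97, 101, 99, 250, 103, 100, 96]

# Descriptive step: summarize what the data looks like.
mean = statistics.mean(daily_sales)
median = statistics.median(daily_sales)
stdev = statistics.stdev(daily_sales)  # sample standard deviation

# Simple anomaly check (an assumed rule, not a trained model):
# flag values more than two standard deviations from the mean.
anomalies = [value for value in daily_sales if abs(value - mean) > 2 * stdev]

print(f"mean={mean:.1f} median={median} stdev={stdev:.1f}")
print("anomalies:", anomalies)
```

The gap between the mean and the median already hints at an outlier. The threshold rule then isolates the single unusual value.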

Descriptive and predictive analytics are just one aspect of data scientists' work. Some data scientists might even work in the realm of deep learning, iteratively experimenting to solve a complex data problem by using customized algorithms.

Anecdotal evidence suggests that most of the time in a data science project is spent on data wrangling and feature engineering. Data scientists can speed up the experimentation process when data engineers use their skills to successfully wrangle data.

AI engineer

AI engineers work with Azure AI services such as Azure Cognitive Search and the Bot Framework. Azure AI services include Vision, Text Analytics, and Language.

Rather than creating models, AI engineers apply the prebuilt capabilities of Azure AI services APIs. AI engineers embed these capabilities within a new or existing application or bot. AI engineers rely on the expertise of data engineers to store information that's generated from AI.

For example, an AI engineer might be working on a Computer Vision application that processes images. This AI engineer would ask a data engineer to provision an Azure Cosmos DB instance to store the metadata and tags that the Computer Vision application generates.

Role differences

The roles of the data engineer, AI engineer, and data scientist differ. Each role solves a different problem.

Data engineers primarily provision data stores. They make sure that massive amounts of data are securely and cost-effectively extracted, loaded, and transformed.

AI engineers add the intelligent capabilities of vision, voice, language, and knowledge to applications by using the Azure AI services offerings that are available out of the box.

When an Azure AI services application reaches its capacity, AI engineers call on data scientists. Data scientists develop machine learning models and customize components for an AI engineer's application.

Each data-technology role is distinct, and each contributes an important part to digital transformation projects.