As a data scientist, your role primarily involves exploring and analyzing data. The results of an analysis might form the basis of a report or a machine learning model, but it all begins with data.
R is one of the most popular programming languages among data scientists. It was designed for statistical computing and graphics, and you can use it for tasks that range from data wrangling to modeling and visualization.
After decades of open-source development, R provides extensive functionality backed by a large ecosystem of packages for statistical modeling, machine learning, visualization, and data wrangling. For example:
- Tidyverse is a collection of R packages that make data science faster, easier, and more fun.
- Tidymodels is a collection of R packages for modeling and statistical analysis.
- TensorFlow for R and Torch for R supply machine learning and deep learning capabilities.
Data analysis projects are usually designed to establish insights around a particular scenario or to test a hypothesis. For example, suppose you're a university professor who wants to collect data about your students' academic behavior and results, including the number of lectures attended, the hours spent studying, and the final grades achieved on the end-of-term exam.
You can analyze the data to determine whether there's a relationship between how much students study and the final grades they earn. You might then use the data to test a hypothesis that only students who study for a minimum number of hours can expect to achieve a passing grade.
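To make the scenario concrete, here's a minimal sketch in base R. Everything in it is an assumption made for illustration: the `students` data frame and its values are invented, and the pass mark of 60 and the 7-hour study threshold are arbitrary choices, not figures from a real class.

```r
# Hypothetical data: every value below is invented for illustration only.
students <- data.frame(
  name              = c("A", "B", "C", "D", "E", "F"),
  lectures_attended = c(4, 6, 7, 9, 10, 12),
  study_hours       = c(2, 4, 6, 8, 10, 12),
  final_grade       = c(35, 48, 55, 62, 78, 90)
)

# Pearson correlation between study time and final grade.
# A value close to 1 suggests a strong positive linear relationship.
study_grade_cor <- cor(students$study_hours, students$final_grade)

# Assumed pass mark and assumed minimum study time (both arbitrary).
pass_mark <- 60
min_hours <- 7

# Flag who passed and who met the study-time threshold.
students$passed        <- students$final_grade >= pass_mark
students$met_threshold <- students$study_hours >= min_hours

# Share of students who passed, split by whether they met the threshold.
pass_rate_by_group <- tapply(students$passed, students$met_threshold, mean)

print(study_grade_cor)
print(pass_rate_by_group)
```

If students below the threshold had a pass rate well above zero, that would count as evidence against the hypothesis. With only six invented rows this is a demonstration of the mechanics, not a statistical test.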
In this module, you'll learn how to use R to conduct such an analysis.
To get the most out of this module, you should have:
- Knowledge of basic mathematics
- Some experience programming in R
In this module, you'll learn:
- Common data exploration and analysis tasks.
- How to use R packages such as ggplot2, dplyr, and tidyr to turn raw data into understanding, insight, and knowledge.
Produced in partnership with Eric Wanjau - Microsoft Learn Student Ambassador and Researcher/Data Scientist: Leeds Institute for Data Analytics, University of Leeds