Microsoft Fabric provides built-in R support for Apache Spark. This includes support for SparkR and sparklyr, which allows users to interact with Spark using familiar Spark or R interfaces. You can analyze data using R through Spark batch job definitions or with interactive Microsoft Fabric notebooks.
This article provides an overview of developing Apache Spark applications in Microsoft Fabric using the R language.
Use the experience switcher on the left side of your home page to switch to the Synapse Data Science experience.
Create and run notebook sessions
A Microsoft Fabric notebook is a web interface for creating files that contain live code, visualizations, and narrative text. Notebooks are a good place to validate ideas and run quick experiments to get insights from your data. They're also widely used in data preparation, data visualization, machine learning, and other big data scenarios.
To get started with R in Microsoft Fabric notebooks, change the primary language at the top of your notebook by setting the language option to SparkR (R).
In addition, you can use multiple languages in one notebook by specifying the language magic command at the beginning of a cell.
%%sparkr
# Enter your R code here
To learn more about notebooks in Microsoft Fabric, see How to use notebooks.
Install packages
Libraries provide reusable code that you might want to include in your programs or projects. To make third-party or locally built code available to your applications, you can install a library in your workspace or for your notebook session.
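For example, you can install a CRAN package inline so that it's available for the current notebook session; the package shown here (caesar) is only an illustration, and session-scoped installations don't persist to the workspace.
# Install a CRAN package for the current notebook session
install.packages("caesar")
# Load the package so it can be used in later cells of this session
library(caesar)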
Notebook utilities
Microsoft Spark Utilities (MSSparkUtils) is a built-in package to help you easily perform common tasks. You can use MSSparkUtils to work with file systems, to get environment variables, to chain notebooks together, and to work with secrets. MSSparkUtils is supported for R notebooks.
To get started, you can run the following commands (a minimal sketch; the notebookutils package name is an assumption for how MSSparkUtils is exposed to R):
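# Load the package assumed to expose MSSparkUtils in R notebooks
library(notebookutils)
# Display help for the available file system utilities
mssparkutils.fs.help()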
Use SparkR
SparkR is an R package that provides a lightweight frontend for using Apache Spark from R. SparkR provides a distributed data frame implementation that supports operations such as selection, filtering, and aggregation. SparkR also supports distributed machine learning with MLlib.
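For example, the following minimal sketch creates a SparkDataFrame from the built-in faithful dataset and filters it with SparkR verbs (the notebook's Spark session is assumed to be already running):
library(SparkR)
# Create a SparkDataFrame from a local R data.frame
df <- as.DataFrame(faithful)
# Filter and select with SparkR, then preview the first rows on the driver
head(select(filter(df, df$waiting > 70), "eruptions", "waiting"))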
You can learn more about how to use SparkR by visiting How to use SparkR.
Use sparklyr
sparklyr is an R interface to Apache Spark. It provides a mechanism to interact with Spark using familiar R interfaces. You can use sparklyr through Spark batch job definitions or with interactive Microsoft Fabric notebooks.
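As a minimal sketch, assuming the notebook's existing Spark session can be reused through the synapse connection method, a sparklyr workflow might look like this:
library(sparklyr)
library(dplyr)
# Connect sparklyr to the Spark session backing the notebook
sc <- spark_connect(method = "synapse")
# Copy a small local dataset into Spark and run a dplyr pipeline on the cluster
mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)
mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()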
Use Tidyverse
Tidyverse is a collection of R packages that data scientists commonly use in everyday data analyses. It includes packages for data import (readr), data visualization (ggplot2), data manipulation (dplyr, tidyr), functional programming (purrr), and model building (tidymodels). The packages in tidyverse are designed to work together seamlessly and follow a consistent set of design principles. Microsoft Fabric distributes the latest stable version of tidyverse with every runtime release.
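As a quick local (non-Spark) illustration, a typical tidyverse pipeline might look like this:
library(tidyverse)
# Wrangle a built-in data frame with dplyr verbs
mtcars %>%
  as_tibble(rownames = "model") %>%
  filter(mpg > 20) %>%
  group_by(cyl) %>%
  summarise(avg_hp = mean(hp), cars = n())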
R visualization
The R ecosystem offers multiple graphing libraries that come packed with many different features. By default, every Spark instance in Microsoft Fabric contains a set of curated and popular open-source libraries. You can also add or manage extra libraries or versions by using the Microsoft Fabric library management capabilities.
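For instance, a small aggregate computed with SparkR can be collected to the driver and plotted with ggplot2; this is just a sketch of the common collect-then-plot pattern:
library(SparkR)
library(ggplot2)
# Aggregate in Spark, then bring the small result back to the driver
df <- as.DataFrame(mtcars)
agg <- collect(summarize(groupBy(df, df$cyl), avg_mpg = mean(df$mpg)))
# Plot the collected data frame with ggplot2
ggplot(agg, aes(x = factor(cyl), y = avg_mpg)) +
  geom_col() +
  labs(x = "Cylinders", y = "Average miles per gallon")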
Learn more about how to create R visualizations by visiting R visualization.
Apache Spark is a core technology for large-scale data analytics. Microsoft Fabric provides support for Spark clusters, enabling you to analyze and process data at scale.