Enterprise-grade R package management made easy in SQL Server
As enterprise data scientists access SQL Server R Services to perform their in-database analytic work such as data mining and machine learning, the R language's full power cannot be materialized without custom yet secure R package installation. SQL Server has offered R package management functionality since 2016 (SQL R Services 9.0) which enables a team of data scientists to, for example, manage R packages on a per-database basis, keep R packages in sync with the database, share R packages and/or each have his/her own custom installation and automatically resolve R package dependencies during uninstallation. This also eliminates the need for the database administrator to understand R and its package dependencies and gives R packages enterprise-level authentication and security.
Currently the rx function APIs are only for use in a SQL Server compute context. They are a convenient wrapper of the suite of package management functions like
install.packages() in base R.
Custom and secure package management is governed by scopes
- Shared scope - this scope allows allowed users to install & uninstall packages in per database shared location which in turn can be used by other users of the database on SQL server
- Private scope - this scope allows allowed users to install & uninstall packages in per database per user private location which can only be used by the single user of the database on SQL server
and database roles
- rpkgs-users - allows users to use shared packages installed by users belong to rpkgs-shared role
- rpkgs-private - allows all permissions as rpkgs-users role and also allows users to install, remove and use private packages
- rpkgs-shared - allows all permissions as rpkgs-private role and also allows users to install, remove shared packages
- db_owner - allows all permissions as rpkgs-shared role and also allows users to install & remove shared and private packages for other users for management
By default, any user can use the packages in the shared scope. The beauty is that, each rpkgs-private role user can also install his/her own version of this package in his/her private scope. And when this happens, the shared-scope package will be masked, meaning that when this user calls
library(<package name>), only his/her installed private-scope version of this package will be loaded.
We will walk through a simple example to mimic a mock yet typical scenario where data scientists collaborate in a database.
First we create a database called RevoTestDB. Then we need to
Step 2 will create the additional database roles rpkgs-private, rpkgs-shared and rpkgs-users which are specific for package management as shown on the left. Then we run the T-SQL script below to create the following database users, each taking a specific database role, with their level of permissions in descending order:
|data scientist lead
|senior data scientist
|junior data scientist
The script also grants other roles and permissions needed for these users to use the database as required in the set up SQL Server R Services step.
To install the package 'scales' in shared scope for anyone to use, the senior data scientist can run the following script, and by default he/she installed the lastest version (0.4.1) of 'scales' from MRAN:
Then any user can use the package by either passing the package name to
transformPackages in an rx fucntion or loading the package in an
rxExec call. For example,
Now say the junior data scientist wants to install an older version of scales (0.4.0) for himself/herself's own use, he/she can, for example, download the .zip file to his local working directory and run the following script to install scales 0.4.0 from the downloaded .zip file by specifying
repos = NULL in
rxInstallPackages and verify that's actually the version that's loaded. Other users will not be affected by the junior data scientist's installation and will still be using scales 0.4.1.
The data scientist lead can install/uninstall any packages on behalf of any user. And when he/she uninstalls a package, the dependencies are automatically taken care of by specifying
dependencies = TRUE in
The output will be something like this:
To change a SQL Server instance's system package installation manually, refer to Install Additional R Packages on SQL Server.