revoscalepy (Python package in SQL Server Machine Learning Services)
Applies to: SQL Server 2017 (14.x) and later
revoscalepy is a Python package from Microsoft that supports distributed computing, remote compute contexts, and high-performance data science algorithms. The package is included in SQL Server Machine Learning Services.
The package offers the following functionality:
- Local and remote compute contexts on systems having the same version of revoscalepy
- Data transformation and visualization functions
- Data science functions, scalable through distributed or parallel processing
- Improved performance, including use of the Intel math libraries
Data sources and compute contexts that you create in revoscalepy can also be used in machine learning algorithms. For an introduction to these algorithms, see microsoftml Python module in SQL Server.
Full reference documentation
The revoscalepy package is distributed in multiple Microsoft products, but usage is the same whether you get the package in SQL Server or another product. Because the functions are the same, documentation for individual revoscalepy functions is published to just one location under the Python reference. Should any product-specific behaviors exist, discrepancies will be noted in the function help page.
Versions and platforms
The revoscalepy module is based on Python 3.5 and available only when you install one of the following Microsoft products or downloads:
Full product release versions are Windows-only in SQL Server 2017. Both Windows and Linux are supported for revoscalepy in SQL Server 2019 and later.
Functions by category
This section lists the functions by category to give you an idea of how each one is used. You can also use the table of contents to find functions in alphabetical order.
1-Data source and compute
revoscalepy includes functions for creating data sources and for setting the compute context, which is the location where computations are performed. Functions relevant to SQL Server scenarios are listed in the table below.
SQL Server and Python use different data types in some cases. For a list of mappings between SQL and Python data types, see Python-to-SQL data types.
| Function | Description |
|---|---|
| RxInSqlServer | Create a SQL Server compute context object to push computations to a remote instance. Several revoscalepy functions take compute context as an argument. For a context-switch example, see Create a model using revoscalepy. |
| RxSqlServerData | Create a data object based on a SQL Server query or table. |
| RxOdbcData | Create a data source based on an ODBC connection. |
| RxXdfData | Create a data source based on a local XDF file. XDF files are often used to offload in-memory data to disk. An XDF file can be useful when working with more data than can be transferred from the database in one batch, or more data than can fit in memory. For example, if you regularly move large amounts of data from a database to a local workstation, you can use the XDF file as a cache: save the data locally once and work with it in your Python session, rather than querying the database repeatedly for each operation. |
If you are new to the idea of data sources or compute contexts, we recommend that you start with the article Distributed computing.
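As a rough illustration of how a data source and a compute context fit together, the sketch below builds a connection string and passes it to `RxSqlServerData` and `RxInSqlServer`. The helper names (`build_connection_string`, `make_sql_objects`), the server and database names, and the query are hypothetical. The keyword arguments follow the revoscalepy reference as best understood here and should be checked against your installed version. revoscalepy is imported inside the function, so the connection-string part runs even where the package is not installed.

```python
def build_connection_string(server, database, driver="SQL Server"):
    """Hypothetical helper: ODBC connection string using Windows authentication."""
    return "Driver={};Server={};Database={};Trusted_Connection=True;".format(
        driver, server, database
    )


def make_sql_objects(connection_string, query):
    """Create a SQL Server data source and compute context (requires revoscalepy)."""
    # Imported here so the rest of the sketch works without revoscalepy.
    from revoscalepy import RxInSqlServer, RxSqlServerData

    # Data source: describes WHAT data to read (a query result in this case).
    data_source = RxSqlServerData(
        sql_query=query, connection_string=connection_string
    )
    # Compute context: describes WHERE computations run (the SQL Server instance).
    compute_context = RxInSqlServer(
        connection_string=connection_string, num_tasks=1, auto_cleanup=False
    )
    return data_source, compute_context


# Placeholder server and database names, for illustration only.
conn_str = build_connection_string("localhost", "MyDatabase")
```

On a machine with revoscalepy, calling `make_sql_objects(conn_str, "SELECT * FROM dbo.MyTable")` would return the pair of objects to pass to other revoscalepy functions. The table name is likewise a placeholder.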
2-Data manipulation (ETL)
| Function | Description |
|---|---|
| rx_import | Import data into a .xdf file or data frame. |
| rx_data_step | Transform data from an input data set to an output data set. |
3-Training and summarization
| Function | Description |
|---|---|
| rx_btrees | Fit stochastic gradient boosted decision trees. |
| rx_dforest | Fit classification and regression decision forests. |
| rx_dtree | Fit classification and regression trees. |
| rx_lin_mod | Create a linear regression model. |
| rx_logit | Create a logistic regression model. |
| rx_summary | Produce univariate summaries of objects in revoscalepy. |
You should also review the functions in microsoftml for additional approaches.
4-Scoring functions
| Function | Description |
|---|---|
| rx_predict | Generate predictions from a trained model. Can be used for real-time scoring. |
| rx_predict_default | Compute predicted values and residuals using rx_lin_mod and rx_logit objects. |
| rx_predict_rx_dforest | Calculate predicted or fitted values for a data set from an rx_dforest or rx_btrees object. |
| rx_predict_rx_dtree | Calculate predicted or fitted values for a data set from an rx_dtree object. |
How to work with revoscalepy
Functions in revoscalepy are callable in Python code encapsulated in stored procedures. Most developers build revoscalepy solutions locally, and then migrate finished Python code to stored procedures as a deployment exercise.
When running locally, you typically run a Python script from the command line or from a Python development environment, and specify a SQL Server compute context using one of the revoscalepy functions. You can use the remote compute context for the entire script or only for individual function calls. For example, you might want to offload model training to the server to use the latest data and avoid data movement.
When you are ready to encapsulate the Python script inside a stored procedure by using sp_execute_external_script, we recommend rewriting the code as a single function that has clearly defined inputs and outputs.
Inputs and outputs must be pandas data frames. When this is done, you can call the stored procedure from any client that supports T-SQL, easily pass SQL queries as inputs, and save the results to SQL tables. For an example, see Learn in-database Python analytics for SQL developers.
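The shape of such a function can be sketched as follows: it takes one pandas data frame and returns another. The function name `add_tip_rate` and the column names are hypothetical and chosen only for illustration.

```python
import pandas as pd


def add_tip_rate(input_df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of input_df with a tip_rate column added."""
    output_df = input_df.copy()  # leave the caller's data frame unchanged
    output_df["tip_rate"] = output_df["tip_amount"] / output_df["fare_amount"]
    return output_df


# Local check with made-up rows before moving the code into a stored procedure.
sample_input = pd.DataFrame({"fare_amount": [4.0, 10.0], "tip_amount": [1.0, 5.0]})
sample_output = add_tip_rate(sample_input)
```

Inside sp_execute_external_script, the input query result and the returned result set are exposed by default as the variables `InputDataSet` and `OutputDataSet`, so the script body could end with `OutputDataSet = add_tip_rate(InputDataSet)`.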
Using revoscalepy with microsoftml
The Python functions for microsoftml are integrated with the compute contexts and data sources that are provided in revoscalepy. When calling functions from microsoftml, for example when defining and training a model, use the revoscalepy functions to execute the Python code either locally or in a SQL Server remote compute context.
The following example shows the syntax for importing modules in your Python code. You can then reference the individual functions you need.
from microsoftml.modules.logistic_regression.rx_logistic_regression import rx_logistic_regression
from revoscalepy.functions.RxSummary import rx_summary
from revoscalepy.etl.RxImport import rx_import_datasource