Additional Resources for Machine Learning Server and Microsoft R

Important

This content is being retired and may not be updated in the future. The support for Machine Learning Server will end on July 1, 2022. For more information, see What's happening to Machine Learning Server?

Use the links in this article for recommended resources about R and Python.

Blogs and resources

Support

Sample datasets

New to Python

Python is installed through a distribution of Anaconda, adding tools and packages that are in common use across the development community. Jupyter Notebooks, included in Anaconda, is added to your system when you install Machine Learning Server with Python support. We recommend using Jupyter Notebooks for beginners. Its interactive environment, with visualization support, is easy to use and a mainstay of Python training content.

To start Jupyter Notebooks, go to C:\Program Files\Microsoft\ML Server\PYTHON_SERVER\Scripts and run Jupyter-Notebook.exe. A browser window opens to localhost, with options for creating new Python files or opening existing ones. For help on getting started, see Jupyter Notebook Tips, Tricks, and Shortcuts.
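If you prefer to start the notebook server from a script rather than by browsing to the folder, the launch step can be sketched in Python as follows. This is a minimal sketch, not an official launcher: it assumes the default installation path quoted above, and the helper names jupyter_command and launch_jupyter are hypothetical names introduced only for illustration.

```python
import subprocess
from pathlib import PureWindowsPath

# Default install location quoted in this article (an assumption);
# adjust it if Machine Learning Server was installed somewhere else.
SCRIPTS_DIR = PureWindowsPath(
    r"C:\Program Files\Microsoft\ML Server\PYTHON_SERVER\Scripts"
)


def jupyter_command(scripts_dir=SCRIPTS_DIR):
    """Build the argument list for Jupyter-Notebook.exe (hypothetical helper)."""
    return [str(PureWindowsPath(scripts_dir) / "Jupyter-Notebook.exe")]


def launch_jupyter(scripts_dir=SCRIPTS_DIR):
    """Start the notebook server; blocks until the server is stopped."""
    return subprocess.run(jupyter_command(scripts_dir), check=True)
```

Calling launch_jupyter() on the Windows machine where Machine Learning Server is installed should have the same effect as running the executable by hand. Building the path separately from launching makes it easy to check the location before running anything.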

New to R

If you are just getting started with R, we recommend the R Core Team manuals, which are part of every R distribution, including An Introduction to R, The R Language Definition, and Writing R Extensions. You can access them on the CRAN website or in your R installation directory.

Beyond the standard R manuals, there are many books available to help you learn R, and to help you use R to do particular tasks. The rest of this article helps point you in the right direction.

For beginners

  • R for Dummies by Andrie de Vries and Joris Meys
    Excellent starting place if you are new to R, filled with examples and tips

  • R FAQ by Kurt Hornik

  • An Introduction to R by the R Development Core Team, 2008
    Based on “Notes on S-Plus” by Bill Venables and David Smith. Includes an extensive sample session in Appendix A

Intermediate R Users

  • O’Reilly’s R Cookbook by Paul Teetor
    A book filled with recipes to help you accomplish specific tasks.

  • The Essential R Reference by Mark Gardener
    A dictionary-like reference to more than 400 R commands, including cross-references and examples.

For SAS or SPSS Users

  • R for SAS and SPSS Users by Robert Muenchen
    Good starting point for users of SAS or SPSS who are new to R

  • SAS and R: Data Management, Statistical Analysis, and Graphics by Ken Kleinman and Nicholas J. Horton

  • Analysis of Correlated Data with SAS and R by Mohamed M. Shoukri and Mohammad A. Chaudhary

Information on Data Analysis and Statistics

A good source of information on introductory data analysis and statistics is Peter Dalgaard’s Introductory Statistics with R. After a chapter on basic R operations, Dalgaard discusses probability and distributions, descriptive statistics and graphics, one- and two-sample tests, regression and correlation, ANOVA and Kruskal-Wallis, tabular data, power and computation of sample size, multiple regression, linear models, logistic regression, and survival.

For more advanced techniques, the obvious starting point is Modern Applied Statistics with S by Bill Venables and Brian Ripley. This book starts with four introductory chapters on R, then gets into statistics from univariate statistics (chapter 5) to optimization (chapter 16). Along the way, the authors touch on many widely used techniques, including linear models, generalized linear models, clustering, tree-based methods, survival analysis, and many others.

Rapidly becoming the book for aspiring data scientists is The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. This book covers a variety of statistical techniques important in big data analysis and machine learning, including various tree-based methods, support vector machines, graphical models, and more.

Linear models, generalized linear models, and other regression techniques are the subject of a number of texts, including Frank Harrell’s Regression Modeling Strategies, John Fox’s Applied Regression Analysis and Generalized Linear Models and his R-specific companion volume, An R and S-PLUS Companion to Applied Regression, and Data Analysis Using Regression and Multilevel/Hierarchical Models by Andrew Gelman and Jennifer Hill.

Other useful books that take you into more advanced statistics are R in Action by Robert I. Kabacoff, A Handbook of Statistical Analyses Using R by Brian Everitt and Torsten Hothorn, Data Analysis and Graphics Using R by John Maindonald and John Braun, and The R Book by Michael Crawley.

If you are interested in an overview of the multiple uses of big data analytics, the book Big Data, Big Analytics by Michael Minelli, Michele Chambers, and Ambiga Dhiraj gives you an excellent start in understanding what big data is and how it is used in real-world business applications.

Information on Programming with R

The newest book from John M. Chambers, Software for Data Analysis: Programming with R, gives a thorough description of programming in R, including tips on debugging, writing packages, creating classes and methods, and interfacing to code in other languages. It also includes a useful chapter describing how R works.

R in a Nutshell by Joseph Adler is unlike most books on R in that it deals with R first and foremost as a programming language; it does touch on statistical topics, but that is not its main focus.

The book S Programming by Venables and Ripley is a concise, readable guide to programming in the S family of languages. Most of their advice remains valid, but the book was published when R was still at a pre-release version (0.90.1), so some details have changed over time.

The Blue Book (Becker, Chambers, and Wilks, 1988), White Book (Chambers and Hastie, 1992), and Green Book (Chambers, 1998) all have one or more chapters devoted to programming in S, with different points of emphasis. The Blue Book focuses on basic function writing. The White Book describes the S Version 3 class system and how to define classes, generic functions, and methods in that system. The Green Book describes the S Version 4 class system and how to define classes and methods in that system.

The manual Writing R Extensions by the R Core Team describes how to write complete R packages, including documentation.

Information on Getting Data Into and Out of R

The manual R Data Import/Export by the R Core Team describes how to read data into R from a variety of sources using both built-in R tools and additional packages. The book Data Manipulation with R by Phil Spector includes information on reading and writing data, and also further manipulation within R. Also, be sure to look at the RevoScaleR User’s Guide for information on data import and export capabilities provided by RevoScaleR.

Information on Creating Graphics with R

All of the references mentioned up to now contain at least some material on graphics, because graphical exploration is a primary motivation for using R in the first place. The Blue Book, in particular, describes in detail the “traditional S graphics” framework.

A popular graphics package that is rapidly growing its own complete package ecosystem is Hadley Wickham’s ggplot2 package, documented in Wickham’s ggplot2: Elegant Graphics for Data Analysis. The ggplot2 package implements in R many of the ideas from Leland Wilkinson’s The Grammar of Graphics.

The ggplot2 package is a high-level graphics package. For lower-level graphics functionality, the definitive reference is Paul Murrell’s R Graphics, which describes both the traditional S graphics framework (in particular, its implementation in R by Ross Ihaka) and the grid graphics framework developed by Murrell. It also describes the lattice system, developed by Deepayan Sarkar, that uses the grid framework to implement the Trellis graphics system developed by Rick Becker and Bill Cleveland. Serious users of the lattice system also consult Sarkar’s book, Lattice: Multivariate Data Visualization with R.

Trellis graphics are discussed thoroughly in Cleveland’s Visualizing Data. Cleveland’s earlier book, The Elements of Graphing Data, now in its second edition, remains essential reading for anyone interested in data visualization.

Interactive and Dynamic Graphics for Data Analysis by Dianne Cook and Deborah Swayne describes using R together with the GGobi visualization program for dynamic graphics.

Archived Product Documentation

R Productivity Environment (RPE)

The RPE is an older development tool. RPE documentation can be found at the following links:

Revolution R Enterprise Docs

Prior to Machine Learning Server and Microsoft R Server, the product was called Revolution R Enterprise (RRE).

Here is a list of the available archived documentation sets for RRE:

DeployR 8.x Docs

The documentation for these releases has been archived. See here.

More Books on R

Adler, J. (2010). R in a Nutshell. Sebastopol, CA: O'Reilly.

Becker, R. A., Chambers, J. M., & Wilks, A. R. (1988). The New S Language: A Programming Environment for Data Analysis and Graphics. New York: Chapman and Hall.

Chambers, J. M. (1998). Programming with Data: A Guide to the S Language. New York: Springer.

Chambers, J. M. (2008). Software for Data Analysis: Programming with R. New York: Springer.

Chambers, J. M., & Hastie, T. J. (Eds.). (1992). Statistical Models in S. New York: Chapman and Hall.

Cleveland, W. S. (1993). Visualizing Data. Summit, New Jersey: Hobart Press.

Cleveland, W. S. (1994). The Elements of Graphing Data (second ed.). Summit, New Jersey: Hobart Press.

Cook, D., & Swayne, D. F. (2008). Interactive and Dynamic Graphics for Data Analysis: With R and GGobi. New York: Springer.

Crawley, M. J. (2013). The R Book (second ed.). Chichester: John Wiley & Sons Ltd.

Dalgaard, P. (2002). Introductory Statistics with R. New York: Springer.

de Vries, A., & Meys, J. (2012). R for Dummies. Chichester: John Wiley & Sons.

Everitt, B. S., & Hothorn, T. (2006). A Handbook of Statistical Analyses Using R. Boca Raton, Florida: Chapman & Hall/CRC.

Fox, J. (2002). An R and S-PLUS Companion to Applied Regression. Thousand Oaks, CA: Sage.

Fox, J. (2008). Applied Regression Analysis and Generalized Linear Models. Thousand Oaks, CA: Sage.

Gardener, M. (2013). The Essential R Reference. Indianapolis, IN: John Wiley & Sons.

Gelman, A., & Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. New York: Cambridge University Press.

Goldberg, D. (1991). What every computer scientist should know about floating-point arithmetic. ACM Comput. Surv., 23(1), 5-48.

Harrell, F. E. (2001). Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York: Springer.

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (second ed.). New York: Springer.

Ihaka, R., & Gentleman, R. (1996). R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5(3), 299-314.

Kabacoff, R. I. (2011). R in Action. Shelter Island, NY: Manning.

Kleinman, K., & Horton, N. J. (2010). SAS and R: Data Management, Statistical Analysis, and Graphics. Boca Raton, FL: Chapman & Hall/CRC.

Maindonald, J., & Braun, J. (2007). Data Analysis and Graphics Using R: An Example-based Approach (second ed.). Cambridge: Cambridge University Press.

Matloff, N. (2011). The Art of R Programming. San Francisco: No Starch Press.

Minelli, M., Chambers, M., & Dhiraj, A. (2013). Big Data, Big Analytics. Hoboken, NJ: John Wiley & Sons.

Muenchen, R. A. (2009). R for SAS and SPSS Users. New York: Springer.

Murrell, P. (2006). R Graphics. Boca Raton, FL: Chapman & Hall/CRC.

R Development Core Team. (2008). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.

R Development Core Team. (2008). An Introduction to R. Vienna: R Foundation for Statistical Computing.

Sarkar, D. (2008). Lattice: Multivariate Data Visualization with R. New York: Springer.

Shoukri, M. M., & Chaudhary, M. A. (2007). Analysis of Correlated Data with SAS and R (third ed.). Boca Raton, FL: Chapman & Hall/CRC.

Spector, P. (2008). Data Manipulation with R. New York: Springer.

Teetor, P. (2011). R Cookbook. Sebastopol, CA: O'Reilly.

Venables, W. N., & Ripley, B. D. (1999). S Programming. New York: Springer.

Venables, W. N., & Ripley, B. D. (2002). Modern Applied Statistics with S (fourth ed.). New York: Springer.

Wickham, H. (2009). ggplot2: Elegant Graphics for Data Analysis. New York: Springer.

Wilkinson, L. (2005). The Grammar of Graphics (second ed.). New York: Springer.

Find additional resources here.