R is 'GNU S', a language and environment for statistical computing and graphics. R is similar to the award-winning S system, which was developed at Bell Laboratories by John Chambers et al. It provides a wide variety of statistical and graphical techniques (linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, ...).
R is designed as a true computer language with control-flow constructions for iteration and alternation, and it lets users add functionality by defining new functions. For computationally intensive tasks, C, C++ and Fortran code can be linked and called at run time.
S is the statistician's Matlab and R is to S what Octave is to Matlab.
On HPC2N we have R available as a module.
To use the R module, first add it to your environment. Use:
module spider R
to see which versions are available and how to load the module and the needed prerequisites.
Example: loading R version 3.3.1
ml icc/2017.1.132-GCC-6.3.0-2.27
ml ifort/2017.1.132-GCC-6.3.0-2.27
ml impi/2017.1.132
ml R/3.3.1
NOTE: As in the example above, to load an R version built with an Intel compiler based toolchain you need to load all the modules listed on both the icc and the ifort lines shown by "ml spider R/3.3.1", not just the modules on one of them (the description printed by "ml spider" is wrong on this point).
Once the R modules are loaded, you can start R from the command line, or start the RStudio GUI by typing "rstudio" on the command line. Please use RStudio only for lightweight analysis or for setting up scripts. For heavy tasks, run the resulting scripts on the batch system (see below).
To see which R packages are installed:
- Start R
- At the R prompt, enter these four lines:
ip <- as.data.frame(installed.packages()[,c(1,3:4)])
rownames(ip) <- NULL
ip <- ip[is.na(ip$Priority),1:2,drop=FALSE]
print(ip, row.names=FALSE)
- After the last enter, you will get a list of the R packages and versions that are installed.
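To check for one specific package rather than reading through the whole list, a small helper can be sketched as follows. The function name is_installed is a made-up name for this example and not part of R or of the HPC2N setup; only base R functions are used.

```r
# Hypothetical helper: TRUE if the named package is found in the
# current library paths.
is_installed <- function(pkg) {
  pkg %in% rownames(installed.packages())
}

# "stats" ships with every R installation, so this is always TRUE.
is_installed("stats")

# Rmpi and doParallel are the parallel packages discussed on this page;
# the answer depends on which R module is loaded.
for (pkg in c("Rmpi", "doParallel")) {
  if (is_installed(pkg)) {
    cat(pkg, "version", as.character(packageVersion(pkg)), "\n")
  } else {
    cat(pkg, "is not installed\n")
  }
}
```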
In order to run R from the batch system you need to pass a few extra parameters to R. This example submit file should get you started:
#!/bin/bash
### SNAC project number, enter your own
#SBATCH -A SNICXXXX-YY-ZZ
# Asking for one core
#SBATCH -n 1
#SBATCH --time=00:10:00
# Serial job
# No matter how many processors you request this job will run
# on _only_ one core.
# Load the module first.
ml icc/2017.1.132-GCC-6.3.0-2.27
ml ifort/2017.1.132-GCC-6.3.0-2.27
ml impi/2017.1.132
ml R/3.3.1
# Run R in batch mode. Use input.R as input file and
# store output in Rexample.out
R --no-save --quiet < input.R > Rexample.out
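For reference, input.R can be any ordinary R script. The sketch below is a made-up example (the data and the model are illustrative assumptions, not something provided by HPC2N): it simulates some data, fits a linear model and prints the summary, which ends up in Rexample.out.

```r
# input.R - illustrative serial job: simulate data and fit a linear model.
set.seed(42)                          # make the run reproducible

n <- 1000
x <- rnorm(n)
y <- 2 * x + 1 + rnorm(n, sd = 0.5)   # true slope 2, intercept 1, plus noise

fit <- lm(y ~ x)

# Anything printed goes to standard output, which the submit file
# redirects to Rexample.out.
print(summary(fit))
```

Because R is started with --no-save, the workspace is discarded when the job ends, so results you want to keep should be written to a file explicitly, for instance with saveRDS(fit, "fit.rds") (the file name is just an example).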
There are several R packages that enable code parallelization, for instance Rmpi and doParallel. Rmpi is installed by default with the R versions available on our systems. doParallel you need to install yourself (with install.packages).
To use Rmpi, load the library in your R script and run the script through a submission script.
Note for both:
- You must NOT spawn slaves with mpi.spawn.Rslaves()!
- You must use "mpirun R" in your script.
Here is an example submit script (you need to load the Rmpi library in your R script):
#!/bin/bash
#SBATCH -A SNICXXXX-YY-ZZZ
# Asking for 8 cores - you can pick more or less, depending on what you need
#SBATCH -n 8
# Asking for 30 min run time - change to fit what you need
#SBATCH --time=00:30:00
# Load the R module
ml icc/2017.1.132-GCC-6.3.0-2.27
ml ifort/2017.1.132-GCC-6.3.0-2.27
ml impi/2017.1.132
ml R/3.3.1
mpirun R -q -f <program>.R
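An R script to go with this submit script might look like the sketch below. Since mpirun starts one R process per requested core, every process runs the same script and looks up its own rank; no slaves are spawned. The helper rank_indices and the toy workload are illustrative assumptions. The Rmpi calls used are mpi.comm.rank, mpi.comm.size, mpi.barrier and mpi.finalize, and communicator 0 is assumed to be the one covering all processes started by mpirun.

```r
library(Rmpi)

# Hypothetical helper: the task numbers (1..n) handled by a given rank,
# dealt out round-robin over `size` processes.
rank_indices <- function(n, rank, size) {
  idx <- seq_len(n)
  idx[(idx - 1) %% size == rank]
}

rank <- mpi.comm.rank(comm = 0)   # this process: 0 .. size - 1
size <- mpi.comm.size(comm = 0)   # number of processes started by mpirun

# Toy workload: each rank sums the squares of its own task numbers.
mine <- rank_indices(100, rank, size)
partial <- sum(mine^2)
cat(sprintf("rank %d of %d handled %d tasks, partial sum %.0f\n",
            rank, size, length(mine), partial))

invisible(mpi.barrier(comm = 0))  # wait until every rank is done
invisible(mpi.finalize())         # shut down MPI before R exits
```

Splitting the work by rank like this keeps the processes independent of each other; if the partial results have to be combined, that needs explicit communication between the ranks, which is beyond this sketch.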
When using doParallel, the R script should load the library and initialize a cluster of a given size (the number of workers). After the parallel code has run, the cluster should be stopped. Here is an example R script (we call it doParallel.R) that uses the doParallel package:
library(doParallel)
cl <- makeCluster(4)
registerDoParallel(cl)
#code executed in parallel
stopCluster(cl)
The following batch script can be used to run this R script. Note that the cluster size above matches the number of cores requested in the batch script (-c 4):
#!/bin/bash
#SBATCH -A SNICXXXX-YY-ZZZ
#Asking for 10 min.
#SBATCH -t 00:10:00
#SBATCH -N 1
#SBATCH -c 4
ml GCC/6.4.0-2.28 OpenMPI/2.1.2
ml R/3.4.4-X11-20180131
R -q --slave -f doParallel.R
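As an illustration of what the placeholder line "#code executed in parallel" might contain, the sketch below fills in doParallel.R with a foreach loop. The helper slurm_cores and the toy computation are assumptions made for this example. The helper reads the environment variable SLURM_CPUS_PER_TASK, which Slurm sets when -c is given, so that the cluster size follows the batch script instead of being hard-coded.

```r
# Hypothetical helper: number of workers to start. Uses the value given
# to -c in the batch script and falls back to `default` outside a job.
slurm_cores <- function(default = 1L) {
  n <- suppressWarnings(
    as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = NA))
  )
  if (is.na(n) || n < 1L) default else n
}

library(doParallel)   # also attaches foreach and parallel

cl <- makeCluster(slurm_cores())
registerDoParallel(cl)

# Each iteration runs on one of the workers; .combine = c collects the
# individual results into a single vector, in iteration order.
squares <- foreach(i = 1:8, .combine = c) %dopar% {
  i^2
}
print(squares)

stopCluster(cl)
```

Reading the core count from the environment avoids the situation where the script starts more workers than the job was allocated, which can happen when only one of the two files is edited.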
Adding R modules
To add R modules that are not installed on our system, see our documentation on installing R modules in your own account.
Please see the R documentation for information on how to extend R by creating your own packages. Writing R Extensions is a comprehensive guide and is highly recommended.