R

Software name: 
R
Policy 

R is produced by the R Development Core Team. It is freely available, open source, and licensed under the GNU General Public Licence.

General 

R is 'GNU S', a language and environment for statistical computing and graphics. R is similar to the award-winning S system, which was developed at Bell Laboratories by John Chambers et al. It provides a wide variety of statistical and graphical techniques (linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, ...).

Description 

R is designed as a true computer language with control-flow constructs for iteration and alternation, and it lets users add functionality by defining new functions. For computationally intensive tasks, C, C++ and Fortran code can be linked and called at run time.

S is the statistician's Matlab and R is to S what Octave is to Matlab.

Availability 

On HPC2N we have R available as a module.

Usage at HPC2N 

To use the R module, first add it to your environment. Use:

module spider R

to see which versions are available and how to load the module and the needed prerequisites.

Example: loading R version 4.0.4

ml GCC/10.2.0 OpenMPI/4.0.5
ml R/4.0.4

NOTE: Once the R modules are loaded, you can start R on the command line, or start the RStudio GUI by typing "rstudio" on the command line. Please use RStudio only for lightweight analysis or for setting up scripts. For heavy tasks, run the resulting scripts on the batch system (see below).

You can read more about loading modules on our Accessing software with Lmod page and our Using modules (Lmod) page.

Packages installed

  • Start R
  • At the R prompt, enter these four lines:
    ip <- as.data.frame(installed.packages()[,c(1,3:4)])
    rownames(ip) <- NULL
    ip <- ip[is.na(ip$Priority),1:2,drop=FALSE]
    print(ip, row.names=FALSE)
  • After you press Enter on the last line, R prints a list of the installed
    add-on packages (those outside the base and recommended sets) and their
    versions.

Running

Serial job

In order to run R from the batch system you need to pass a few extra parameters to R. This example submit file should get you started:

#!/bin/bash
### SNAC project number, enter your own
#SBATCH -A SNICXXXX-YY-ZZ 
# Asking for one core
#SBATCH -n 1
#SBATCH --time=00:10:00
# Serial job
# No matter how many processors you request, this job will run
# on _only_ one core.

# First clear any modules from the environment.
ml purge >/dev/null 2>&1
ml GCC/10.2.0 OpenMPI/4.0.5
ml R/4.0.4
# Run R in batch mode. Use input.R as input file and
# store output in Rexample.out

R --no-save --quiet < input.R > Rexample.out
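
The submit file above expects a file named input.R. For a first test run, such a file can be created directly from the shell. The following is only a sketch: the R code is a toy workload chosen for illustration, and the submit file name in the closing comment (R_serial.sh) is hypothetical.

```shell
# Write a small, hypothetical input.R into the current directory.
# The quoted 'EOF' keeps the shell from expanding anything inside the R code.
cat > input.R <<'EOF'
# Toy workload: summarise 1000 normally distributed random numbers.
set.seed(42)
x <- rnorm(1000)
print(summary(x))
EOF

# On the cluster, the submit file (saved here under the hypothetical
# name R_serial.sh) would then be queued with:
#   sbatch R_serial.sh
```

When the job has finished, the printed summary should appear in Rexample.out.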

Parallel job

Several R packages enable code parallelization, for instance Rmpi and doParallel. Rmpi is installed by default with the R versions available on our systems. doParallel you will need to install yourself, using install.packages().

Rmpi

In order to use it, you need to load the library and use a submission script.

Note:

  • You must NOT spawn slaves with mpi.spawn.Rslaves()!
  • You must use "mpirun R" in your script.

Here is an example submit script (you need to load the Rmpi library in your R script):

#!/bin/bash
#SBATCH -A SNICXXXX-YY-ZZZ
# Asking for 8 cores - you can pick more or less, depending on what you need
#SBATCH -n 8
# Asking for 30 min run time - change to fit what you need
#SBATCH --time=00:30:00

# First clear any modules from the environment.
ml purge >/dev/null 2>&1
# Load the R module
ml GCC/10.2.0 OpenMPI/4.0.5
ml R/4.0.4

mpirun R -q -f <program>.R
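
The submit script above needs an R script in place of <program>.R. As a minimal sketch of what such a script could look like, the following shell snippet writes a small test program. The file name hello_rmpi.R is hypothetical, and the use of communicator 0 is an assumption about how Rmpi exposes the communicator created by mpirun. Check this against the Rmpi documentation for the version you load.

```shell
# Write a hypothetical Rmpi test script into the current directory.
cat > hello_rmpi.R <<'EOF'
library(Rmpi)

# Under "mpirun R", every MPI rank runs this whole script.
# Assumption: communicator 0 is the world communicator set up by mpirun.
rank <- mpi.comm.rank(comm = 0)
size <- mpi.comm.size(comm = 0)
cat(sprintf("Hello from rank %d of %d\n", rank, size))

# Shut down MPI and exit R. No slaves are spawned anywhere.
mpi.quit()
EOF
```

With this file in place, the last line of the submit script would read "mpirun R -q -f hello_rmpi.R", and each of the requested tasks should print one line.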

doParallel

The R script should load the library and initialize the cluster with a given size. Once the parallel code has finished, the cluster should be stopped. Here is an example R script (we call it doParallel.R) that uses the doParallel package:

library(doParallel)

cl <- makeCluster(4)
registerDoParallel(cl)

#code executed in parallel

stopCluster(cl)

The following batch script can be used to run this R script. Note that the cluster size above matches the number of cores requested in the batch script (-c 4):

#!/bin/bash
#SBATCH -A SNICXXXX-YY-ZZZ
#Asking for 10 min.
#SBATCH -t 00:10:00
#SBATCH -N 1
#SBATCH -c 4

# First clear any modules from the environment.
ml purge >/dev/null 2>&1
ml GCC/10.2.0 OpenMPI/4.0.5 
ml R/4.0.4

R -q --slave -f doParallel.R
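
Keeping the cluster size in the R script and the -c value in the batch script in sync by hand is easy to get wrong. One possible approach, sketched here, is to read the core count from the SLURM_CPUS_PER_TASK environment variable, which Slurm sets when -c is given. The variable name NCORES is an illustrative choice, not an HPC2N convention.

```shell
# SLURM_CPUS_PER_TASK is set by Slurm when -c is used in the batch script.
# Fall back to 1 when it is unset, for example outside a batch job.
NCORES="${SLURM_CPUS_PER_TASK:-1}"
export NCORES
echo "Cluster size: ${NCORES}"
```

Placed before the R command in the batch script, this makes the value visible to R. In doParallel.R the cluster could then be created with makeCluster(as.integer(Sys.getenv("NCORES", "1"))) instead of a hard-coded 4.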

Adding R modules

To add R modules not installed on our system, see our documentation on installing R modules in your own account.

Extending R

Please see the R documentation for information on how to extend R by creating your own packages. Writing R Extensions is a comprehensive guide and is highly recommended.

Additional info 
Updated: 2024-04-17, 14:47