Compiler Flags

[ Intel Compilers | GNU Compiler Collection (GCC) | Portland Group (PGI) Compilers ]

This page describes some of the more important and commonly used flags for the compilers available at HPC2N. Unless noted otherwise, the flags below are valid for the Fortran and C/C++ compilers alike, and also when compiling with the MPI libraries. Remember to load the proper modules first; see the page Installed compilers for more information.

Intel Compilers

The Intel compilers are installed on Abisko and Kebnekaise. Note that we only have a limited number of licenses, so you may sometimes find that none is available. If that happens, wait a little while and then try again.

  • -fast This option maximizes speed across the entire program.  
  • -g Produce symbolic debug information in the object file. The compiler does not support generating debugging information in assemblable files: if you specify -g, the resulting object file will contain debugging information, but the assemblable file will not. The -g option changes the default optimization level from -O2 to -O0. It is often a good idea to add -traceback as well, so that the compiler adds extra information to the object file to provide source file traceback information.
  • -debug all Enables generation of enhanced debugging information. You also need to specify -g.
  • -O0 Disable optimizations. Use this when debugging, or to rule out optimization as the cause of incorrect results. Otherwise use -O2 for speed.
  • -O Same as -O2
  • -O1 Optimize to favor code size and code locality. Disables loop unrolling. -O1 may improve performance for applications with very large code size, many branches, and execution time not dominated by code within loops. In most cases, -O2 is recommended over -O1.
  • -O2 (default) Optimize for code speed. This is the generally recommended optimization level.
  • -O3 Enable -O2 optimizations and, in addition, more aggressive optimizations such as loop and memory access transformations and prefetching. -O3 optimizes for maximum speed, but it does not improve performance for every program and may even slow code down compared to -O2. Recommended for applications that have loops with heavy use of floating-point calculations and that process large data sets.
  • -Os Enable speed optimizations, but disable some optimizations that increase code size for small speed benefit.
  • -fpe{0,1,3} Allows some control over the handling of floating-point exceptions (divide by zero, overflow, invalid operation, underflow, denormalized number, positive infinity, negative infinity or a NaN) for the main program at runtime. Fortran only. The default is -fpe3, meaning that all floating-point exceptions are disabled and floating-point underflow is gradual, unless you explicitly specify a compiler option that enables flush-to-zero. The default value may slow runtime performance.
  • -qopenmp Enable the parallelizer to generate multi-threaded code based on the OpenMP directives. The code can be executed in parallel on both uniprocessor and multiprocessor systems.
  • -parallel Enable the auto-parallelizer to generate multi-threaded code for loops that can be safely executed in parallel. The auto-parallelizer is only enabled if either the -O2 or -O3 optimization option is also on (the default is -O2). You might need to set the KMP_STACKSIZE environment variable to an appropriately large size, like 16m, to use this option.
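
As a rough illustration of the kind of code that -qopenmp acts on, here is a minimal C sketch of a loop with an OpenMP reduction. The function name sum_of_squares is hypothetical and chosen only for this example; it is not taken from any HPC2N material.

```c
/* Hypothetical example: compute 1^2 + 2^2 + ... + n^2 with an OpenMP reduction. */
long long sum_of_squares(int n)
{
    long long total = 0;

    /* With OpenMP enabled, the iterations are split across threads and the
       per-thread partial sums are combined by the reduction clause.
       Without it, the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 1; i <= n; ++i) {
        total += (long long)i * i;
    }
    return total;
}
```

Assuming the function is saved in a file called sum.c (a made-up name), it could be compiled to an object file with something like "icc -O2 -qopenmp -c sum.c" and then called from a main program. The result is the same with and without the flag; only the way the loop is executed differs. The number of threads is usually controlled with the standard OMP_NUM_THREADS environment variable.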

To read about other flags, and for further information, look in the man pages. They can be accessed like this:

$ module load intel
$ man ifort
$ man icc 


GNU Compiler Collection (GCC)

  • -o file Place output in file 'file'.
  • -c Compile or assemble the source files, but do not link.
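  • -mfma Use fused multiply-add (FMA) instructions. Only valid on Abisko, where it is activated by default. Can be deactivated with -mno-fma. Note that in GCC's own documentation -mfma refers to the three-operand FMA extension, while FMA4 has a separate flag, -mfma4 (deactivated with -mno-fma4), so check which one applies to your target.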
  • -fopenmp Enable handling of the OpenMP directives.
  • -g Produce debugging information in the operating system's native format.
  • -O or -O1 Optimize. The compiler tries to reduce code size and execution time.
  • -O2 Optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff.
  • -O3 Optimize even more. The compiler also performs more aggressive function inlining and further loop optimizations such as vectorization. RECOMMENDED
  • -O0 Do not optimize. This is the default.
  • -Os Optimize for size.
  • -Ofast Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math and the Fortran-specific -fno-protect-parens and -fstack-arrays.
  • -ffast-math Sets the options -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans and -fcx-limited-range.
  • -l library Search the library named 'library' when linking.
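
To make the warning about -Ofast and -ffast-math more concrete, the following C sketch shows code whose correctness depends on strict floating-point evaluation. It compares plain summation with Kahan (compensated) summation. The function names naive_sum and kahan_sum are hypothetical and used only for illustration.

```c
#include <stddef.h>

/* Plain left-to-right summation: small terms can be lost to rounding. */
double naive_sum(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += x[i];
    }
    return sum;
}

/* Kahan (compensated) summation: tracks the rounding error of each
   addition and feeds it back into the next one. */
double kahan_sum(const double *x, size_t n)
{
    double sum = 0.0;
    double c = 0.0;              /* running compensation for lost low-order bits */
    for (size_t i = 0; i < n; ++i) {
        double y = x[i] - c;     /* apply the correction from the previous step */
        double t = sum + y;      /* low-order bits of y may be lost here */
        c = (t - sum) - y;       /* algebraically zero, numerically the rounding error */
        sum = t;
    }
    return sum;
}
```

With standard IEEE double-precision arithmetic and without -ffast-math, summing 1.0 followed by ten values of 1e-16 gives exactly 1.0 with naive_sum, while kahan_sum returns a value slightly larger than 1.0, close to the true sum. Under -ffast-math (and therefore -Ofast), -funsafe-math-optimizations permits the compiler to reassociate floating-point expressions, so it may simplify the compensation term c to zero and turn kahan_sum into the naive version. Whether this actually happens depends on the compiler version and the other flags in use.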

To read about other flags, and for further information, look in the man pages. They can be accessed by first loading the module (see Installed Compilers) and then running one of:

$ man gfortran
$ man gcc
$ man g++


Portland Group (PGI) Compilers

  • -fast Chooses generally optimal flags for the target platform. This sets the optimization level to a minimum of 2.
  • -fastsse A set of optimizations that vectorizes loops and uses Streaming SIMD Extensions (SSE/SSE2), which utilize Opteron's eight 128-bit registers. This usually produces faster code.
  • -g Generate symbolic debug information. This also sets the optimization level to zero, unless a -O switch is present on the command line. Symbolic debugging may give confusing results if an optimization other than zero is selected.
  • -Mfma (default -Mnofma) Generate (don't generate) fused multiply-add (FMA) instructions for targets that support them. FMA instructions are generally faster than separate multiply and add instructions, and can give higher-precision results since the multiply result is not rounded before the addition. For the same reason, the result may differ from that of the unfused multiply and add instructions. FMA instructions are enabled with higher optimization levels.
  • -mp Enable handling of OpenMP directives.
  • -O[level] Set the optimization level. If -O is not specified, then the default level is 1 if -g is not specified, and 0 if -g is specified. If a number is not specified with -O, then the optimization level is set to 2. The optimization levels are:
    • 0 A basic block is generated for each statement. No scheduling is done between statements. No global optimizations are done.
    • 1 Scheduling within extended basic blocks is performed. Some register allocation is performed. No global optimizations are performed.
    • 2 All level 1 optimizations are performed. In addition, traditional scalar optimizations, such as induction recognition and loop invariant motion are performed by the global optimizer. RECOMMENDED
    • 3 All level 1 and 2 optimizations are performed. In addition, this level enables more aggressive code hoisting and scalar replacement optimizations that may or may not be profitable.

To read about other flags, and for further information, look in the man pages. They can be accessed like this (remember to load the module first; see 'Installed Compilers' for how):

$ man pgf77
$ man pgf90
$ man pgf95
$ man pgcc
$ man pgCC


Updated: 2017-11-21, 20:49