• Posted on: 13 October 2016
  • By: bbrydsoe


[ Details: Compute nodes | Compute nodes (Skylake-SP) | Compute nodes (AMD Zen3) | Largemem nodes | GPU nodes (K80) | GPU nodes (V100) | GPU nodes (A100) ]

Kebnekaise is the latest supercomputer at HPC2N. It is named after the massif of the same name, which has some of Sweden's highest mountain peaks (Sydtoppen and Nordtoppen). Just as the massif, the supercomputer Kebnekaise is a system with many faces. 

Kebnekaise was delivered by Lenovo and installed during the summer of 2016, except for the 36 nodes with the (then) new generation of Intel Xeon Phi, also known as Intel Knights Landing (KNL), which were installed during spring 2017. These nodes have since been decommissioned. Kebnekaise was opened up for general availability on November 7, 2016.

In 2018, Kebnekaise was extended with 52 Intel Xeon Gold 6132 (Skylake) nodes, as well as 10 NVIDIA V100 (Volta) GPU nodes.

In 2023, Kebnekaise was extended further, with 2 dual NVIDIA A100 GPU nodes and one many-core AMD Zen3 CPU node.

Kebnekaise Celebration and HPC2N Open House was held 30 November 2017.

Node Type        | #nodes | CPU                                       | Cores | Memory                     | Infiniband | Notes
-----------------|--------|-------------------------------------------|-------|----------------------------|------------|------
Compute          | 432    | Intel Xeon E5-2690v4                      | 2x14  | 128 GB/node                | FDR        |
Compute-skylake  | 52     | Intel Xeon Gold 6132                      | 2x14  | 192 GB/node                | EDR        | Some of the Skylake nodes are reserved for WLCG use.
Compute-AMD Zen3 | 1      | AMD Zen3 (AMD EPYC 7763)                  | 2x64  | 1 TB/node                  | EDR        |
Large Memory     | 20     | Intel Xeon E7-8860v4                      | 4x18  | 3072 GB/node               | EDR        | Allocations for the Large Memory nodes are handled separately.
2xK80            | 32     | Intel Xeon E5-2690v4, 2x NVIDIA K80       | 2x14  | 128 GB/node                | FDR        | Each K80 card contains 2 GK210 GPU engines.
4xK80            | 4      | Intel Xeon E5-2690v4, 4x NVIDIA K80       | 2x14  | 128 GB/node                | FDR        |
2xV100           | 10     | Intel Xeon Gold 6132, 2x NVIDIA V100      | 2x14  | 192 GB/node                | EDR        |
2xA100           | 2      | AMD Zen3 (AMD EPYC 7413), 2x NVIDIA A100  | 2x24  | 512 GB/node                | EDR        | These nodes run Ubuntu 22.04 LTS (Jammy).
KNL              | 36     | Intel Xeon Phi 7250 (Knights Landing)     | 68    | 192 GB/node + 16 GB MCDRAM | FDR        | Decommissioned.

There is local scratch space on each node (about 170 GB, SSD), which is shared between the jobs currently running there. Kebnekaise is also connected to our parallel file system Ransarn (where your project storage is located), which provides quick access to your files regardless of which node your jobs run on. For more information about the different filesystems available on our systems, read the Filesystems and Storage page.

All nodes are running Ubuntu Focal (20.04 LTS). We use EasyBuild to build software and we also use a module system called Lmod. We are still improving the portfolio of installed software. The software page currently lists only a few of the installed software packages. Please log in to Kebnekaise (regular: kebnekaise or ThinLinc: kebnekaise-tl) for a list of all available software packages.

NOTE: There is a special login node for the A100 GPUs, with an AMD Zen3 CPU (AMD EPYC 7313) and 1 A100 card: kebnekaise-amd (for ThinLinc: kebnekaise-amd-tl). It runs Ubuntu 22.04 LTS (Jammy), like the A100 nodes, and is recommended when you are using the A100 GPUs, since it lets you see which software is available on them.

With all the different node types on Kebnekaise, job scheduling is somewhat more complicated than on our previous systems. Different node types are "charged" differently; see the allocation policy on Kebnekaise page for details. Kebnekaise uses Slurm for job management and scheduling.
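As an illustration, a minimal Slurm batch script requesting one of the K80 GPU nodes might look like the sketch below. The project ID is a placeholder and the GRES name (gpu:k80) is an assumption; check the allocation policy page and our Slurm documentation for the exact values to use.

```
#!/bin/bash
# Hypothetical job script for Kebnekaise (a sketch, not a verified template).
#SBATCH -A hpc2n20XX-XXX        # your project/account ID (placeholder)
#SBATCH -n 28                   # one full node (2x14 cores)
#SBATCH --time=01:00:00         # requested walltime
#SBATCH --gres=gpu:k80:2        # both K80 cards on the node (assumed GRES name)

srun ./my_program               # replace with your own executable
```

Submit with `sbatch jobscript.sh`; since GPU nodes are "charged" differently, the same walltime costs more of your allocation than a plain compute node.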

Kebnekaise in numbers

  • 602 nodes
  • 15 racks
  • 19288 cores (of which 2448 cores are KNL-cores)
    • 18840 available for users (the rest are for managing the cluster)
  • 501760 CUDA cores (80 * 4992 cores/K80 + 20 * 5120 cores/V100)
  • 12800 Tensor cores (20 * 640 cores/V100)
  • More than 136 TB memory (20*3TB + (432 + 36) * 128GB + (52 + 10) * 192 GB + 36 * 192GB)
  • 71 switches (Infiniband, Access and Management networks)
  • 984 TFlops/s Peak performance
  • 791 TFlops/s HPL
  • HPL: 80% of Peak performance
HPL performance of Kebnekaise
Compute Nodes 374 TFlops/s
Compute-skylake Nodes 87 TFlops/s
Large Memory Nodes 34 TFlops/s
2xK80 Nodes 129 TFlops/s
4xK80 Nodes 30 TFlops/s
2xV100 Nodes 75 TFlops/s
KNL Nodes (decommissioned) 62 TFlops/s
Total (all parts) 791 TFlops/s
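The headline figures in the list above can be reproduced directly from the node table earlier on this page; a quick sanity check:

```python
# Sanity-check the "Kebnekaise in numbers" figures from the node table.

# CUDA cores: 80 K80 cards (32 nodes x 2 + 4 nodes x 4) and 20 V100 cards.
k80_cards = 32 * 2 + 4 * 4
v100_cards = 10 * 2
cuda_cores = k80_cards * 4992 + v100_cards * 5120
print(cuda_cores)        # 501760

# Tensor cores: only the V100s have them.
tensor_cores = v100_cards * 640
print(tensor_cores)      # 12800

# Memory: large-memory + Broadwell (compute + K80) + Skylake (compute + V100) + KNL.
mem_gb = 20 * 3072 + (432 + 36) * 128 + (52 + 10) * 192 + 36 * 192
print(mem_gb / 1024)     # 136.875 -> "more than 136 TB"
```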

Do note that running all 28 cores with AVX-heavy code (on the normal CPUs) limits the clock to at most 2.9 GHz per core, and in practice probably no more than 2.5 GHz.

The AVX clock frequency is not the same as the rest of the CPU's clock frequency; it has both a lower starting point and a lower maximum boost.

Detailed node info

Compute nodes

Architecture is Intel Xeon E5-2690v4 (Broadwell). See further down for info for the Skylake-SP nodes.

Each core has:

  • 64 kB L1 cache
    • 32 kB L1 data cache
    • 32 kB L1 instruction cache
  • 256 kB L2 cache
  • 35 MB L3 cache that is shared between 14 cores (1 NUMA island)

The memory is shared in the whole node, but physically 64 GB is placed on each NUMA island. The memory controller on each NUMA node has 4 channels.

Intel Xeon E5-2690v4 (Broadwell)
Instruction set AVX2 & FMA3
SP FLOPs/cycle 32
DP FLOPs/cycle 16
Base Frequency 2.6 GHz
Turbo Mode Frequency (single core) 3.8 GHz
Turbo Mode Frequency (all cores) 2.9 GHz
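From these figures, the theoretical double-precision peak of one Broadwell compute node follows directly. This is a sketch at the base frequency; as noted above, AVX-heavy code will typically run at a lower clock, so real applications see less.

```python
# Theoretical double-precision peak of one Broadwell compute node,
# at the base frequency (AVX-heavy code typically clocks lower).
cores = 2 * 14            # two 14-core Xeon E5-2690v4
dp_flops_per_cycle = 16   # AVX2 + FMA3
base_ghz = 2.6
peak_gflops = cores * dp_flops_per_cycle * base_ghz
print(round(peak_gflops, 1))   # 1164.8 GFLOP/s per node
```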


Compute nodes, Skylake-SP

Architecture is Intel Xeon Gold 6132 (Skylake-SP).

Each core has:

  • 64 kB L1 cache
    • 32 kB L1 data cache
    • 32 kB L1 instruction cache
  • 1 MB L2 cache (private per core)
  • 1.375 MB L3 cache (total of 19.25 MB shared between cores)

The memory is shared in the whole node, but physically 96 GB is placed on each NUMA island. The memory controller on each NUMA node has 6 channels.

The Intel Xeon Gold 6132 has two AVX-512 FMA units per core.


Intel Xeon Gold 6132 (Skylake-SP)
Instruction set SSE4.2, AVX, AVX2, AVX-512
SP FLOPs/cycle 64 (32 per AVX-512 FMA unit)
DP FLOPs/cycle 32 (16 per AVX-512 FMA unit)
Base Frequency 2.6 GHz
Turbo Mode Frequency (single core)  
Turbo Mode Frequency (all cores)  

Thus each core can perform 32 double-precision or 64 single-precision floating-point operations per clock cycle within the 512-bit vectors (which can also hold eight 64-bit or sixteen 32-bit integers), using up to two 512-bit fused multiply-add (FMA) units.
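The corresponding per-node double-precision peak for the Skylake nodes, as a sketch at the nominal 2.6 GHz base frequency (the AVX-512 clock is lower in practice, so this is an upper bound):

```python
# Theoretical double-precision peak of one Skylake-SP compute node
# at the nominal base frequency (AVX-512 code clocks lower in practice).
cores = 2 * 14             # two 14-core Xeon Gold 6132
dp_flops_per_cycle = 32    # two AVX-512 FMA units per core
base_ghz = 2.6
print(round(cores * dp_flops_per_cycle * base_ghz, 1))   # 2329.6 GFLOP/s
```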

Compute nodes, AMD Zen3

Architecture is AMD Zen3 (AMD EPYC 7763, 64 cores per socket).

 - The CPU-only node has 2 CPU sockets with 64 cores each and 1 TB of memory (8020 MB/core usable).

Large memory nodes

There are 18 cores on each of the 4 NUMA islands. The cores on each NUMA island share 768 GB memory, but have access to the full 3072 GB on the node. The memory controller on each NUMA island has 4 channels.

Each core has:
  • 64 kB L1 cache
    • 32 kB L1 data cache
    • 32 kB L1 instruction cache
  • 256 kB L2 cache
  • 45 MB L3 cache shared between the cores on each NUMA island


GPU nodes, K80

Each CPU core is identical to the cores in the compute nodes and in addition to that:

  • 32 GPU nodes have 2 K80 GPUs
    • One K80 is located on each NUMA island
  • 4 GPU nodes have 4 K80 GPUs
    • Two K80s are located on each NUMA island.


[Figure: One GK210 compute engine with 15 SMXs (13 enabled). SMX is what NVIDIA calls their Next Generation Streaming Multiprocessor. (Picture copyright of NVIDIA)]

[Figure: One SMX. (Picture copyright of NVIDIA)]

Each K80 GPU has two GK210 chips (compute engines), each made up of 15 SMX (Next Generation Streaming Multiprocessor) units and six 64-bit memory controllers. Because of the constraints of fitting two GK210s on a single K80 board, only 13 SMX units are enabled on each GK210. Since there are 192 CUDA cores on each SMX, this adds up to 13 x 192 x 2 = 4992 cores on each K80.

The GK210 SMX units feature 192 single-precision CUDA cores, and each core has fully pipelined floating-point and integer arithmetic logic units. It retains full IEEE 754-2008 compliant single- and double-precision arithmetic, including the fused multiply-add (FMA) operation.

Tesla K80 is rated for a maximum double precision (FP64) throughput of 2.9 TFLOPS, or a single precision (FP32) throughput of 8.7 TFLOPS.
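These throughput figures can be reproduced from the per-SMX numbers above; a quick check (the 1:3 FP64:FP32 ratio is a property of the GK210 architecture):

```python
# Reproduce the K80 core count and throughput from per-SMX figures.
cuda_cores = 2 * 13 * 192          # 2 GK210s x 13 SMXs x 192 cores
print(cuda_cores)                  # 4992

boost_ghz = 0.875                  # 875 MHz boost clock
fp32_tflops = cuda_cores * 2 * boost_ghz / 1000   # 2 FLOPs/cycle/core (FMA)
print(round(fp32_tflops, 2))       # 8.74

fp64_tflops = fp32_tflops / 3      # GK210 FP64 rate is 1/3 of FP32
print(round(fp64_tflops, 2))       # 2.91
```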

Tesla K80 (2 x GK210)
Stream Processors 2 x 2496
Core Clock 562MHz
Boost Clock(s) 875MHz
Memory Clock 5GHz GDDR5
Single Precision 8.74 TFLOPS
Double Precision 2.91 TFLOPS
Memory Bus Width 384-bit
Memory Bandwidth 240 GB/s
Register File Size 512KB
Shared Memory / L1 Cache 128KB
Threads/Warp 32
Max Threads / Thread Block 1024
Max Warps / Multiprocessor 64
Max Threads / Multiprocessor 2048
Max Thread Blocks / Multiprocessor 16
32-bit Registers / Multiprocessor 131072
Max Registers / Thread Block 65536
Max Registers / Thread 255
Max Shared Memory / Multiprocessor 112K
Max Shared Memory / Thread Block 48K
Hyper-Q Yes
Dynamic Parallelism Yes

GPU nodes, V100

We have 10 nodes with NVIDIA V100 (Volta) GPUs. Each CPU core is identical to the cores in the Skylake compute nodes; in addition, each node has

  • 2 V100 GPUs, each with
    • 5120 CUDA cores
    • 640 Tensor cores

One V100 GPU is located on each NUMA island.

GPU nodes, A100

 - The GPU-enabled nodes (AMD EPYC 7413, 24 cores per socket) have 2 CPU sockets with 24 cores each (48 in total) and 512 GB of memory (10600 MB/core usable).

KNL nodes

These nodes have been decommissioned and are no longer available.

Updated: 2023-09-11, 11:36