Using the Intel Xeon Phi 7250 (KNL) nodes
Intel Xeon Phi x200, codename Knights Landing (KNL), is the second-generation MIC (Many Integrated Core) architecture product from Intel. At HPC2N, 36 of the Kebnekaise nodes are equipped with Intel Xeon Phi 7250 CPUs. See our description of Intel Knights Landing for more information about the nodes.
At a high level there are three key aspects to achieving code performance on KNL:
- Using fine-grained parallelism to exploit the 68 cores per node and the 4 hardware threads per core
- Taking advantage of the 512-bit vector units on KNL
- Structuring your code to maximize memory access from KNL's 16 GB of onboard MCDRAM memory
- As we gain more experience with the KNL nodes, the configuration may change and more documentation about using them will be added. Check this page regularly for changes affecting the KNL nodes.
- A lot of software is not yet optimized for the Intel Xeon Phi x200 and has therefore not been built for the KNL nodes. Please contact firstname.lastname@example.org to request installation of software.
- It is not (currently) possible to run jobs that span both the KNL nodes and other node types on Kebnekaise. We will evaluate whether, and if so how, this can be made possible in the future.
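The per-node figures in the first three points translate directly into settings for an OpenMP program. The sketch below assumes a pure OpenMP program running on one whole node and is meant to be sourced in the job script. THREADS_PER_CORE is a hypothetical variable (1 to 4) used only for illustration, while OMP_NUM_THREADS, OMP_PLACES and OMP_PROC_BIND are the standard OpenMP environment variables.

```shell
# Sketch: OpenMP settings for a pure OpenMP program on one whole KNL node.
# Source this file so the exported variables reach the program.
# THREADS_PER_CORE is a hypothetical knob (1-4); it defaults to all 4.
KNL_CORES=68
THREADS_PER_CORE=${THREADS_PER_CORE:-4}

# 68 cores x 4 hardware threads = 272 threads when every hardware thread is used
OMP_NUM_THREADS=$((KNL_CORES * THREADS_PER_CORE))
# One place per core; "close" packs OMP_NUM_THREADS / 68 consecutive threads onto each core
OMP_PLACES=cores
OMP_PROC_BIND=close
export OMP_NUM_THREADS OMP_PLACES OMP_PROC_BIND

# A 512-bit vector register holds 512/64 = 8 doubles or 512/32 = 16 floats
printf 'threads=%s doubles/vector=%d floats/vector=%d\n' \
    "$OMP_NUM_THREADS" $((512 / 64)) $((512 / 32))
```

Setting THREADS_PER_CORE=2 before sourcing gives 136 threads, two per core, which makes it easy to compare 2, 3 and 4 threads per core for a given code.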
The KNL nodes are always allocated on a per-node basis, i.e., you always get access to the complete node in exclusive mode. For the current allocation and accounting policy for the KNL nodes, see Allocation policy on Kebnekaise.
Access and Compiling
To make compiling your own code for the KNLs easier, we have, for the time being, set aside one KNL node for login access: kebnekaise-knl.hpc2n.umu.se. This may change in the future.
The login node is only to be used for compiling, job submission and very short tests of compiled programs!
The best choice for a toolchain on the KNLs is the intel toolchain.
It is highly recommended to use the following flags for the Intel compilers when compiling on the KNL build node:
It is also possible to compile on the ordinary Kebnekaise login node. When compiling there, these flags are recommended:
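The sketch below illustrates the difference between the two build hosts under an assumption: it uses two general Intel compiler options, -xHost (target the instruction set of the machine doing the compiling, i.e. AVX-512 for KNL when run on the KNL node) and -xMIC-AVX512 (target KNL explicitly when compiling elsewhere), which are not necessarily the exact flag set HPC2N recommends. The helper knl_cflags is hypothetical; it detects a KNL host by the avx512er CPU flag, which only Xeon Phi processors report.

```shell
# Sketch: pick Intel compiler target flags depending on the build host.
# Assumption: -xHost on a KNL node, -xMIC-AVX512 when cross-compiling for KNL.
# Optional argument: path to a cpuinfo file (defaults to /proc/cpuinfo).
knl_cflags() {
    cpuinfo=${1:-/proc/cpuinfo}
    # avx512er (exponential/reciprocal instructions) is only present on Xeon Phi
    if grep -qsw avx512er "$cpuinfo"; then
        printf '%s\n' "-O3 -xHost"
    else
        printf '%s\n' "-O3 -xMIC-AVX512"
    fi
}
```

For example: icc $(knl_cflags) -qopenmp -o my_program my_program.c, with ifort or icpc in place of icc for Fortran or C++ (my_program.c is a placeholder).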
To use the KNL nodes on Kebnekaise there are a couple of things to keep in mind:
- Specify the "knl" partition, #SBATCH -p knl
- Request 4 threads per core, #SBATCH --threads-per-core=4
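Put together, a minimal job script for a single-node OpenMP program could look like the sketch below. The project ID, the walltime and ./my_program are placeholders. The program is started directly from the batch script, which has access to the whole exclusively allocated node.

```shell
#!/bin/bash
# Placeholder: your compute project
#SBATCH -A <project-id>
# The KNL partition
#SBATCH -p knl
# One whole node
#SBATCH -N 1
# Make all 4 hardware threads per core available
#SBATCH --threads-per-core=4
# Placeholder walltime
#SBATCH -t 00:30:00

# 68 cores x 4 hardware threads = 272 OpenMP threads
export OMP_NUM_THREADS=272
# Hypothetical executable
./my_program
```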
When using squeue to check the status of your jobs, be patient! The output from squeue may be confusing, especially the status (ST) of the job. Depending on the availability of nodes with the required configuration (see below), the job status may be completing (CG) even if the job has not run yet. This is perfectly normal; just wait 5-10 minutes and check the status again.
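Instead of re-running the command by hand, squeue can repeat its report at a fixed interval. In this short sketch the 300-second interval simply matches the 5-10 minute wait suggested above.

```shell
# Show your jobs in the knl partition; the ST column holds the compact job state
squeue -u "$USER" -p knl
# Repeat the report every 300 seconds (stop with Ctrl-C)
squeue -u "$USER" -p knl -i 300
```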
As can be seen in the description of the KNL nodes, each node also has MCDRAM: a small (16 GB) memory with much higher bandwidth than the ordinary DDR4 RAM, at a latency similar to that of DDR4.
The MCDRAM can operate in 3 modes:
- As an L3 cache (cache)
- As Fast Memory (flat)
- As a combination of cache and fast memory (hybrid)
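In flat mode the MCDRAM is exposed to the operating system as one or more NUMA nodes that have memory but no CPUs, so a program has to be pointed at it explicitly. This is a minimal sketch, assuming numactl is available on the KNL nodes. The helper mcdram_nodes and ./my_program are hypothetical. The helper reads numactl -H output and prints the CPU-less node IDs as a comma-separated list.

```shell
# Sketch: list the NUMA nodes that have memory but no CPUs (MCDRAM in flat mode).
# Input: the output of `numactl -H` on stdin.
mcdram_nodes() {
    # "node N cpus:" with nothing after the colon has exactly 3 fields
    awk '/^node [0-9]+ cpus:/ && NF == 3 { print $2 }' | paste -sd, -
}
```

For example, numactl --membind="$(numactl -H | mcdram_nodes)" ./my_program keeps all allocations in MCDRAM and fails once the 16 GB are exhausted. numactl --preferred=<node> instead falls back to DDR4 but takes a single node ID. In cache mode the helper prints nothing, since the MCDRAM is then not addressable memory.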
The cores in the KNL nodes are grouped into NUMA islands. The communication between the NUMA islands (the cluster mode) can be configured in 5 different modes:
- All-to-all (a2a)
- Hemisphere (hemi)
- Sub-NUMA cluster 2 (snc2)
- Sub-NUMA cluster 4 (snc4)
- Quadrant (quad)
NOTE: At HPC2N we do not yet allow the snc4 mode due to problems booting the system in that configuration.
To select which mode the KNL node should be in for a specific job, specify it at batch job submission time with the --constraint option to sbatch. Select at most one configuration each for MCDRAM and NUMA.
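For example, the submissions below each request one MCDRAM mode and one NUMA mode. This sketch assumes that the mode names in parentheses above are the node feature names used by the batch system (the sinfo output under Hints and tricks lists features such as a2a, flat and quad) and that they are combined with Slurm's & (AND) operator. knl_job.sh is a hypothetical job script.

```shell
# MCDRAM as fast memory (flat), NUMA in quadrant mode
sbatch -p knl --threads-per-core=4 --constraint="flat&quad" knl_job.sh

# MCDRAM as cache, NUMA in all-to-all mode
sbatch -p knl --threads-per-core=4 --constraint="cache&a2a" knl_job.sh
```

The same value can instead be given inside the job script on an #SBATCH --constraint= line.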
To select the amount of MCDRAM used as fast memory in hybrid mode use:
Notes regarding memory modes
If there are sufficient KNL nodes available with the requested configuration, the job will be started (normal batch queue rules still apply).
If no nodes are available, the batch system will reconfigure free nodes and reboot them into the requested configuration. The reboot time is added to the job's total used walltime, and you will be charged for the added time.
There is a 15-minute delay before nodes are reconfigured and rebooted. This gives the nodes ample time to reboot before the system decides to reconfigure another set of nodes.
Not all combinations of constraints and GRES work. Only specify the HBM GRES when running in hybrid memory mode.
If squeue reports BadConstraints, you probably have a bad combination and should cancel the job.
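Finding and cancelling such jobs can be scripted. This is a minimal sketch. bad_constraint_jobs is a hypothetical helper that reads squeue output in the form JOBID REASON, as produced by squeue's %i (job ID) and %r (reason) format fields, and prints the IDs of jobs held back by BadConstraints.

```shell
# Sketch: print the IDs of jobs whose reason is BadConstraints.
# Input on stdin: lines of "JOBID REASON", e.g. from squeue -h -o "%i %r".
bad_constraint_jobs() {
    awk '$2 == "BadConstraints" { print $1 }'
}
```

For example: squeue -h -u "$USER" -o "%i %r" | bad_constraint_jobs | xargs -r scancel. GNU xargs -r skips scancel entirely when no job matches.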
Hints and tricks
- To check how many KNL nodes are currently idle in each cluster mode, use the sinfo command. Column 1 shows the active KNL compute node features, and column 2 shows the number of nodes Allocated/Idle/Other/Total in this partition.
  b-cn1203 [~]$ sinfo -p knl -o "%.22b %.20F"
         ACTIVE_FEATURES       NODES(A/I/O/T)
    a2a,hemi,quad,snc2,s              0/0/1/1
              rack12,knl              0/0/1/1
     a2a,flat,rack12,knl            0/28/6/34
- When all 68 KNL cores run AVX code, the clock speed is limited and turbo is not used.
- Getting the full floating point performance of the KNLs is likely to require 2-4 threads per core.
- Intel® Xeon Phi™ Processor 7250 Specification
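The idle count per feature set can also be extracted from sinfo for use in scripts. This sketch assumes sinfo's %b (active features) and %F (allocated/idle/other/total) fields are printed without a header (-h) and without a column width, so that the feature list is not truncated. idle_knl_nodes is a hypothetical helper.

```shell
# Sketch: print idle KNL nodes per feature set, plus a total.
# Input on stdin: lines of "FEATURES A/I/O/T", e.g. from sinfo -h -p knl -o "%b %F".
idle_knl_nodes() {
    awk '{
        split($2, count, "/")    # count[1]=allocated, [2]=idle, [3]=other, [4]=total
        printf "%s %d\n", $1, count[2]
        idle += count[2]
    }
    END { printf "total %d\n", idle }'
}
```

For example: sinfo -h -p knl -o "%b %F" | idle_knl_nodes.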