Computer Architecture Group.

PLUTON CLUSTER


Description

Pluton is a heterogeneous cluster intended for running High Performance Computing (HPC) applications. It has been co-funded by the Ministry of Economy and Competitiveness of Spain and by the EU through the European Regional Development Fund (project UNLC10-1E-728).

The cluster is managed by the Computer Architecture Group and is currently hosted at CITIC, a research centre with the participation of the University of A Coruña (Spain).

Hardware Configuration

Since the initial deployment in June 2013, the cluster has received several small hardware updates supported by newer research projects. As of January 2021, the cluster consists of:

+ A head node (or front-end node), which serves as the access point where users log in to interact with the cluster (see the login example after this list). The head node of Pluton can be accessed from the Internet through SSH at pluton.dec.udc.es. The hardware of this node was upgraded in September 2019 and currently provides up to 12 TiB of global NAS-like storage space for users. Moreover, it is interconnected with the computing nodes through Gigabit Ethernet and InfiniBand FDR.

+ 25 computing nodes, where all the computation is actually performed. These nodes provide all the computational resources (CPUs, GPUs, memory, disks) required for running the applications, with an aggregate computing capacity of up to 512 physical cores (1024 logical threads), 2.8 TiB of memory, 19 NVIDIA Tesla GPUs and 3 Intel Xeon Phi accelerators. All computing nodes are interconnected via Gigabit Ethernet and InfiniBand FDR networks.
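
As an illustration, a login session from a GNU/Linux machine would look like the following minimal sketch; the username your_user is a placeholder for the account provided by the administrator:

    # Log in to the Pluton head node over SSH (replace your_user with your account)
    ssh your_user@pluton.dec.udc.es

    # Copy input files to your home directory on the cluster with scp, if needed
    scp input_data.tar.gz your_user@pluton.dec.udc.es:~/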

Software Environment

The cluster runs the Rocks 7 distribution based upon CentOS 7 (v7.9.2009), a free, community-supported GNU/Linux OS that is binary compatible with Red Hat Enterprise Linux (RHEL). Furthermore, Pluton relies on the Slurm Workload Manager v19.05.2 as the job scheduler and on Lmod v8.1.18 for module management (a brief usage sketch is shown after the software list below). Additionally, the OpenHPC repository is used to manage and install some of the available software.

Other relevant software available includes:

+ Video driver v450.57 for NVIDIA Tesla GPUs
+ Intel MPSS v3.8.6 for Intel Xeon Phi accelerators
+ NVIDIA CUDA Toolkit (v9.2/v10.1/v10.2/v11.0)
+ Intel compilers and libraries (Parallel Studio XE 2019 and 2017)
+ GNU compilers (v7.3.0/v8.3.0/v9.3.0)
+ MPI libraries (MPICH/MVAPICH2/Open MPI/Intel MPI)
+ Intel OpenCL support (SDK 2019 and CPU Runtime 18.1)
+ Linux containers with udocker (v1.1.3/v1.1.4)
+ Python (v2.x/v3.x)
+ Java Development Kit (JDK 6/7/8/11/13)
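
These packages are typically made available through the Lmod module system mentioned above. The following is a minimal sketch of a session; the exact module names and versions are assumptions and should be checked with module avail:

    # List the modules available on the cluster
    module avail

    # Load (hypothetical) modules for GCC, Open MPI and CUDA
    module load gcc/9.3.0 openmpi cuda/11.0

    # Check the tools provided by the loaded modules
    gcc --version
    mpicc --version
    nvcc --version

    # Show the modules currently loaded in the session
    module list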

User Guide

The user guide aims to provide the minimum information that a new user of the system needs. It describes the cluster and its hardware/software configuration in detail, explains the file systems available to users, and provides basic examples of how to run different types of applications using Slurm. The guide assumes that users are familiar with the most common utilities for GNU/Linux platforms.
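
For reference, the sketch below shows the general shape of a simple Slurm batch job: a shell script with #SBATCH directives submitted through sbatch. The resource limits and module names are illustrative assumptions; the actual values and partitions for Pluton are described in the user guide.

    #!/bin/bash
    # example_job.sh -- minimal Slurm batch script (all limits below are hypothetical)
    #SBATCH --job-name=example
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=4
    #SBATCH --time=00:10:00
    #SBATCH --output=example_%j.out

    # Load the software required by the job (module name is an assumption)
    module load gcc/9.3.0

    # Run the application
    ./my_application

The job would then be submitted with "sbatch example_job.sh" and monitored with "squeue -u $USER".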

Here you can download the latest version of the user guide. Do not hesitate to contact the administrator if you have any questions (see Contact).