Computer Architecture Group.

PLUTON CLUSTER

Page last modified on November 7, 2019.

Description

Pluton is a heterogeneous cluster intended for running High Performance Computing (HPC) applications. It has been co-funded by the Ministry of Economy and Competitiveness of Spain and by the EU through the European Regional Development Fund (project UNLC10-1E-728).

The cluster is managed by the Computer Architecture Group and is currently hosted at CITIC, a research centre with the participation of the University of A Coruña (Spain).

Hardware Configuration

Since the initial deployment in June 2013, the cluster has received several small hardware updates supported by newer research projects. As of October 2019, the cluster consists of:

+ A head node (or front-end node), which serves as the access point where users log in to interact with the cluster. The head node of Pluton can be accessed from the Internet through SSH at pluton.dec.udc.es (see the example after this list). The hardware of this node was upgraded in September 2019 and currently provides up to 12 TiB of global NAS-like storage space for users. Moreover, it is connected to the computing nodes through Gigabit Ethernet and InfiniBand FDR.

+ 20 computing nodes, where all the computation is actually performed. These nodes provide all the computational resources (CPUs, GPUs, memory, disks) required for running the applications, with an aggregate computing capacity of up to 336 physical cores (672 logical threads), 1.4 TiB of memory, 17 NVIDIA Tesla GPUs, 3 Intel Xeon Phi accelerators and one AMD FirePro GPU. All computing nodes are interconnected via Gigabit Ethernet and InfiniBand FDR networks.
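
As a quick illustration, logging in to the head node from a GNU/Linux terminal would look roughly as follows; the user name shown is just a placeholder for the account provided by the administrator:

    # Log in to the Pluton head node over SSH
    # (replace "username" with the account provided by the administrator)
    ssh username@pluton.dec.udc.es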

Software Environment

The entire cluster was updated in October 2019 to a new software environment running the Rocks 7 distribution, based on CentOS 7 (v7.7.1908), a free, community-supported GNU/Linux OS that is binary compatible with Red Hat Enterprise Linux (RHEL). Furthermore, Pluton relies on the Slurm Workload Manager v19.05.2 as the job scheduler and on Lmod v8.0.6 for module management (a usage sketch is shown after the software list below). Additionally, the OpenHPC repository is used to manage and install some of the available software.

Other relevant software available includes:

+ Video driver v430.50 for NVIDIA Tesla GPUs
+ Video driver v19.Q4 for the AMD FirePro GPU (Upcoming)
+ Intel MPSS v3.8.6 for Intel Xeon Phi accelerators
+ NVIDIA CUDA Toolkit (v9.2/v10.1)
+ Intel compilers and libraries (Parallel Studio XE 2019 and 2017)
+ GNU compilers (v7.3.0 and v8.3.0)
+ MPI libraries (MPICH/MVAPICH2/Open MPI/Intel MPI)
+ Intel OpenCL support (SDK 2019 and CPU Runtime 18.1)
+ Linux containers with udocker v1.1.3
+ Python (v2.x/v3.x)
+ Java Development Kit (JDK 6/7/8/11/13)
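
Most of this software is expected to be accessed through Lmod environment modules. The sketch below shows the typical workflow; the module names and versions are only illustrative and may differ from those actually installed on Pluton:

    # List the modules currently available through Lmod
    module avail

    # Load a compiler and the CUDA toolkit (illustrative module names and
    # versions; check "module avail" for the ones actually installed)
    module load gcc/8.3.0
    module load cuda/10.1

    # Show the modules loaded in the current session
    module list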

User Guide

The user guide aims to provide the minimum information needed by a new user. Basically, it describes the cluster and its hardware/software configuration in detail, explains the file systems available to users, and provides basic examples of how to run different types of applications using Slurm (a minimal sketch is shown below). The guide assumes that users are familiar with the most common utilities for GNU/Linux platforms.
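
As a rough illustration of the Slurm workflow, the following sketch submits a trivial batch job; the resource requests are generic examples, and the partitions, limits and recommended options for Pluton are documented in the user guide:

    #!/bin/bash
    # job.sh -- minimal Slurm batch script (generic example; see the user
    # guide for the partitions and limits actually configured on Pluton)
    #SBATCH --job-name=hello
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --time=00:10:00

    echo "Running on $(hostname)"

The script would then be submitted from the head node and monitored with the standard Slurm commands:

    sbatch job.sh        # submit the job to the scheduler
    squeue -u $USER      # check the status of your own jobs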

Here you can download the latest version of the user guide. Access credentials are required, which are sent to your email when your cluster account is created. If you do not remember them, do not hesitate to contact the administrator (see Contact).