High-Performance Computing for Innovative Science: Clusters, communication and caramel wafers

30 Sep 2019

High-Performance Computing (HPC) is making major contributions across a wide range of scientific disciplines and is essential for advanced data processing in areas such as data visualization, mathematical modelling, data simulation and computational biology. It is also an area in which SEFARI has a broad range of expertise, and it is vital to our research.

In July, forty scientists and system administrators came together to share their experience at a unique and inspiring day of talks and discussions on using high-performance computing for science. The workshop was funded by the SEFARI Responsive Opportunity Fund and organised by Sue Jones, Iain Milne and Gordon Stephen from the James Hutton Institute’s Information and Computational Sciences Group, who discuss their recent activities in this blog.

As part of the workshop, which was held at the James Hutton Institute, we invited two guest speakers: Paul Fretter (Head of Computing infrastructure for Science, Norwich Bioscience Institutes) and Robert Maskell (UK Director of High-Performance Computing, Intel). Both speakers brought a wealth of real-world experience of using and developing HPC for innovative science.

Paul spoke about his experience of running shared high-performance computing resources across multiple science institutes while making sure he still sits on the “same side of the bench” as the scientists. Robert gave fascinating insights into the future of high-performance computing and the directions being taken on hardware, as well as fun anecdotes from his days working with Professor Stephen Hawking.

We also arranged a series of short talks designed to facilitate discussions, including presentations on cloud computing, bioinformatics and the challenges of big data, deep learning, and using graphics processing units to accelerate analyses.

The discussions that followed were open and enthusiastic thanks to the mix of high-performance computing users and system administrators. The workshop included participants from all the SEFARI institutes (BioSS, Hutton, Moredun, RBG-Edinburgh, Rowett, and SRUC) as well as the Natural History Museum, London and the National Institute of Agricultural Botany (NIAB), and for many this was the first time they had met to discuss scientific computing.

A key theme that came out of the discussions was the need for good communication, both in terms of system administrators being responsive and enabling, and in terms of scientists being clear about their needs and managing their expectations.

In addition, both system administrators and scientists need to be forward-thinking: methods, approaches and available hardware have to evolve over time, because the way to solve a specific question is often not yet known.

There was also a lot of discussion about the pros and cons of using shared resources, a topic more applicable than ever now that four of SEFARI’s six institutes are joining with NIAB, RBG-Kew, and the Natural History Museum to develop and host a new BBSRC-funded computing cluster to support and expand a UK Crop Diversity Bioinformatics Resource. This resource will comprise over 3,200 compute cores, large memory nodes, and petabyte+ parallel storage and backup systems. The ability for SEFARI to build upon a larger shared computing resource like this could be key to its success in future funding initiatives and will facilitate continued expansion of the world-leading research conducted by SEFARI institutes. The workshop helped to instigate new ideas and conversations in this respect.

Other discussions covered topics ranging from resource requirements for fast animal disease diagnostics at SRUC and tackling data archiving at a time of data deluge to the Natural History Museum’s ambitious project to digitise its collections dating back hundreds of years.

The workshop attendees’ discussions were fuelled by light refreshments including caramel wafers, and the day concluded with enthusiastic buy-in to developing an ongoing network of scientists and system administrators to share expertise and best practice. Such a network will help to improve big data research and practice across the SEFARI institutes, work that would be impossible without the latest high-performance computing resources and a cohort of expert staff to build and maintain them.

Sue Jones, Computational Biologist, The James Hutton Institute

Iain Milne, Research Software Engineer, The James Hutton Institute

Gordon Stephen, Research Software Engineer, The James Hutton Institute