Lightweight Operating Systems for Scalable Native and Virtualized Supercomputing

Colloq: Speaker: 
Kevin Pedretti
Colloq: Speaker Institution: 
Sandia National Laboratories
Colloq: Date and Time: 
Mon, 2009-04-20 10:00
Colloq: Location: 
ORNL, Bldg. 5100, Room 125
Colloq: Host: 
Jeffrey Vetter
Colloq: Host Email:
Colloq: Abstract: 
Capability supercomputers are massively complex, both in software and hardware. General-purpose operating systems have grown so complicated that they significantly impede the innovation that will be necessary to take full advantage of future multi-core architectures, which are likely to incorporate heterogeneous and hierarchical computing elements. This talk focuses on the compute node operating system and the work Sandia is doing to keep it simple, efficient, and functional. The case will be made that general-purpose operating systems, even slimmed-down ones, add unnecessary complexity to the system and are detrimental to performance. We are developing a new lightweight kernel operating system, named Kitten, that addresses many of the shortcomings of previous lightweight kernels while retaining their superior scalability characteristics. Kitten is heavily based on Linux but rewinds it to a much earlier design point. Unnecessary complexities such as demand paging have been replaced by simpler mechanisms. Kitten provides partial Linux API and ABI compatibility so that standard toolchains and Linux ELF executables can be used without change in most cases. Additionally, Kitten utilizes the Palacios virtual machine monitor, developed at Northwestern University and the University of New Mexico, to support on-demand loading of unmodified guest operating systems. Performance results for both native and virtualized Catamount and Compute Node Linux on a Cray XT4 development system (48 quad-core compute nodes) will be presented.
Colloq: Speaker Bio: 
Kevin Pedretti is a Senior Member of Technical Staff at Sandia National Laboratories in Albuquerque, New Mexico. He joined Sandia in 2001 as a member of the system software team for the Cplant project, which used COTS hardware and an in-house developed system software stack to field the largest-scale production Linux clusters of the time. He subsequently became a member of the joint Sandia and Cray software team for Red Storm, working on the compute node allocator, Catamount lightweight kernel, Portals networking stack, and SeaStar firmware. Currently he is leading a Laboratory Directed Research and Development (LDRD) project investigating system software techniques to best leverage multi-core processors and hardware virtualization on future capability platforms.