Katherine Yelick: Performance Advantages of Partitioned Global Address Space Languages
Abstract. For nearly a decade, the Message Passing Interface (MPI) has been the dominant programming model for high performance parallel computing, in large part because it is universally available and scales to thousands of processors. In this talk I will describe some of the alternatives to MPI based on a Partitioned Global Address Space (PGAS) model of programming, such as UPC and Titanium. I will show that these models offer significant advantages in performance as well as programmer productivity, because they allow the programmer to build global data structures and perform one-sided communication in the form of remote reads and writes, while still giving programmers control over data layout. In particular, I will show that these languages make more effective use of cluster networks with RDMA support, allowing them to outperform two-sided communication on both microbenchmarks and bandwidth-limited computational problems, such as global FFTs. The key optimizations are overlapping communication with computation and pipelining communication. Surprisingly, sending smaller messages more frequently can be faster than sending a few large ones if overlap with computation is possible. This creates an interesting open problem for the global scheduling of communication, since the simple strategy of maximum aggregation is not always best. I will also show some of the productivity advantages of these languages through application case studies, including complete Titanium implementations of two different application frameworks: an immersed boundary method package and an elliptic solver using adaptive mesh refinement.
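As a rough illustration of the one-sided model the abstract refers to, the UPC sketch below has each thread write directly into a neighbor's partition of a shared array using the standard upc_memput call; the array name, sizes, and neighbor pattern are illustrative and not taken from the talk.

/* A minimal UPC sketch of one-sided communication: each thread
 * writes into a neighbor's block of a shared array. No receive is
 * posted on the target side. (Names and sizes are illustrative.) */
#include <upc.h>

#define N 1024

/* Blocked layout: row t of the shared array has affinity to thread t. */
shared [N] double ghost[THREADS][N];

double local_buf[N];          /* private source buffer on each thread */

int main(void) {
    for (int i = 0; i < N; i++)
        local_buf[i] = MYTHREAD + i * 1e-3;

    int neighbor = (MYTHREAD + 1) % THREADS;

    /* One-sided put into the neighbor's block; on a network with
     * RDMA support this can map directly to a remote write. */
    upc_memput(&ghost[neighbor][0], local_buf, N * sizeof(double));

    upc_barrier;              /* ensure all puts complete before reads */
    return 0;
}

The contrast with two-sided MPI is that thread neighbor executes no matching receive: the runtime can issue the transfer as a remote write without involving the target CPU.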
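The claim that many small, overlapped messages can beat a few large ones suggests a pipelining pattern along the lines of the following hedged sketch. It assumes the non-blocking transfer library later standardized in UPC 1.3 (upc_nb.h, upc_memput_nb, upc_sync); Berkeley UPC exposed similar functionality through bupc_* extensions. All names, chunk sizes, and the communication pattern are assumptions for illustration, not the talk's code.

/* A sketch of pipelined one-sided communication: compute one chunk,
 * launch its non-blocking put, and let that transfer overlap the
 * computation of the next chunk. */
#include <upc.h>
#include <upc_nb.h>           /* non-blocking transfers (UPC 1.3) */

#define NCHUNKS 8
#define CHUNK   4096          /* doubles per chunk (illustrative) */

shared [NCHUNKS*CHUNK] double remote_buf[THREADS][NCHUNKS*CHUNK];

double work[NCHUNKS][CHUNK];  /* private per-thread source data */

static void compute_chunk(double *chunk, int c) {
    for (int i = 0; i < CHUNK; i++)
        chunk[i] = c + i * 1e-6;     /* stand-in for real computation */
}

int main(void) {
    upc_handle_t h[NCHUNKS];
    int dest = (MYTHREAD + 1) % THREADS;

    for (int c = 0; c < NCHUNKS; c++) {
        compute_chunk(work[c], c);           /* compute chunk c ...   */
        h[c] = upc_memput_nb(                /* ... then ship it      */
            &remote_buf[dest][c * CHUNK],    /* one-sided: no receive */
            work[c], CHUNK * sizeof(double));
        /* the put for chunk c now overlaps computing chunk c+1 */
    }
    for (int c = 0; c < NCHUNKS; c++)
        upc_sync(h[c]);                      /* drain the pipeline    */

    upc_barrier;                             /* data visible to all   */
    return 0;
}

With maximum aggregation, the whole buffer would be sent once at the end and no transfer could overlap any computation; the pipelined version keeps the network busy while later chunks are produced, which is the scheduling trade-off the abstract highlights.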
About the speaker.
Katherine Yelick is a Professor in the EECS Department at the University of California, Berkeley, and head of the Future Technologies Group at Lawrence Berkeley National Laboratory (LBNL). Her research in high performance computing addresses parallel programming languages, compiler analyses for explicitly parallel code, and optimization techniques for communication and memory systems. Much of her work has addressed the problems of programming irregular applications on parallel machines. Her projects include the Split-C, Titanium, and UPC parallel languages, the IRAM and ISTORE systems, and the Sparsity code generation system. She co-leads the Titanium and Bebop (Berkeley Benchmarking and Optimization) teams at UC Berkeley and is the director of the Berkeley Institute for Performance Studies, a collaborative project between UC Berkeley and LBNL.
She received her Bachelor’s, Master’s, and Ph.D. degrees from the Massachusetts Institute of Technology, where she worked on parallel programming methods and automatic theorem proving. She won the George M. Sprowls Award for an outstanding Ph.D. dissertation at MIT and has received teaching awards from the EECS Departments at both MIT and Berkeley.