Below are the selected poster contributions that will be presented in the combined vendor exhibition / poster session.
Time: 18th Sep 2006, 17:50 – 19:00
Application of PVM to Protein Homology Search
A new computer program named SEARCH, based on the Needleman-Wunsch algorithm, was created for finding homologous proteins in large databases. SEARCH proved very sensitive but rather slow: on a Pentium 4 2.8 GHz computer it required 1 hr 15 min 16 sec to search for proteins homologous to cucumber basic protein (96 amino acids) in a database of 2,710,972 protein sequences. To improve the search time, PVM was installed on 41 Pentium 4 2.8 GHz computers, configured as 1 master and 40 slaves. Run under this system, SEARCH completed in 2 min 6 sec, reducing the search time about 36-fold.
PDF of poster not available
Asynchrony in Collective Operation Implementation
Special attention is paid to the mismatch between synchronous collective operations and the load balancing of a parallel program. A general way to increase the performance of collective operations while keeping their standard MPI semantics is suggested. The discussion addresses the internals of MPICH2, but the approach is general and can be applied to MPICH and LAM/MPI as well.
Automated Performance Comparison
Comparing the performance of HPC platforms that differ in hardware, MPI library, compiler, or runtime options is a frequent task for implementors and users alike. Comparisons based on just a few numbers from a single execution of one benchmark or application are of very limited value as soon as the system is to run anything beyond that software in exactly that configuration. However, the amount of data produced by thorough comparisons across a multidimensional parameter space quickly becomes hard to manage, and the relevant performance differences hard to locate. We deployed perfbase as a system that performs performance comparisons based on a large number of test results while still making the relevant performance differences immediately recognizable.
Improved GROMACS scaling on Ethernet switched clusters
We investigated the prerequisites for decent scaling of the GROMACS parallel molecular dynamics code on Ethernet Beowulf clusters. While the code scales well on supercomputers like the IBM p690 and on Linux clusters with a special interconnect like Myrinet or InfiniBand, on Ethernet-switched clusters scaling typically breaks down as soon as more than two compute nodes are involved.
On the Usability of High-Level Parallel IO in Unstructured Grid Simulations
For this poster, the feasibility of the two most common IO libraries for parallel IO was evaluated and compared against a pure MPI-IO implementation. Rather than focusing solely on the raw transfer bandwidth achieved, API issues such as data preparation and call overhead were also taken into consideration. The study was restricted to the access pattern arising from parallel IO in unstructured grid applications, which is also one of the hardest patterns to optimize.
PARUS: A Parallel Programming Framework for Heterogeneous Multiprocessor Systems
PARUS is a parallel programming framework that allows parallel programs to be built in a data-flow graph notation. The data-flow graph is created by the developer, either manually or automatically with the help of a script. The graph is then converted to C++/MPI source code and linked with the PARUS runtime system; the resulting parallel program is executed on a cluster or multiprocessor system. PARUS also implements several approaches to load balancing on heterogeneous multiprocessor systems. A series of MPI tests is provided that lets the developer estimate the communication characteristics of a multiprocessor system or cluster.