Parallel Programs (parallel + program)


Selected Abstracts


Structural testing criteria for message-passing parallel programs

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 16 2008
S. R. S. Souza
Abstract Parallel programs present features such as concurrency, communication and synchronization that make testing a challenging activity. Because of these characteristics, traditional testing cannot always be applied directly, and adequate testing criteria and tools are necessary. In this paper we investigate the challenges of validating message-passing parallel programs and present a set of specific testing criteria. We introduce a family of structural testing criteria based on a test model. The model captures the control and data flow of message-passing programs by considering both their sequential and parallel aspects. The criteria provide a coverage measure that can be used to evaluate the progress of the testing activity, and they also provide guidelines for generating test data. We also describe a tool, called ValiPar, which supports the application of the proposed testing criteria. Currently, ValiPar is configured for Parallel Virtual Machine (PVM) and Message-Passing Interface (MPI). Results of applying the proposed criteria to MPI programs are also presented and analyzed. Copyright © 2008 John Wiley & Sons, Ltd.
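To make the idea of a coverage measure concrete, the Python sketch below computes the fraction of required send-receive pairs exercised by a set of test runs and lists the pairs still uncovered, which is the kind of information that can guide the generation of further test data. The pair representation (process name plus node number) and the function names are hypothetical; this is not ValiPar's model, which also covers control- and data-flow elements inside each process.

```python
from typing import Iterable, Set, Tuple

# Hypothetical pair: (sending process, send node, receiving process, receive node).
Pair = Tuple[str, int, str, int]

def pair_coverage(required: Set[Pair], executed: Iterable[Pair]) -> float:
    """Fraction of required send-receive pairs exercised by at least one run."""
    if not required:
        return 1.0
    covered = required & set(executed)
    return len(covered) / len(required)

def uncovered(required: Set[Pair], executed: Iterable[Pair]) -> Set[Pair]:
    """Required pairs no run has exercised yet: targets for new test data."""
    return required - set(executed)

required = {("p0", 3, "p1", 5), ("p0", 3, "p2", 7), ("p1", 9, "p0", 4)}
run1 = [("p0", 3, "p1", 5)]
run2 = [("p0", 3, "p1", 5), ("p1", 9, "p0", 4)]
print(pair_coverage(required, run1 + run2))  # 2 of 3 pairs covered
print(uncovered(required, run1 + run2))
```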


Adaptive structured parallelism for distributed heterogeneous architectures: a methodological approach with pipelines and farms

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 15 2010
Horacio González-Vélez
Abstract Algorithmic skeletons abstract commonly used patterns of parallel computation, communication, and interaction. Based on the algorithmic skeleton concept, structured parallelism provides a high-level parallel programming technique that allows the conceptual description of parallel programs while fostering platform independence and algorithm abstraction. This work presents a methodology to improve skeletal parallel programming in heterogeneous distributed systems by introducing adaptivity through resource awareness. As we hypothesise that a skeletal program should be able to adapt to dynamic resource conditions over time using its structural forecasting information, we have developed adaptive structured parallelism (ASPARA). ASPARA is a generic methodology for incorporating structural information into a parallel program at compilation time, which helps the program adapt at execution time. ASPARA comprises four phases: programming, compilation, calibration, and execution. We illustrate the feasibility of this approach and its associated performance improvements using independent case studies based on two algorithmic skeletons, the task farm and the pipeline, evaluated in a non-dedicated heterogeneous multi-cluster system. Copyright © 2010 John Wiley & Sons, Ltd.
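The two skeletons named above can be sketched as higher-order functions: the programmer supplies only the sequential worker or stage functions, and the skeleton owns the parallel structure. This is a minimal Python illustration under stated assumptions; the function names are hypothetical, and ASPARA's calibration and adaptation phases are not modeled at all.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")

def task_farm(worker: Callable[[T], U], tasks: Iterable[T], n_workers: int = 4) -> List[U]:
    """Farm skeleton: independent tasks are handed to a pool of identical
    workers; results come back in task order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(worker, tasks))

def pipeline(*stages: Callable) -> Callable[[Iterable], List]:
    """Pipeline skeleton: every item flows through the stages in order.
    This composition is sequential and shows only the structure; a parallel
    version would run each stage concurrently on successive items."""
    def run(items: Iterable) -> List:
        return [reduce(lambda acc, stage: stage(acc), stages, item) for item in items]
    return run

print(task_farm(lambda x: x * x, [1, 2, 3]))        # [1, 4, 9]
print(pipeline(lambda x: x + 1, lambda x: x * 10)([1, 2]))  # [20, 30]
```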


A Parallel PCG Solver for MODFLOW

GROUND WATER, Issue 6 2009
Yanhui Dong
In order to simulate large-scale ground water flow problems more efficiently with MODFLOW, this study used the OpenMP programming paradigm to parallelize the preconditioned conjugate-gradient (PCG) solver. Incremental parallelization, a significant advantage of OpenMP on shared-memory computers, allowed the solver to be converted smoothly into a parallel program, one block of code at a time. The parallel PCG solver, suitable for both MODFLOW-2000 and MODFLOW-2005, is verified using an 8-processor computer. Both the impact of different compilers and the effect of different model domain sizes were considered in the numerical experiments. Based on the timing results, the parallel PCG solver typically runs about 1.40 to 5.31 times faster than the serial one. In addition, the simulation results are exactly the same as those of the original PCG solver, because most of the serial code was not changed. It is worth noting that this parallelization approach reduces software maintenance costs, because only a single PCG solver source code needs to be maintained in the MODFLOW source tree.
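For reference, the preconditioned conjugate-gradient iteration is sketched below in plain Python. The matrix-vector product and the dot products are the kind of self-contained loops that OpenMP lets one parallelize one block at a time (for example, a dot product with `#pragma omp parallel for reduction(+:sum)` in C or Fortran). As an assumption for brevity, this sketch uses a simple diagonal (Jacobi) preconditioner and dense lists; MODFLOW's PCG package provides its own preconditioners and sparse storage.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def dot(a: Vector, b: Vector) -> float:
    # In an OpenMP code this loop would be a parallel reduction.
    return sum(x * y for x, y in zip(a, b))

def matvec(A: Matrix, x: Vector) -> Vector:
    # Each row is independent, so rows can be split across threads.
    return [dot(row, x) for row in A]

def pcg(A: Matrix, b: Vector, tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Conjugate gradient for symmetric positive-definite A with a
    Jacobi preconditioner M = diag(A)."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = b[:]                                    # residual b - A x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]  # z = M^{-1} r
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

print(pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))  # close to [1/11, 7/11]
```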


A parallel program using SHELXD for quick heavy-atom partial structural solution on high-performance computers

JOURNAL OF APPLIED CRYSTALLOGRAPHY, Issue 2 2007
Zheng-Qing Fu
A parallel algorithm has been designed for SHELXD to solve the heavy-atom partial structures of protein crystals quickly. Based on this algorithm, a program has been developed to run on high-performance multiple-CPU Linux PCs, workstations or clusters. Tests on the 32-CPU Linux cluster at SER-CAT, APS, Argonne National Laboratory, show that the parallelization dramatically speeds up the process, by a factor roughly equal to the number of CPUs used, yielding reliable heavy-atom site solutions almost instantly. This makes it practical to employ heavy-atom search as an alternative tool for evaluating anomalous scattering data quality during single/multiple-wavelength anomalous diffraction (SAD/MAD) data collection at synchrotron beamlines.
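A speedup close to the number of CPUs is what one expects when the work splits into independent trials that need no communication until the best result is chosen. The Python sketch below illustrates that pattern, assuming the search consists of independent random-start trials; `run_trial` is a hypothetical stand-in that returns a seeded pseudo-random score, not a SHELXD computation. Threads are used only to keep the example self-contained; CPU-bound Python work would need `ProcessPoolExecutor` (or separate processes) to gain real speedup.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

def split_trials(n_trials: int, n_workers: int) -> List[range]:
    """Divide trial indices 0..n_trials-1 into contiguous, nearly equal blocks."""
    base, extra = divmod(n_trials, n_workers)
    blocks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

def run_trial(seed: int) -> float:
    """Hypothetical stand-in for one independent trial; returns its score."""
    return random.Random(seed).random()

def best_of_block(block: range) -> Tuple[float, int]:
    return max((run_trial(s), s) for s in block)

def parallel_search(n_trials: int, n_workers: int) -> Tuple[float, int]:
    """Each worker searches its own block; only the per-block best is merged."""
    blocks = [b for b in split_trials(n_trials, n_workers) if len(b) > 0]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return max(pool.map(best_of_block, blocks))

print(parallel_search(1000, 8))  # (best score, seed of the best trial)
```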


Fully quantum mechanical energy optimization for protein-ligand structure

JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 12 2004
Yun Xiang
Abstract We present a quantum mechanical approach to study protein-ligand binding structure, with application to an adipocyte lipid-binding protein complexed with propanoic acid. The present approach employs the recently developed molecular fractionation with conjugate caps (MFCC) method to compute the protein-ligand interaction energy, and performs energy optimization using the quasi-Newton method. The MFCC method enables us to compute the fully quantum mechanical ab initio protein-ligand interaction energy and its gradients, which are used in energy minimization. This quantum optimization approach is applied to the adipocyte lipid-binding protein complexed with propanoic acid, a complex system consisting of a 2057-atom protein and a 10-atom ligand. The MFCC calculation is carried out at the Hartree-Fock level with a 3-21G basis set. The quantum-optimized structure of this complex is in good agreement with the experimental crystal structure. The quantum energy calculation is implemented in a parallel program that dramatically speeds up the MFCC calculation for the protein-ligand system. Similarly good agreement between the MFCC-optimized structure and the experimental structure is also obtained for the streptavidin-biotin complex. Due to the heavy computational cost, the quantum energy minimization is carried out in a six-dimensional space that corresponds to the rigid-body protein-ligand interaction. © 2004 Wiley Periodicals, Inc. J Comput Chem 25: 1431-1437, 2004
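As we understand the MFCC scheme, the protein is cut into capped fragments, and the interaction energy is approximated as the sum of fragment-ligand interaction energies minus the sum of conjugate-cap-ligand interaction energies, the subtraction removing the double-counted caps. Because every term is an independent quantum calculation, the terms can be farmed out in parallel, which is one plausible reason a parallel program helps here. The sketch below only assembles the terms; `pair_energy` is a hypothetical placeholder for a real ab initio calculation, and the fragment labels are made up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

def mfcc_interaction_energy(
    fragments: Sequence[str],
    concaps: Sequence[str],
    pair_energy: Callable[[str], float],
    n_workers: int = 4,
) -> float:
    """E_int ~ sum_i E(fragment_i with ligand) - sum_j E(concap_j with ligand).
    Each term is independent, so all of them are evaluated concurrently."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        frag_terms = list(pool.map(pair_energy, fragments))
        cap_terms = list(pool.map(pair_energy, concaps))
    return sum(frag_terms) - sum(cap_terms)

# Hypothetical toy energies standing in for per-term QM calculations.
toy = {"F1": -3.0, "F2": -2.5, "F3": -1.0, "C12": -0.5, "C23": -0.25}
print(mfcc_interaction_energy(["F1", "F2", "F3"], ["C12", "C23"], toy.__getitem__))
```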


HPCTOOLKIT: tools for performance analysis of optimized parallel programs

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 6 2010
L. Adhianto
Abstract HPCTOOLKIT is an integrated suite of tools that supports measurement, analysis, attribution, and presentation of application performance for both sequential and parallel programs. HPCTOOLKIT can pinpoint and quantify scalability bottlenecks in fully optimized parallel programs with a measurement overhead of only a few percent. Recently, new capabilities were added to HPCTOOLKIT for collecting call path profiles for fully optimized codes without any compiler support, pinpointing and quantifying bottlenecks in multithreaded programs, exploring performance information and source code using a new user interface, and displaying hierarchical space-time diagrams based on traces of asynchronous call path samples. This paper provides an overview of HPCTOOLKIT and illustrates its utility for performance analysis of parallel applications. Copyright © 2009 John Wiley & Sons, Ltd.
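The idea behind attributing sampled costs to call paths can be illustrated with a small aggregation: each sample records the stack of active procedures, the innermost frame receives exclusive cost, and every procedure on the path receives inclusive cost. This Python sketch is a simplification under stated assumptions: it flattens results per procedure, whereas HPCTOOLKIT attributes costs to full calling contexts, and the sample format here is hypothetical rather than HPCTOOLKIT's.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

CallPath = Tuple[str, ...]  # outermost frame first, e.g. ("main", "solve", "dot")

def attribute(samples: Iterable[CallPath]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (exclusive, inclusive) sample counts per procedure.
    Exclusive: samples whose innermost frame is the procedure.
    Inclusive: samples in which the procedure appears anywhere on the path,
    counted once per sample so that recursion is not double counted."""
    exclusive: Counter = Counter()
    inclusive: Counter = Counter()
    for path in samples:
        if not path:
            continue
        exclusive[path[-1]] += 1
        for proc in set(path):
            inclusive[proc] += 1
    return dict(exclusive), dict(inclusive)

samples = [("main", "solve", "dot"), ("main", "solve", "dot"), ("main", "io")]
print(attribute(samples))
```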


Optimizing process allocation of parallel programs for heterogeneous clusters

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 4 2009
Shuichi Ichikawa
Abstract The performance of a conventional parallel application is often degraded by load imbalance on heterogeneous clusters. Although it is simple to invoke multiple processes on fast processing elements to alleviate load imbalance, the optimal process allocation is not obvious. Kishimoto and Ichikawa presented performance models for High-Performance Linpack (HPL), with which sub-optimal configurations of heterogeneous clusters were estimated. Their results on HPL are encouraging, but their approach had not yet been verified with other applications. This study presents some enhancements of Kishimoto's scheme, which are evaluated with four typical scientific applications: computational fluid dynamics (CFD), finite-element method (FEM), HPL (linear algebraic system), and fast Fourier transform (FFT). According to our experiments, our new models (NP-T models) are superior to Kishimoto's models, particularly when the non-negative least squares method is used for parameter extraction. The average errors of the derived models were 0.2% for the CFD benchmark, 2% for the FEM benchmark, 1% for HPL, and 28% for the FFT benchmark. This study also emphasizes the importance of predictability in clusters, listing practical examples derived from our study. Copyright © 2008 John Wiley & Sons, Ltd.
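Why several processes on a fast node can help is easy to see in a toy model where work is divided equally among all processes and processes sharing a node share its speed: the slowest node sets the run time. The Python sketch below exhaustively searches such allocations. It is a hypothetical illustration that ignores communication entirely; it is not Kishimoto and Ichikawa's model or the NP-T models.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

def predicted_time(procs: Sequence[int], speeds: Sequence[float], work: float = 1.0) -> float:
    """Toy model: equal work per process; p processes on a node of speed s
    need p * share / s time. Communication cost is ignored."""
    total = sum(procs)
    if total == 0:
        return float("inf")
    share = work / total
    return max(p * share / s for p, s in zip(procs, speeds) if p > 0)

def best_allocation(speeds: Sequence[float], max_procs: int) -> Tuple[Tuple[int, ...], float]:
    """Try 0..max_procs processes on every node and keep the fastest allocation."""
    best: Optional[Tuple[Tuple[int, ...], float]] = None
    for procs in product(range(max_procs + 1), repeat=len(speeds)):
        t = predicted_time(procs, speeds)
        if best is None or t < best[1]:
            best = (procs, t)
    return best

# A node twice as fast as its neighbor gets two processes: ((2, 1), 0.333...)
print(best_allocation([2.0, 1.0], max_procs=2))
```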


JOPI: a Java object-passing interface

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 7-8 2005
Jameela Al-Jaroodi
Abstract Recently there has been increasing interest in developing parallel programming capabilities in Java to harness the vast resources available in clusters, grids and heterogeneous networked systems. In this paper, we introduce a Java object-passing interface (JOPI) library. JOPI provides Java programmers with the necessary functionality to write object-passing parallel programs in distributed heterogeneous systems. JOPI provides a Message Passing Interface (MPI)-like interface that can be used to exchange objects among processes. In addition to the well-known benefits of the object-oriented development model, using objects to exchange information in JOPI is advantageous because it facilitates passing complex structures and enables the programmer to separate the problem space from the parallelization problem. The run-time environment for JOPI is portable and efficient, and it provides the necessary functionality to deploy and execute parallel Java programs. Experiments were conducted on a cluster system and on a collection of heterogeneous platforms to measure JOPI's performance and compare it with MPI. The results show good performance gains using JOPI. Copyright © 2005 John Wiley & Sons, Ltd.
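The core of object passing is serializing a structured object, sending it with enough framing that the receiver knows where it ends, and rebuilding it on the other side. JOPI does this in Java; to keep one language throughout, the sketch below shows the same idea in Python with `pickle`, a 4-byte length prefix, and a connected socket pair. The function names are hypothetical and do not reflect JOPI's API. Note that `pickle` must never be used on data from untrusted peers.

```python
import pickle
import socket
import struct
from typing import Any

def send_object(sock: socket.socket, obj: Any) -> None:
    """Serialize obj and send it with a 4-byte big-endian length prefix."""
    payload = pickle.dumps(obj)
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping because recv may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)

def recv_object(sock: socket.socket) -> Any:
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return pickle.loads(_recv_exact(sock, length))

a, b = socket.socketpair()
send_object(a, {"rank": 0, "grid": [[1, 2], [3, 4]]})
print(recv_object(b))  # the nested structure arrives intact
a.close()
b.close()
```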


SCALEA: a performance analysis tool for parallel programs

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 11-12 2003
Hong-Linh Truong
Abstract Many existing performance analysis tools lack the flexibility to control instrumentation and performance measurement for code regions and performance metrics of interest. Performance analysis is commonly restricted to single experiments. In this paper we present SCALEA, a performance instrumentation, measurement, analysis, and visualization tool for parallel programs that supports post-mortem performance analysis. SCALEA currently focuses on performance analysis for OpenMP, MPI, HPF, and mixed parallel programs. It computes a variety of performance metrics based on a novel classification of overhead. SCALEA also supports multi-experiment performance analysis, which allows one to compare and evaluate the performance outcomes of several experiments. A highly flexible instrumentation and measurement system is provided, which can be controlled by command-line options and program directives. External tools can interface with SCALEA through a full Fortran90 OpenMP/MPI/HPF frontend that allows one to instrument an abstract syntax tree at a very high level with C-function calls and to generate source code. A graphical user interface is provided to view a large variety of performance metrics at the level of arbitrary code regions, threads, processes, and computational nodes for single- and multi-experiment performance analysis. Copyright © 2003 John Wiley & Sons, Ltd.
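An overhead classification works by splitting the gap between the parallel run time and ideal scaling into named categories. Assuming the common definition of total temporal overhead, T_o = T_p - T_s / p (parallel time minus serial time divided by the number of processors), whatever the measured categories do not explain can be reported as unidentified overhead. The Python sketch below shows only that bookkeeping; the category names and numbers are hypothetical, not SCALEA output.

```python
from typing import Dict

def total_overhead(t_serial: float, t_parallel: float, p: int) -> float:
    """T_o = T_p - T_s / p: time lost relative to perfect scaling."""
    return t_parallel - t_serial / p

def unidentified_overhead(t_serial: float, t_parallel: float, p: int,
                          identified: Dict[str, float]) -> float:
    """Part of the total overhead not covered by any measured category."""
    return total_overhead(t_serial, t_parallel, p) - sum(identified.values())

# Hypothetical measurements: serial 100 s, parallel 30 s on 4 processors.
measured = {"synchronization": 2.0, "data_movement": 1.5}
print(total_overhead(100.0, 30.0, 4))                    # 5.0
print(unidentified_overhead(100.0, 30.0, 4, measured))   # 1.5
```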


Deadlock detection in MPI programs

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 11 2002
Glenn R. Luecke
Abstract The Message-Passing Interface (MPI) is commonly used to write parallel programs for distributed memory parallel computers. MPI-CHECK is a tool developed to aid in the debugging of MPI programs that are written in free or fixed format Fortran 90 and Fortran 77. This paper presents the methods used in MPI-CHECK 2.0 to detect many situations where actual and potential deadlocks occur when using blocking and non-blocking point-to-point routines as well as when using collective routines. Copyright © 2002 John Wiley & Sons, Ltd.
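A classic potential deadlock: two ranks each call a blocking `MPI_Send` to the other before calling `MPI_Recv`. Whether this actually hangs depends on whether the MPI implementation buffers the messages, which is why such code is a potential rather than a certain deadlock. A general way to reason about actual deadlocks is a wait-for graph, where an edge a -> b means rank a is blocked until rank b acts; a cycle means no rank in it can proceed. The Python sketch below detects such a cycle with depth-first search. It illustrates the general technique only and is not a description of how MPI-CHECK 2.0 works.

```python
from typing import Dict, List, Optional

def find_deadlock(waits_for: Dict[int, List[int]]) -> Optional[List[int]]:
    """Return the ranks on one cycle of the wait-for graph, or None."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color: Dict[int, int] = {}
    path: List[int] = []

    def visit(u: int) -> Optional[List[int]]:
        color[u] = GREY
        path.append(u)
        for v in waits_for.get(u, []):
            state = color.get(v, WHITE)
            if state == GREY:          # back edge: v is already on the path
                return path[path.index(v):]
            if state == WHITE:
                cycle = visit(v)
                if cycle:
                    return cycle
        path.pop()
        color[u] = BLACK
        return None

    for rank in list(waits_for):
        if color.get(rank, WHITE) == WHITE:
            cycle = visit(rank)
            if cycle:
                return cycle
    return None

# Ranks 0 and 1 each block in a send to the other (no buffering assumed).
print(find_deadlock({0: [1], 1: [0]}))  # [0, 1]
```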