Parallel Computing


Selected Abstracts


Parallel computing of high-speed compressible flows using a node-based finite-element method

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 3 2003
T. Fujisawa
Abstract An efficient parallel computing method for high-speed compressible flows is presented. The numerical analysis of flows with shocks requires very fine computational grids, and grid generation requires a great deal of time. In the proposed method, all computational procedures, from mesh generation to the solution of the system of equations, can be performed seamlessly in parallel in terms of nodes. A local finite-element mesh is generated robustly around each node, even for severe boundary shapes such as cracks. The algorithm and the data structure of the finite-element calculation are based on nodes, and parallel computing is realized by dividing the system of equations by the rows of the global coefficient matrix. Inter-processor communication is minimized by renumbering the nodal identification numbers using ParMETIS. The numerical scheme for high-speed compressible flows is based on the two-step Taylor-Galerkin method. The proposed method is implemented on distributed memory systems, such as an Alpha PC cluster and the Hitachi SR8000 parallel supercomputer. The performance of the method is illustrated by the computation of supersonic flow over a forward-facing step. The numerical examples show that crisp shocks are effectively computed on multiprocessors at high efficiency. Copyright © 2003 John Wiley & Sons, Ltd. [source]
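The row-wise splitting described in this abstract can be sketched in a few lines. This is an illustration of the general idea only, not the authors' code: a real implementation would use MPI and sparse matrix storage, with each process running its row block concurrently.

```python
# Illustrative sketch: partitioning a linear system by rows of the global
# coefficient matrix, as in node-based parallel FEM. Each "process" owns the
# rows for its nodes; a matrix-vector product then needs only the owned rows
# plus the (communicated) solution vector.

def partition_rows(n_rows, n_procs):
    """Split row indices 0..n_rows-1 into contiguous per-process blocks."""
    base, extra = divmod(n_rows, n_procs)
    blocks, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

def local_matvec(rows, A, x):
    """Compute the owned entries of A @ x for one process's row block."""
    return {i: sum(A[i][j] * x[j] for j in range(len(x))) for i in rows}

# Toy 4x4 system distributed over 2 "processes"
A = [[4, 1, 0, 0],
     [1, 4, 1, 0],
     [0, 1, 4, 1],
     [0, 0, 1, 4]]
x = [1.0, 2.0, 3.0, 4.0]

blocks = partition_rows(4, 2)
y = {}
for rows in blocks:               # in MPI these iterations run concurrently
    y.update(local_matvec(rows, A, x))

print([y[i] for i in range(4)])   # matches a serial A @ x
```

Because each row of the product depends only on the owned rows of A, the only communication needed per iteration of a solver is the exchange of vector entries, which is what the nodal renumbering in the paper minimizes.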


Aerodynamic shape optimization on overset grids using the adjoint method

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 12 2010
Wei Liao
Abstract This paper deals with the use of the continuous adjoint equation for aerodynamic shape optimization of complex configurations with overset grid methods. While the use of overset grids eases the grid generation process, the non-trivial task of ensuring communication between overlapping grids needs careful attention. This need is effectively addressed by a practically useful technique known as the implicit hole cutting (IHC) method. The method depends on a simple cell selection process based on the criterion of cell size, and all grid points, including interior points and fringe points, are treated indiscriminately in the computation of the flow field. This paper demonstrates the simplicity of the IHC method for the adjoint equation. As with the flow solver, the adjoint equations are solved on conventional point-matched and overlapped grids within a multi-block framework. Parallel computing with the message passing interface is also used to improve the overall efficiency of the optimization process. The method is successfully demonstrated in several two-dimensional and one three-dimensional shape optimization cases for both external and internal flow problems. Copyright © 2009 John Wiley & Sons, Ltd. [source]
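The cell-size criterion at the heart of IHC can be sketched as follows. This is a hedged illustration of the selection rule only (the point and grid records below are made up, not the paper's data structures): where grids overlap, each point keeps the candidate donor cell with the smallest size, so the finest grid "wins" automatically and no explicit hole boundary needs to be drawn.

```python
# Sketch of the implicit hole cutting (IHC) selection rule: for each grid
# point with overlapping candidate donor cells, keep the smallest cell.

def select_donors(candidates):
    """candidates: {point_id: [(grid_id, cell_size), ...]} -> donor grid per point."""
    return {pt: min(cells, key=lambda c: c[1])[0]
            for pt, cells in candidates.items()}

overlap = {
    "p0": [("background", 0.50), ("body_fitted", 0.10)],
    "p1": [("background", 0.50)],                  # no overlap: keep background
    "p2": [("background", 0.50), ("body_fitted", 0.25)],
}
donors = select_donors(overlap)
print(donors)
```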


Parallel computing for real-time signal processing and control, M. O. Tokhi, M. A. Hossain and M. H. Shaheed, Springer, London, 2003, xiii + 253pp

INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, Issue 6 2004
D.I. Jones Dr.
No abstract is available for this article. [source]


A Polymorphic Dynamic Network Loading Model

COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING, Issue 2 2008
Nie Yu (Marco)
The polymorphic dynamic network loading (PDNL) model, realized through a general node-link interface and proper discretization, offers several prominent advantages. First of all, PDNL allows road facilities in the same network to be represented by different traffic flow models, based on the tradeoff between efficiency and realism and/or the characteristics of the targeted problem. Second, new macroscopic link/node models can easily be plugged into the framework and compared against existing ones. Third, PDNL decouples links and nodes in network loading, and thus opens the door to parallel computing. Finally, PDNL keeps track of individual vehicular quanta of arbitrary size, which makes it possible to replicate analytical loading results as closely as desired. PDNL thus offers an ideal platform for studying both analytical dynamic traffic assignment problems of different kinds and macroscopic traffic simulation. [source]
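The pluggable link-model idea can be sketched with ordinary subtype polymorphism. The class and method names below are hypothetical, not from the paper; the point is only that a loader can iterate links without knowing which flow model each one carries.

```python
# Hedged sketch: different traffic flow models behind one link interface,
# so a network loader treats all links uniformly.

class LinkModel:
    def outflow(self, density):
        raise NotImplementedError

class PointQueue(LinkModel):
    """Simple capacity-limited outflow."""
    def __init__(self, capacity):
        self.capacity = capacity
    def outflow(self, density):
        return min(density, self.capacity)

class GreenshieldsLWR(LinkModel):
    """Macroscopic flow q = v_f * k * (1 - k / k_jam)."""
    def __init__(self, v_free, k_jam):
        self.v_free, self.k_jam = v_free, k_jam
    def outflow(self, density):
        return self.v_free * density * (1 - density / self.k_jam)

# The loader never inspects the concrete model type.
network = {"a": PointQueue(capacity=30.0),
           "b": GreenshieldsLWR(v_free=60.0, k_jam=120.0)}
flows = {lid: model.outflow(20.0) for lid, model in network.items()}
print(flows)
```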


Experimental analysis of a mass storage system

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 15 2006
Shahid Bokhari
Abstract Mass storage systems (MSSs) play a key role in data-intensive parallel computing. Most contemporary MSSs are implemented as redundant arrays of independent/inexpensive disks (RAID), in which commodity disks are tied together with proprietary controller hardware. The performance of such systems can be difficult to predict because most internal details of the controller behavior are not public. We present a systematic method for empirically evaluating MSS performance by obtaining measurements on a series of RAID configurations of increasing size and complexity. We apply this methodology to a large MSS at the Ohio Supercomputer Center that has 16 input/output processors, each connected to four 8 + 1 RAID5 units, and provides 128 TB of storage (of which 116.8 TB are usable when formatted). Our methodology permits storage-system designers to empirically evaluate the performance of their systems with considerable confidence. Although we have carried out our experiments in the context of a specific system, our methodology is applicable to all large MSSs. The measurements obtained using our methods make application programmers aware of the limits to the performance of their codes. Copyright © 2006 John Wiley & Sons, Ltd. [source]
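The measurement strategy described here can be sketched as a small harness. This is not the authors' harness; the latency figures below are invented for illustration, and a real study would replace the stand-in device with actual file I/O against each RAID configuration.

```python
# Hedged sketch of the methodology: time the same workload on configurations
# of increasing size and report throughput, so controller-level scaling
# effects show up empirically even though the controller is a black box.

import time

def measure_throughput(write_fn, n_bytes):
    """Time one synthetic workload and return MB/s."""
    t0 = time.perf_counter()
    write_fn(n_bytes)
    return (n_bytes / 1e6) / (time.perf_counter() - t0)

def fake_raid(latency_per_mb):
    """Stand-in for one RAID configuration; replace with real file I/O."""
    def write(n_bytes):
        time.sleep(latency_per_mb * n_bytes / 1e6)
    return write

# In this toy model, larger configurations have lower per-MB latency.
results = [measure_throughput(fake_raid(lat), 5_000_000)
           for lat in (0.02, 0.011, 0.007)]
for units, mbps in zip((1, 2, 4), results):
    print(f"{units} RAID unit(s): {mbps:.0f} MB/s")
```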


Advanced eager scheduling for Java-based adaptive parallel computing

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 7-8 2005
Michael O. Neary
Abstract Javelin 3 is a software system for developing large-scale, fault-tolerant, adaptively parallel applications. When all or part of their application can be cast as a master-worker or branch-and-bound computation, Javelin 3 frees application developers from concerns about inter-processor communication and fault tolerance among networked hosts, allowing them to focus on the underlying application. The paper describes a fault-tolerant task scheduler and its performance analysis. The task scheduler integrates work stealing with an advanced form of eager scheduling. It enables dynamic task decomposition, which improves host load balancing in the presence of tasks whose non-uniform computational load is evident only at execution time. Speedup measurements of actual performance on up to 1000 hosts are presented. We analyze the expected, and measure the actual, performance degradation due to unresponsive hosts. Copyright © 2005 John Wiley & Sons, Ltd. [source]
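The fault-tolerance mechanism of eager scheduling can be sketched compactly. This is a hedged toy model, not Javelin 3's implementation: tasks already handed out but not yet completed may be re-issued to other workers, so an unresponsive host cannot stall the computation, and the first completed result wins.

```python
# Sketch of eager scheduling: unfinished tasks are re-issued round-robin;
# dead workers never reply, but live workers eventually cover every task.

def eager_schedule(tasks, workers, dead_workers=frozenset()):
    """tasks: {tid: fn}. Returns {tid: result} despite unresponsive workers."""
    results = {}
    widx = 0
    while len(results) < len(tasks):
        for tid, fn in tasks.items():
            if tid in results:
                continue                       # already completed elsewhere
            worker = workers[widx % len(workers)]
            widx += 1
            if worker not in dead_workers:     # a live worker returns a result
                results[tid] = fn()
    return results

done = eager_schedule({1: lambda: "a", 2: lambda: "b"},
                      workers=["w0", "w1"], dead_workers={"w0"})
print(done)
```

Note that the sketch assumes at least one live worker; a real scheduler would also detect timeouts rather than knowing in advance which hosts are dead.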


User transparency: a fully sequential programming model for efficient data parallel image processing

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 6 2004
F. J. Seinstra
Abstract Although many image processing applications are ideally suited for parallel implementation, most researchers in imaging do not benefit from high-performance computing on a daily basis. Essentially, this is due to the fact that no parallelization tools exist that truly match the image processing researcher's frame of reference. As it is unrealistic to expect imaging researchers to become experts in parallel computing, tools must be provided to allow them to develop high-performance applications in a highly familiar manner. In an attempt to provide such a tool, we have designed a software architecture that allows transparent (i.e. sequential) implementation of data parallel imaging applications for execution on homogeneous distributed memory MIMD-style multicomputers. This paper presents an extensive overview of the design rationale behind the software architecture, and gives an assessment of the architecture's effectiveness in providing significant performance gains. In particular, we describe the implementation and automatic parallelization of three well-known example applications that contain many fundamental imaging operations: (1) template matching; (2) multi-baseline stereo vision; and (3) line detection. Based on experimental results we conclude that our software architecture constitutes a powerful and user-friendly tool for obtaining high performance in many important image processing research areas. Copyright © 2004 John Wiley & Sons, Ltd. [source]


OpenMP-oriented applications for distributed shared memory architectures

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 4 2004
Ami Marowka
Abstract The rapid rise of OpenMP as the preferred parallel programming paradigm for small-to-medium scale parallelism could slow unless OpenMP can show that it is capable of becoming the model of choice for large-scale high-performance parallel computing in the coming decade. The main stumbling block for the adaptation of OpenMP to distributed shared memory (DSM) machines, which are based on architectures like cc-NUMA, stems from the lack of capabilities for data placement among processors and threads needed to achieve data locality. The absence of such a mechanism causes remote memory accesses and inefficient cache memory use, both of which lead to poor performance. This paper presents a simple software programming approach called copy-inside/copy-back (CC) that exploits the data privatization mechanism of OpenMP for data placement and replacement. This technique enables one to distribute data manually without taking away control and flexibility from the programmer, and is thus an alternative to the automatic and implicit approaches. Moreover, the CC approach improves on the OpenMP-SPMD style of programming, making the development process of an OpenMP application more structured and simpler. The CC technique was tested and analyzed using the NAS Parallel Benchmarks on SGI Origin 2000 multiprocessor machines. This study shows that OpenMP improves the performance of coarse-grained parallelism, although a fast copy mechanism is essential. Copyright © 2004 John Wiley & Sons, Ltd. [source]
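The copy-inside/copy-back pattern can be illustrated in plain Python, even though the paper works with OpenMP privatization in a compiled language: each thread copies its slice of a shared array into private storage, computes entirely on the local copy (gaining locality), then copies the result back.

```python
# Hedged illustration of the CC idea: private copy in, local compute,
# copy back. In OpenMP this corresponds to privatized arrays; here we
# use plain Python threads and list slices.

from threading import Thread

def worker(shared, lo, hi, out):
    private = shared[lo:hi]                # copy-inside: thread-private slice
    private = [v * v for v in private]     # compute only on the local copy
    out[lo:hi] = private                   # copy-back into the shared result

shared = list(range(8))
result = [0] * 8
threads = [Thread(target=worker, args=(shared, lo, lo + 4, result))
           for lo in (0, 4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(result)
```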


Software framework for distributed experimental-computational simulation of structural systems

EARTHQUAKE ENGINEERING AND STRUCTURAL DYNAMICS, Issue 3 2006
Yoshikazu Takahashi
Abstract Supported by the recent advancement of experimental test methods, numerical simulation, and high-speed communication networks, it is possible to geographically distribute the testing of structural systems using hybrid experimental-computational simulation. One of the barriers to this advanced testing is the lack of flexible software for hybrid simulation using heterogeneous experimental equipment. To address this need, an object-oriented software framework is designed, developed, implemented, and demonstrated for distributed experimental-computational simulation of structural systems. The software computes the imposed displacements for a range of test methods and co-ordinates the control of local and distributed configurations of experimental equipment. The object-oriented design of the software promotes the sharing of modules for experimental equipment, test set-ups, simulation models, and test methods. The communication model for distributed hybrid testing is similar to that used for parallel computing to solve structural simulation problems. As a demonstration, a distributed pseudodynamic test was conducted using a client-server approach, in which the server program controlled the test equipment in Japan and the client program performed the computational simulation in the United States. The distributed hybrid simulation showed that the software framework is flexible and reliable. Copyright © 2005 John Wiley & Sons, Ltd. [source]
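The client-server exchange at the core of distributed hybrid testing can be sketched with plain sockets. The message format and the linear restoring-force stand-in below are hypothetical, purely for illustration: the client plays the computational simulation, sending an imposed displacement; the server plays the test equipment, replying with a measured force.

```python
# Hedged sketch of the client-server pattern for distributed hybrid testing.

import json
import socket
import threading

def equipment_server(sock):
    """Accept one connection, read a displacement, reply with a force."""
    conn, _ = sock.accept()
    with conn:
        msg = json.loads(conn.recv(1024).decode())
        force = -42.0 * msg["displacement"]   # stand-in for the physical test
        conn.sendall(json.dumps({"force": force}).encode())

srv = socket.socket()
srv.bind(("127.0.0.1", 0))                    # ephemeral port for the demo
srv.listen(1)
threading.Thread(target=equipment_server, args=(srv,), daemon=True).start()

# The "computational simulation" side.
cli = socket.socket()
cli.connect(("127.0.0.1", srv.getsockname()[1]))
cli.sendall(json.dumps({"displacement": 0.5}).encode())
reply = json.loads(cli.recv(1024).decode())
cli.close()
print(reply)
```

In a real pseudodynamic test this request-reply loop runs once per integration step, which is why the framework's communication model resembles that of parallel structural solvers.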


Evaluating recursive filters on distributed memory parallel computers

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING, Issue 11 2006
Przemysław Stpiczyński
Article first published online: 6 APR 200
Abstract The aim of this paper is to show that the recently developed high-performance divide-and-conquer algorithm for solving linear recurrence systems with constant coefficients, together with the new BLAS-based algorithm for narrow-banded triangular Toeplitz matrix-vector multiplication, allows linear recursive filters to be evaluated efficiently on distributed memory parallel computers. We apply the BSP model of parallel computing to predict the behaviour of the algorithm and to find the optimal values of the method's parameters. The results of experiments performed on a cluster of twelve dual-processor Itanium 2 computers and a Cray X1 are also presented and discussed. The algorithm utilizes up to 30% of the peak performance of 24 Itanium processors, whereas a simple scalar algorithm can only utilize about 4% of the peak performance of a single processor. Copyright © 2006 John Wiley & Sons, Ltd. [source]
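For reference, the sequential computation that the paper parallelizes can be written down directly. This sketch shows only the scalar recurrence y[i] = x[i] + sum_k a[k]*y[i-1-k]; the divide-and-conquer version splits the index range and combines partial solutions via the banded Toeplitz matrix-vector products mentioned above, which is where the parallelism comes from.

```python
# Hedged sketch: a linear recursive filter with constant coefficients,
# evaluated sequentially (the baseline that the parallel algorithm beats).

def recursive_filter(x, coeffs):
    """Evaluate y[i] = x[i] + sum_k coeffs[k] * y[i-1-k]."""
    y = []
    for i, xi in enumerate(x):
        acc = xi
        for k, a in enumerate(coeffs):
            if i - 1 - k >= 0:
                acc += a * y[i - 1 - k]
        y.append(acc)
    return y

# First-order example: y[i] = x[i] + 0.5 * y[i-1]
out = recursive_filter([1.0, 1.0, 1.0, 1.0], coeffs=[0.5])
print(out)
```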


GPU-accelerated boundary element method for Helmholtz' equation in three dimensions

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 10 2009
Toru Takahashi
Abstract Recently, the application of graphics processing units (GPUs) to scientific computations has been attracting a great deal of attention, because GPUs are getting faster and more programmable. In particular, NVIDIA's compute unified device architecture (CUDA) enables highly multithreaded parallel computing for non-graphics applications. This paper proposes a novel way to accelerate the boundary element method (BEM) for the three-dimensional Helmholtz' equation using CUDA. Adopting techniques for data caching and double-single precision floating-point arithmetic, we implemented a GPU-accelerated BEM program for GeForce 8-series GPUs. The program performed 6-23 times faster than a regular BEM program optimized for an Intel quad-core CPU, for a series of boundary value problems with 8000-128000 unknowns, and it sustained a performance of 167 Gflop/s for the largest problem (1 058 000 unknowns). The accuracy of our BEM program was almost the same as that of the regular BEM program using double precision floating-point arithmetic. In addition, our BEM was applicable to realistic problems. In conclusion, the present GPU-accelerated BEM works rapidly and precisely for solving large-scale boundary value problems for Helmholtz' equation. Copyright © 2009 John Wiley & Sons, Ltd. [source]
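The error-compensation idea behind double-single arithmetic can be illustrated one precision level up. Knuth's two-sum splits a floating-point sum into a rounded result plus an exact error term; carrying that error term in a second value is how a pair of 32-bit floats approaches double precision on the GPU. The demonstration below uses Python doubles, so it shows the mechanism rather than the GPU arithmetic itself.

```python
# Hedged illustration: error-free transformation of a sum (Knuth two-sum),
# the building block of double-single (paired low-precision) arithmetic.

def two_sum(a, b):
    """Return (s, e) with s = fl(a + b) and a + b = s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

# 1e-17 is lost in a naive double-precision add of 1.0 + 1e-17 ...
s, e = two_sum(1.0, 1e-17)
print(s, e)   # ... but the error term e recovers it exactly
```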


A parallel Galerkin boundary element method for surface radiation and mixed heat transfer calculations in complex 3-D geometries

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 12 2004
X. Cui
Abstract This paper presents a parallel Galerkin boundary element method for the solution of surface radiation exchange problems and its coupling with the finite element method for mixed-mode heat transfer computations in general 3-D geometries. The computational algorithm for surface radiation calculations is enhanced with ideas used in 3-D computer graphics applications and with data structure management involving the creation and updating of various element lists optimized for numerical performance. The algorithm for detecting internal third-party blockages of thermal rays is presented, which involves a four-step procedure, i.e. the primary clip, secondary clip and adaptive integration with checking. Case studies of surface radiation and mixed heat transfer in both simple and complex 3-D geometric configurations are presented. It is found that a majority of the computational time is spent on the detection of foreign-element blockages, and that parallel computing is ideally suited for surface radiation calculations. Results show that the decrease in CPU time asymptotically approaches an inverse rule for parallel computing of surface radiation exchanges. For large-scale computations involving complex 3-D geometries, an iterative procedure is a preferred approach for coupling the Galerkin boundary and finite elements for mixed-mode heat transfer calculations. Copyright © 2004 John Wiley & Sons, Ltd. [source]
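The core geometric primitive in third-party blockage detection is testing whether the ray between two surface patches is intersected by a foreign element. The sketch below uses the standard Moller-Trumbore ray-triangle test as that primitive; it is an illustration of the kind of test involved, not the paper's four-step clip procedure.

```python
# Hedged sketch: ray-triangle intersection (Moller-Trumbore), the basic
# visibility test behind blockage detection for thermal rays.

def ray_hits_triangle(orig, direc, v0, v1, v2, eps=1e-9):
    """Return True if the ray orig + t*direc (t > 0) hits triangle v0,v1,v2."""
    sub = lambda a, b: [a[i] - b[i] for i in range(3)]
    dot = lambda a, b: sum(a[i] * b[i] for i in range(3))
    cross = lambda a, b: [a[1]*b[2] - a[2]*b[1],
                          a[2]*b[0] - a[0]*b[2],
                          a[0]*b[1] - a[1]*b[0]]
    e1, e2 = sub(v1, v0), sub(v2, v0)
    pvec = cross(direc, e2)
    det = dot(e1, pvec)
    if abs(det) < eps:                  # ray parallel to triangle plane
        return False
    tvec = sub(orig, v0)
    u = dot(tvec, pvec) / det
    if u < 0.0 or u > 1.0:
        return False
    qvec = cross(tvec, e1)
    v = dot(direc, qvec) / det
    if v < 0.0 or u + v > 1.0:
        return False
    return dot(e2, qvec) / det > eps    # intersection in front of the origin

tri = ([-1.0, -1.0, 0.0], [1.0, -1.0, 0.0], [0.0, 1.0, 0.0])
blocked = ray_hits_triangle([0.0, 0.0, -1.0], [0.0, 0.0, 1.0], *tri)
clear = ray_hits_triangle([5.0, 5.0, -1.0], [0.0, 0.0, 1.0], *tri)
print(blocked, clear)
```

Since every ray must be checked against many candidate elements, this test dominates the run time, which is consistent with the paper's finding that blockage detection consumes most of the computation and parallelizes well.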


A mesh patching method for finite volume modelling of shallow water flow

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 12 2006
Keming Hu
Abstract A new mesh-patching model is presented for shallow water flow described by the 2D non-linear shallow water (NLSW) equations. The mesh-patching model is based on AMAZON, a high-resolution NLSW engine with an improved HLLC approximate Riemann solver. A new patching algorithm has been developed, which not only provides improved spatial resolution of flow features in particular parts of the mesh, but also simplifies and speeds up the (structured) grid generation process for an area with complicated geometry. The new patching technique is also compatible with increasingly popular parallel computing and adaptive grid techniques. The patching algorithm has been tested with moving bores, and results of test problems are presented and compared to previous work. Copyright © 2005 John Wiley & Sons, Ltd. [source]