Parallel Computers (parallel + computer)

Distribution by Scientific Domains

Engineering	48%
Earth and Environmental Science	10%

Selected Abstracts

Special Issue: 10th International Workshop on Compilers for Parallel Computers (CPC 2003)

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 11 2006
Peter M. W. Knijnenburg
No abstract is available for this article. [source]

An MPI Parallel Implementation of Newmark's Method

COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING, Issue 3 2000
Ali Namazifard
The standard message-passing interface (MPI) is used to parallelize Newmark's method. The linear matrix equation encountered at each time step is solved using a preconditioned conjugate gradient algorithm. Data are distributed over the processors of a given parallel computer on a degree-of-freedom basis; this produces effective load balance between the processors and leads to a highly parallelized code. The portability of the implementation of this scheme is tested by solving some simple problems on two different machines: an SGI Origin2000 and an IBM SP2. The measured times demonstrate the efficiency of the approach and highlight the maintenance advantages that arise from using a standard parallel library such as MPI. [source]
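
As a rough illustration of the linear solve mentioned in this abstract, the sketch below implements a serial conjugate gradient iteration with a Jacobi (diagonal) preconditioner. The abstract does not say which preconditioner or data layout was used, so the Jacobi choice, the dense list-of-lists matrix, and the function name `pcg` are all assumptions made for the example; no MPI distribution is shown.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A (list of lists)
    by conjugate gradients with a Jacobi (diagonal) preconditioner.
    The preconditioner choice is an assumption, not taken from the abstract."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)                                # residual b - A x for x = 0
    z = [r[i] / A[i][i] for i in range(n)]     # apply inverse of diag(A)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:             # stop once the residual is small
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [r[i] / A[i][i] for i in range(n)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

In a distributed version, the matrix-vector product and the dot products are the steps that would need interprocessor communication.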

Micro-mechanical simulation of geotechnical problems using massively parallel computers

INTERNATIONAL JOURNAL FOR NUMERICAL AND ANALYTICAL METHODS IN GEOMECHANICS, Issue 14 2003
David W. Washington
Abstract This paper demonstrates that the architecture of a massively parallel computer can be adapted for micro-mechanical simulations of a geotechnical problem. The Discrete Element Method was used on a massively parallel supercomputer to simulate geotechnical boundary value problems. For the demonstration, a triaxial test was simulated using an algorithm titled 'TRUBAL for Parallel Machines (TPM)' based on the discrete element method (DEM). In this trial demonstration, the inherent parallelism within the DEM algorithm is shown. Then a comparison is made between the parallel algorithm (TPM) and the serial algorithm (TRUBAL) to show the benefits of this research. TPM showed substantial improvement in performance with increasing number of processors when compared with TRUBAL using a single processor. Copyright © 2003 John Wiley & Sons, Ltd. [source]

Bifurcation and stability analysis of laminar flow in curved ducts

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 4 2010
Werner Machane
Abstract The development of viscous flow in a curved duct under variation of the axial pressure gradient q is studied. We confine ourselves to two-dimensional solutions of the Dean problem. Bifurcation diagrams are calculated for rectangular and elliptic cross sections of the duct. We detect a new branch of asymmetric solutions for the case of a rectangular cross section. Furthermore, we compute paths of quadratic turning points and symmetry breaking bifurcation points under variation of the aspect ratio (from 0.8 to 1.5). The computed diagrams extend the results presented by other authors. We succeed in finding two origins of the Hopf bifurcation. Making use of the Cayley transformation, we determine the stability of stationary laminar solutions in the case of a quadratic cross section. All the calculations were performed on a parallel computer with 32×32 processors. Copyright © 2009 John Wiley & Sons, Ltd. [source]

A batch-type time-true ATM-network simulator – design for parallel processing

INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS, Issue 8 2002
Michael Logothetis
Abstract This paper presents a new type of network simulator for simulating the call-level operations of telecom networks and especially ATM networks. The simulator is a pure time-true type as opposed to a call-by-call type simulator. It is also characterized as a batch-type simulator. The entire simulation duration is divided into short time intervals of equal duration, t. During t, a batch processing of call origination or termination events is executed and the time-points of these events are sorted. The number of sorting executions is drastically reduced compared to a call-by-call simulator, resulting in considerable time savings. The proposed data structures of the simulator can be implemented in a general-purpose programming language and are well suited to parallel processing techniques for implementation on parallel computers, for further savings of execution time. We first implemented the simulator on a sequential computer and then applied parallelization techniques to implement it on a parallel computer. In order to simplify the parallelization procedure, we dissociate the core simulation from the built-in call-level functions (e.g. bandwidth control or dynamic routing) of the network. The key point for a parallel implementation is to organize data by virtual paths (VPs) and distribute them among processors, which all execute the same set of instructions on this data. The performance of the proposed batch-type, time-true, ATM-network simulator is compared with that of a call-by-call simulator to reveal its superiority in terms of sequential execution time (when both simulators run on conventional computers). Finally, a measure of the accuracy of the simulation results is given. Copyright © 2002 John Wiley & Sons, Ltd. [source]

Block s-step Krylov iterative methods

NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS, Issue 1 2010
Anthony T. Chronopoulos
Abstract Block (including s-step) iterative methods for (non)symmetric linear systems have been studied and implemented in the past. In this article we present a (combined) block s-step Krylov iterative method for nonsymmetric linear systems. We then consider the problem of applying any block iterative method to solve a linear system with one right-hand side using many linearly independent initial residual vectors. We present a new algorithm which combines the many solutions obtained (by any block iterative method) into a single solution to the linear system. This approach of using block methods in order to increase the parallelism of Krylov methods is very useful in parallel systems. We implemented the new method on a parallel computer and we ran tests to validate the accuracy and the performance of the proposed methods. It is expected that the performance of the block s-step methods will scale well on other parallel systems because of their efficient use of memory hierarchies and their reduction of the number of global communication operations over the standard methods. Copyright © 2009 John Wiley & Sons, Ltd. [source]

A cache-efficient implementation of the lattice Boltzmann method for the two-dimensional diffusion equation

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 14 2004
A. C. Velivelli
Abstract The lattice Boltzmann method is an important technique for the numerical solution of partial differential equations because it has nearly ideal scalability on parallel computers for many applications. However, to achieve the scalability and speed potential of the lattice Boltzmann technique, the issues of data reusability in cache-based computer architectures must be addressed. Utilizing the two-dimensional diffusion equation, this paper examines cache optimization for the lattice Boltzmann method in both serial and parallel implementations. In this study, speedups due to cache optimization were found to be 1.9–2.5 for the serial implementation and 3.6–3.8 for the parallel case in which the domain decomposition was optimized for stride-one access. In the parallel non-cached implementation, the method of domain decomposition (horizontal or vertical) used for parallelization did not significantly affect the compute time. In contrast, the cache-based implementation of the lattice Boltzmann method was significantly faster when the domain decomposition was optimized for stride-one access. Additionally, the cache-optimized lattice Boltzmann method in which the domain decomposition was optimized for stride-one access displayed superlinear scalability on all problem sizes as the number of processors was increased. Copyright © 2004 John Wiley & Sons, Ltd. [source]

Deadlock detection in MPI programs

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 11 2002
Glenn R. Luecke
Abstract The Message-Passing Interface (MPI) is commonly used to write parallel programs for distributed memory parallel computers. MPI-CHECK is a tool developed to aid in the debugging of MPI programs that are written in free or fixed format Fortran 90 and Fortran 77. This paper presents the methods used in MPI-CHECK 2.0 to detect many situations where actual and potential deadlocks occur when using blocking and non-blocking point-to-point routines as well as when using collective routines. Copyright © 2002 John Wiley & Sons, Ltd. [source]

Solving, Estimating, and Selecting Nonlinear Dynamic Models Without the Curse of Dimensionality

ECONOMETRICA, Issue 2 2010
Viktor Winschel
We present a comprehensive framework for Bayesian estimation of structural nonlinear dynamic economic models on sparse grids to overcome the curse of dimensionality for approximations. We apply sparse grids to a global polynomial approximation of the model solution, to the quadrature of integrals arising as rational expectations, and to three new nonlinear state space filters which speed up the sequential importance resampling particle filter. The posterior of the structural parameters is estimated by a new Metropolis–Hastings algorithm with mixing parallel sequences. The parallel extension improves the global maximization property of the algorithm, simplifies the parameterization for an appropriate acceptance ratio, and allows a simple implementation of the estimation on parallel computers. Finally, we provide all algorithms in the open source software JBendge for the solution and estimation of a general class of models. [source]

A parallel multigrid solver for high-frequency electromagnetic field analyses with small-scale PC cluster

ELECTRONICS & COMMUNICATIONS IN JAPAN, Issue 9 2008
Kuniaki Yosui
Abstract Finite element analyses of electromagnetic fields are commonly used for designing various electronic devices. The scale of the analyses becomes larger and larger; therefore, a fast linear solver is needed to solve linear equations arising from the finite element method. Since a multigrid solver is the fastest linear solver for these problems, parallelization of a multigrid solver is quite a useful approach. From the viewpoint of industrial applications, an effective usage of a small-scale PC cluster is important due to initial cost for introducing parallel computers. In this paper, a distributed parallel multigrid solver for a small-scale PC cluster is developed. In high-frequency electromagnetic analyses, a special block Gauss–Seidel smoother is used for the multigrid solver instead of general smoothers such as a Gauss–Seidel or Jacobi smoother in order to improve the convergence rate. The block multicolor ordering technique is applied to parallelize the smoother. A numerical example shows that a 3.7-fold speed-up in computational time and a 3.0-fold increase in the scale of the analysis were attained when the number of CPUs was increased from one to five. © 2009 Wiley Periodicals, Inc. Electron Comm Jpn, 91(9): 28–36, 2008; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecj.10160 [source]
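
To illustrate why a multicolor ordering makes a Gauss–Seidel smoother parallelizable, here is a minimal sketch using the simplest case: a point-wise, two-colour (red-black) ordering for the 1-D Poisson problem -u'' = f. This is a deliberately simplified stand-in, not the block smoother for high-frequency electromagnetic problems described in the abstract; the function name and the model problem are assumptions.

```python
def red_black_gauss_seidel(u, f, h, sweeps=1):
    """Smooth -u'' = f on a uniform 1-D grid with fixed end values.

    Grid points are split into two colours by index parity. Each point of
    one colour depends only on points of the other colour, so all updates
    within a colour are independent and could be executed in parallel.
    """
    u = list(u)
    n = len(u)
    for _ in range(sweeps):
        for start in (1, 2):                   # odd interior points, then even
            for i in range(start, n - 1, 2):
                u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u
```

The inner loop over `i` is the part that a parallel implementation would distribute, with one synchronization between the two colours.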

Spectral-element simulations of wave propagation in porous media

GEOPHYSICAL JOURNAL INTERNATIONAL, Issue 1 2008
Christina Morency
SUMMARY We present a derivation of the equations describing wave propagation in porous media based upon an averaging technique which accommodates the transition from the microscopic to the macroscopic scale. We demonstrate that the governing macroscopic equations determined by Biot remain valid for media with gradients in porosity. In such media, the well-known expression for the change in porosity, or the change in the fluid content of the pores, acquires two extra terms involving the porosity gradient. One fundamental result of Biot's theory is the prediction of a second compressional wave, often referred to as 'type II' or 'Biot's slow compressional wave', in addition to the classical fast compressional and shear waves. We present a numerical implementation of the Biot equations for 2-D problems based upon the spectral-element method (SEM) that clearly illustrates the existence of these three types of waves as well as their interactions at discontinuities. As in the elastic and acoustic cases, poroelastic wave propagation based upon the SEM involves a diagonal mass matrix, which leads to explicit time integration schemes that are well suited to simulations on parallel computers. Effects associated with physical dispersion and attenuation and frequency-dependent viscous resistance are accommodated based upon a memory variable approach. We perform various benchmarks involving poroelastic wave propagation and acoustic–poroelastic and poroelastic–poroelastic discontinuities, and we discuss the boundary conditions used to deal with these discontinuities based upon domain decomposition. We show potential applications of the method related to wave propagation in compacted sediments, as one encounters in the petroleum industry, and to detect the seismic signature of buried landmines and unexploded ordnance. [source]

Spectral-element simulations of global seismic wave propagation – II.

GEOPHYSICAL JOURNAL INTERNATIONAL, Issue 1 2002
Three-dimensional models, oceans, rotation, self-gravitation
Summary We simulate global seismic wave propagation based upon a spectral-element method. We include the full complexity of 3-D Earth models, i.e. lateral variations in compressional-wave velocity, shear-wave velocity and density, a 3-D crustal model, ellipticity, as well as topography and bathymetry. We also include the effects of the oceans, rotation and self-gravitation in the context of the Cowling approximation. For the oceans we introduce a formulation based upon an equivalent load in which the oceans do not need to be meshed explicitly. Some of these effects, which are often considered negligible in global seismology, can in fact play a significant role for certain source–receiver configurations. Anisotropy and attenuation, which were introduced and validated in a previous paper, are also incorporated in this study. The complex phenomena that are taken into account are introduced in such a way that we preserve the main advantages of the spectral-element method, which are an exactly diagonal mass matrix and very high computational efficiency on parallel computers. For self-gravitation and the oceans we benchmark spectral-element synthetic seismograms against normal-mode synthetics for the spherically symmetric reference model PREM. The two methods are in excellent agreement for all body- and surface-wave arrivals with periods greater than about 20 s in the case of self-gravitation and 25 s in the case of the oceans. At long periods the effect of gravity on multiorbit surface waves up to R4 is correctly reproduced. We subsequently present results of simulations for two real earthquakes in fully 3-D Earth models for which the fit to the data is significantly improved compared with classical normal-mode calculations based upon PREM. For example, we show that for trans-Pacific paths the Rayleigh wave can arrive more than a minute earlier than in PREM, and that the Love wave is much shorter in duration. [source]

ParCYCLIC: finite element modelling of earthquake liquefaction response on parallel computers

INTERNATIONAL JOURNAL FOR NUMERICAL AND ANALYTICAL METHODS IN GEOMECHANICS, Issue 12 2004
Jun Peng
Abstract This paper presents the computational procedures and solution strategy employed in ParCYCLIC, a parallel non-linear finite element program developed based on an existing serial code CYCLIC for the analysis of cyclic seismically-induced liquefaction problems. In ParCYCLIC, finite elements are employed within an incremental plasticity, coupled solid–fluid formulation. A constitutive model developed for simulating liquefaction-induced deformations is a main component of this analysis framework. The elements of the computational strategy, designed for distributed-memory message-passing parallel computer systems, include: (a) an automatic domain decomposer to partition the finite element mesh; (b) nodal ordering strategies to minimize storage space for the matrix coefficients; (c) an efficient scheme for the allocation of sparse matrix coefficients among the processors; and (d) a parallel sparse direct solver. Application of ParCYCLIC to simulate 3-D geotechnical experimental models is demonstrated. The computational results show excellent parallel performance and scalability of ParCYCLIC on parallel computers with a large number of processors. Copyright © 2004 John Wiley & Sons, Ltd. [source]

Evaluating recursive filters on distributed memory parallel computers

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING, Issue 11 2006
Przemysław Stpiczyński; article first published online: 6 APR 200
Abstract The aim of this paper is to show that the recently developed high performance divide and conquer algorithm for solving linear recurrence systems with constant coefficients, together with the new BLAS-based algorithm for narrow-banded triangular Toeplitz matrix–vector multiplication, allows linear recursive filters to be evaluated efficiently on distributed memory parallel computers. We apply the BSP model of parallel computing to predict the behaviour of the algorithm and to find the optimal values of the method's parameters. The results of experiments performed on a cluster of twelve dual-processor Itanium 2 computers and Cray X1 are also presented and discussed. The algorithm can utilize up to 30% of the peak performance of 24 Itanium processors, while a simple scalar algorithm can only utilize about 4% of the peak performance of a single processor. Copyright © 2006 John Wiley & Sons, Ltd. [source]
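
The divide and conquer idea behind this abstract can be sketched with a two-block toy version: each block of a constant-coefficient linear recurrence is first evaluated independently with zero history, and the second block is then corrected using the last outputs of the first. This is only an illustration of the linearity argument, not the paper's BLAS-based algorithm; the recurrence convention `y[k] = x[k] + sum_j a[j-1] * y[k-j]` and both function names are assumptions.

```python
def recursive_filter(x, a, history=None):
    """Direct evaluation of y[k] = x[k] + sum_{j=1..m} a[j-1] * y[k-j].

    `history` holds the outputs preceding the block, most recent first
    (y[-1], y[-2], ...); it defaults to zeros.
    """
    m = len(a)
    past = list(history) if history is not None else [0.0] * m
    y = []
    for xk in x:
        yk = xk + sum(aj * pj for aj, pj in zip(a, past))
        y.append(yk)
        past = ([yk] + past)[:m]               # keep only the m latest outputs
    return y


def two_block_filter(x, a):
    """Evaluate the same filter in two blocks that can be computed independently."""
    m = len(a)
    half = len(x) // 2
    y1 = recursive_filter(x[:half], a)         # block 1, zero history
    y2_zero = recursive_filter(x[half:], a)    # block 2, zero history (independent)
    # Correction: zero-input response of block 2 to the tail of block 1.
    tail = (y1[::-1] + [0.0] * m)[:m]
    correction = recursive_filter([0.0] * (len(x) - half), a, history=tail)
    return y1 + [p + c for p, c in zip(y2_zero, correction)]
```

Because the recurrence is linear, the zero-history response plus the zero-input correction equals the sequential result; with more blocks, only the short tails need to be exchanged between processors.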

Application of the additive Schwarz method to large scale Poisson problems

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING, Issue 3 2004
K. M. Singh
Abstract This paper presents an application of the additive Schwarz method to large scale Poisson problems on parallel computers. Domain decomposition in rectangular blocks with matching grids on a structured rectangular mesh has been used together with a stepwise approximation to approximate sloping sides and complicated geometric features. A seven-point stencil based on a central difference scheme has been used for the discretization of the Laplacian for both interior and boundary grid points, and this results in a symmetric linear algebraic system for any type of boundary conditions. The preconditioned conjugate gradient method has been used as an accelerator for the additive Schwarz method, and three different methods have been assessed for the solution of subdomain problems. Numerical experiments have been performed to determine the most suitable set of subdomain solvers and the optimal accuracy of subdomain solutions; to assess the effect of different decompositions of the problem domain; and to evaluate the parallel performance of the additive Schwarz preconditioner. Application to a practical problem involving complicated geometry is presented which establishes the efficiency and robustness of the method. Copyright © 2004 John Wiley & Sons, Ltd. [source]
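
For readers unfamiliar with the seven-point stencil named in this abstract, the following sketch shows the standard central-difference approximation of the 3-D Laplacian at an interior point of a uniform grid. The nested-list storage, uniform spacing `h`, and the function name are assumptions for illustration; the paper's treatment of boundary points is not reproduced.

```python
def laplacian_7pt(u, i, j, k, h):
    """Central-difference Laplacian of u at interior grid point (i, j, k).

    Uses the point itself and its six axis neighbours on a uniform grid
    with spacing h; u is indexed as u[i][j][k].
    """
    return (u[i + 1][j][k] + u[i - 1][j][k]
            + u[i][j + 1][k] + u[i][j - 1][k]
            + u[i][j][k + 1] + u[i][j][k - 1]
            - 6.0 * u[i][j][k]) / (h * h)
```

The stencil is exact for quadratic functions; for example, u = x² + y² + z² gives a Laplacian of 6 at every interior point.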

A distributed memory parallel implementation of the multigrid method for solving three-dimensional implicit solid mechanics problems

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 8 2004
A. Namazifard
Abstract We describe the parallel implementation of a multigrid method for unstructured finite element discretizations of solid mechanics problems. We focus on a distributed memory programming model and use the MPI library to perform the required interprocessor communications. We present an algebraic framework for our parallel computations, and describe an object-based programming methodology using Fortran90. The performance of the implementation is measured by solving both fixed- and scaled-size problems on three different parallel computers (an SGI Origin2000, an IBM SP2 and a Cray T3E). The code performs well in terms of speedup, parallel efficiency and scalability. However, the floating point performance is considerably below the peak values attributed to these machines. Lazy processors are documented on the Origin that produce reduced performance statistics. The solution of two problems on an SGI Origin2000, an IBM PowerPC SMP and a Linux cluster demonstrates that the algorithm performs well when applied to the unstructured meshes required for practical engineering analysis. Copyright © 2004 John Wiley & Sons, Ltd. [source]

Non-linear additive Schwarz preconditioners and application in computational fluid dynamics

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 12 2002
Xiao-Chuan Cai
Abstract The focus of this paper is on the numerical solution of large sparse non-linear systems of algebraic equations on parallel computers. Such non-linear systems often arise from the discretization of non-linear partial differential equations, such as the Navier–Stokes equations for fluid flows, using finite element or finite difference methods. A traditional inexact Newton method, applied directly to the discretized system, does not work well when the non-linearities in the algebraic system become unbalanced. In this paper, we study some preconditioned inexact Newton algorithms, including the single-level and multilevel non-linear additive Schwarz preconditioners. Some results for solving the high Reynolds number incompressible Navier–Stokes equations are reported. Copyright © 2002 John Wiley & Sons, Ltd. [source]

DIESEL-MP2: A new program to perform large-scale multireference-MP2 computations

JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 10 2006
Patrick Musch
Abstract This article presents a new MR-MP2 code (Multi-Reference Møller–Plesset 2nd order) suitable for the computation of MR-MP2 energies of extended systems with strong near degeneracy effects (e.g., open shell systems). It is based on the DIESEL program package developed by Hanrath and Engels. Due to improved algorithms the new code is able to handle systems with 400–500 basis functions and more than 100 electrons. The code is made for parallel computers with distributed memory, but can also be run on local machines. It possesses two integral interfaces (MOLCAS, TURBOMOLE). The algorithms are briefly introduced and timings for the Neocarzinostatin chromophore are presented. The efficiencies of the codes obtained with Intel or GNU compilers are compared. © 2006 Wiley Periodicals, Inc. J Comput Chem 27: 1055–1062, 2006 [source]

Distance-two interpolation for parallel algebraic multigrid

NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS, Issue 2-3 2008
Hans De Sterck
Abstract Algebraic multigrid (AMG) is one of the most efficient and scalable parallel algorithms for solving sparse linear systems on unstructured grids. However, for large 3D problems, the coarse grids that are normally used in AMG often lead to growing complexity in terms of memory use and execution time per AMG V-cycle. Sparser coarse grids, such as those obtained by the parallel modified independent set (PMIS) coarsening algorithm, remedy this complexity growth but lead to nonscalable AMG convergence factors when traditional distance-one interpolation methods are used. In this paper, we study the scalability of AMG methods that combine PMIS coarse grids with long-distance interpolation methods. AMG performance and scalability are compared for previously introduced interpolation methods as well as new variants of them for a variety of relevant test problems on parallel computers. It is shown that the increased interpolation accuracy largely restores the scalability of AMG convergence factors for PMIS-coarsened grids, and in combination with complexity reducing methods, such as interpolation truncation, one obtains a class of parallel AMG methods that enjoy excellent scalability properties on large parallel computers. Copyright © 2007 John Wiley & Sons, Ltd. [source]

A direct Schur–Fourier decomposition for the efficient solution of high-order Poisson equations on loosely coupled parallel computers

NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS, Issue 4 2006
F. X. Trias
Abstract In this paper a parallel direct Schur–Fourier decomposition (DSFD) algorithm for the direct solution of arbitrary order discrete Poisson equations on parallel computers is proposed. It is based on a combination of a Direct Schur method and a Fourier decomposition and allows each Poisson equation to be solved almost to machine accuracy using only one communication episode. Thus, it is well suited for loosely coupled parallel computers that have a high network latency compared with the CPU performance. Several three-dimensional direct numerical simulations (DNS) of wall-bounded turbulent incompressible flows have been carried out using the DSFD algorithm. Numerical examples illustrating the robustness and scalability of the method on a PC cluster with a conventional 100 Mbits/s network are also presented. Copyright © 2005 John Wiley & Sons, Ltd. [source]

Some observations on the l2 convergence of the additive Schwarz preconditioned GMRES method

NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS, Issue 5 2002
Xiao-Chuan Cai
Abstract Additive Schwarz preconditioned GMRES is a powerful method for solving large sparse linear systems of equations on parallel computers. The algorithm is often implemented in the Euclidean norm, or the discrete l2 norm; however, the optimal convergence result is available only in the energy norm (or the equivalent Sobolev H1 norm). Very little progress has been made in the theoretical understanding of the l2 behaviour of this very successful algorithm. To add to the difficulty in developing a full l2 theory, in this note, we construct explicit examples and show that the optimal convergence of additive Schwarz preconditioned GMRES in l2 cannot be obtained using the existing GMRES theory. More precisely, we show that the symmetric part of the preconditioned matrix, which plays a role in the Eisenstat–Elman–Schultz theory, has at least one negative eigenvalue, and we show that the condition number of the best possible eigenmatrix that diagonalizes the preconditioned matrix, key to the Saad–Schultz theory, is bounded from both above and below by constants multiplied by h^(-1/2). Here h is the finite element mesh size. The results presented in this paper are mostly negative, but we believe that the techniques used in our proofs may have wide applications in the further development of the l2 convergence theory and in other areas of domain decomposition methods. Copyright © 2002 John Wiley & Sons, Ltd. [source]

Fibonacci grids: A novel approach to global modelling

THE QUARTERLY JOURNAL OF THE ROYAL METEOROLOGICAL SOCIETY, Issue 619 2006
Richard Swinbank
Abstract Recent years have seen a resurgence of interest in a variety of non-standard computational grids for global numerical prediction. The motivation has been to reduce problems associated with the converging meridians and the polar singularities of conventional regular latitude–longitude grids. A further impetus has come from the adoption of massively parallel computers, for which it is necessary to distribute work equitably across the processors; this is more practicable for some non-standard grids. Desirable attributes of a grid for high-order spatial finite differencing are: (i) geometrical regularity; (ii) a homogeneous and approximately isotropic spatial resolution; (iii) a low proportion of the grid points where the numerical procedures require special customization (such as near coordinate singularities or grid edges); (iv) ease of parallelization. One family of grid arrangements which, to our knowledge, has never before been applied to numerical weather prediction, but which appears to offer several technical advantages, are what we shall refer to as 'Fibonacci grids'. These grids possess virtually uniform and isotropic resolution, with an equal area for each grid point. There are only two compact singular regions on a sphere that require customized numerics. We demonstrate the practicality of this type of grid in shallow-water simulations, and discuss the prospects for efficiently using these frameworks in three-dimensional weather prediction or climate models. © Crown copyright, 2006. Royal Meteorological Society [source]
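
The abstract does not give the grid formula, so the sketch below uses one common golden-ratio spiral construction as an assumption: 2n+1 points whose sines of latitude are equally spaced (which gives each point an equal-area latitude band) and whose longitudes advance by a fixed golden-ratio fraction of a full turn. The function name and the exact indexing are illustrative and may differ from the paper's definition.

```python
import math


def fibonacci_grid(n):
    """Return 2n+1 (latitude, longitude) pairs in radians on the unit sphere.

    sin(latitude) is equally spaced in (-1, 1), so every point owns a band
    of equal area; longitude advances by 2*pi/golden_ratio per point.
    """
    golden = (1.0 + math.sqrt(5.0)) / 2.0
    points = []
    for i in range(-n, n + 1):
        lat = math.asin(2.0 * i / (2 * n + 1))
        lon = (2.0 * math.pi * i / golden) % (2.0 * math.pi)
        points.append((lat, lon))
    return points
```

With this construction no grid point falls exactly on a pole, and the only irregular regions are the two polar caps, which is consistent with the two compact singular regions mentioned in the abstract.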