Parallel Implementation
Selected Abstracts

An MPI Parallel Implementation of Newmark's Method
COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING, Issue 3 2000
Ali Namazifard
The standard message-passing interface (MPI) is used to parallelize Newmark's method. The linear matrix equation encountered at each time step is solved using a preconditioned conjugate gradient algorithm. Data are distributed over the processors of a given parallel computer on a degree-of-freedom basis; this produces effective load balance between the processors and leads to a highly parallelized code. The portability of the implementation of this scheme is tested by solving some simple problems on two different machines: an SGI Origin2000 and an IBM SP2. The measured times demonstrate the efficiency of the approach and highlight the maintenance advantages that arise from using a standard parallel library such as MPI. [source]

PC cluster parallel finite element analysis of sloshing problem by earthquake using different network environments
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING, Issue 10 2002
Kazuo Kashiyama
Abstract This paper presents a parallel finite element method for the analysis of the sloshing problem caused by earthquakes. The incompressible Navier–Stokes equations, written in an Arbitrary Lagrangian–Eulerian description, are used as the governing equations. The SUPG/PSPG formulation is employed to improve the numerical stability and accuracy. Parallel implementation of the unstructured-grid-based formulation was carried out on a PC cluster. The present method was applied to analyse the sloshing problem for a rectangular tank and an actual reservoir, and the effect of parallelization on the efficiency of the computations was examined using a number of different network environments. Copyright © 2002 John Wiley & Sons, Ltd. [source]

The modelling of multi-fracturing solids and particulate media
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 1 2004
D. R. J. Owen
Abstract Computational strategies in the context of combined discrete/finite element methods for the effective modelling of large-scale practical problems involving multiple fracture and discrete phenomena are reviewed in the present work. The issues considered include: (1) fracture criteria and propagation mechanisms within both the finite and discrete elements, together with mesh adaptivity procedures for discretization and the introduction of fracture systems; (2) detection procedures for monitoring contact between large numbers of discrete elements; (3) interaction laws governing the response of contact pairs; (4) parallel implementation; and (5) other issues, such as element methodology for nearly incompressible behaviour and the generation of random packings of discrete objects. The applicability of the methodology developed is illustrated through selected practical examples. Copyright © 2004 John Wiley & Sons, Ltd. [source]

Parallel implementation of AutoDock
JOURNAL OF APPLIED CRYSTALLOGRAPHY, Issue 3 2007
Prashant Khodade
Computational docking of ligands to protein structures is a key step in structure-based drug design. Currently, the time required for each docking run is high; this limits the use of docking in a high-throughput manner and warrants parallelization of docking algorithms. AutoDock, a widely used tool, was chosen for parallelization. Near-linear increases in speed were observed with 96 processors, reducing the time required for docking ligands to HIV-protease, as an example, from 81 min on a single IBM Power-5 processor (1.65 GHz) to about 1 min on an IBM cluster with 96 such processors. This implementation makes it feasible to perform virtual ligand screening using AutoDock. [source]

Out-of-Core and Dynamic Programming for Data Distribution on a Volume Visualization Cluster
COMPUTER GRAPHICS FORUM, Issue 1 2009
S. Frank
I.3.2 [Computer Graphics]: Distributed/network graphics; C.2.4 [Distributed Systems]: Distributed applications
Abstract Ray-directed volume-rendering algorithms are well suited for parallel implementation in a distributed cluster environment. For distributed ray casting, the scene must be partitioned between nodes for good load balancing, and a strict view-dependent priority order is required for image composition. In this paper, we define the load balanced network distribution (LBND) problem and map it to the NP-complete precedence-constrained job-shop scheduling problem. We introduce a kd-tree solution and a dynamic programming solution. To process a massive data set, either a parallel or an out-of-core approach is required. Parallel preprocessing is performed by render nodes on data which are allocated using a static data structure. Volumetric data sets often contain a large portion of voxels that will never be rendered, or empty space; parallel preprocessing fails to take advantage of this. Our slab-projection slice, introduced in this paper, tracks empty space across consecutive slices of data to reduce the amount of data distributed and rendered, and is used to facilitate out-of-core bricking and kd-tree partitioning. Load balancing using each of our approaches is compared with traditional methods using several segmented regions of the Visible Korean data set. [source]

Two Parallel Computing Methods for Coupled Thermohydromechanical Problems
COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING, Issue 3 2000
B. A. Schrefler
Two different approaches are presented for the parallel implementation of computer programs for the solution of coupled thermohydromechanical problems. One is an asynchronous method used in connection with staggered and monolithic solution procedures. The second is a domain decomposition method making use of substructuring techniques and a Newton–Raphson procedure. The advantages of the proposed methods are illustrated by examples.
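The staggered strategy mentioned in this abstract can be illustrated on a toy coupled two-field linear system solved by alternating single-field solves, checked against a monolithic solve of the full block system. Everything below (the matrices, coupling blocks, tolerance, and field names) is invented for illustration and is not the authors' formulation:

```python
import numpy as np

# Toy coupled system:  A u + C t = f,   D u + B t = g
# (u: "mechanical" field, t: "thermal" field, C and D: weak coupling blocks)
A = np.array([[4.0, 1.0], [1.0, 3.0]])
B = np.array([[5.0, 2.0], [2.0, 6.0]])
C = 0.1 * np.eye(2)
D = 0.2 * np.eye(2)
f = np.array([1.0, 2.0])
g = np.array([0.5, 1.5])

u = np.zeros(2)
t = np.zeros(2)
for _ in range(100):  # staggered (block Gauss-Seidel) iteration
    u_new = np.linalg.solve(A, f - C @ t)      # solve field 1, freezing field 2
    t_new = np.linalg.solve(B, g - D @ u_new)  # then field 2 with updated field 1
    converged = max(np.linalg.norm(u_new - u), np.linalg.norm(t_new - t)) < 1e-12
    u, t = u_new, t_new
    if converged:
        break

# Compare with the monolithic solve of the assembled block system
K = np.block([[A, C], [D, B]])
ref = np.linalg.solve(K, np.concatenate([f, g]))
print(np.allclose(np.concatenate([u, t]), ref))  # → True
```

Because the coupling blocks are small relative to the diagonal blocks, the staggered iteration converges quickly to the monolithic answer; with strong coupling the trade-off between the two procedures becomes less clear-cut.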
Both methods are promising, but no direct comparison between them is yet possible, because one has been applied to a linear program with only two interacting fields and the other to a full nonlinear set of (multifield) equations. [source]

First experience of compressible gas dynamics simulation on the Los Alamos Roadrunner machine
CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 17 2009
Paul R. Woodward
Abstract We report initial experience with gas dynamics simulation on the Los Alamos Roadrunner machine. In this initial work, we have restricted our attention to flows in which the flow Mach number is less than 2. This permits us to use a simplified version of the PPM gas dynamics algorithm that has been described in detail by Woodward (2006). We follow a multifluid volume fraction using the PPB moment-conserving advection scheme, enforcing both pressure and temperature equilibrium between two monatomic ideal gases within each grid cell. The resulting gas dynamics code has been extensively restructured for efficient multicore processing and implemented for scalable parallel execution on the Roadrunner system. The code restructuring and parallel implementation are described and performance results are discussed. For a modest grid size, this code delivers a sustained performance of 3.89 Gflop/s per CPU core on 36 Cell processors in 9 triblade nodes of a single rack of Roadrunner hardware. Copyright © 2009 John Wiley & Sons, Ltd. [source]

APEX-Map: a parameterized scalable memory access probe for high-performance computing systems
CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 17 2007
Erich Strohmaier
Abstract The memory wall between the peak performance of microprocessors and their memory performance has become the prominent performance bottleneck for many scientific application codes.
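As a rough illustration of this memory wall, a toy access probe (not APEX-Map itself, whose parameterization is described below) can compare a cache-friendly unit-stride gather with a cache-hostile random gather over the same array; the array size and measurement scheme here are arbitrary choices:

```python
import time
import numpy as np

n = 1 << 22
data = np.arange(n, dtype=np.float64)
seq_idx = np.arange(n)                 # unit-stride access: high spatial locality
rng = np.random.default_rng(0)
rand_idx = rng.permutation(n)          # random access: no spatial locality

def probe(idx):
    """Time a gather-then-reduce over the given index order."""
    t0 = time.perf_counter()
    s = data[idx].sum()
    return time.perf_counter() - t0, s

t_seq, s1 = probe(seq_idx)
t_rand, s2 = probe(rand_idx)
print(f"sequential {t_seq:.4f}s, random {t_rand:.4f}s, ratio {t_rand / t_seq:.1f}x")
```

Both orders touch exactly the same elements, so the sums agree; only the locality of the access pattern (and hence the time) differs.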
New benchmarks that measure data access speeds locally and globally, in a variety of different ways, are needed to explore the ever-increasing diversity of architectures for high-performance computing. In this paper, we introduce a novel benchmark, APEX-Map, which focuses on global data movement and measures how fast global data can be fed into computational units. APEX-Map is a parameterized, synthetic performance probe that integrates concepts of temporal and spatial locality into its design. Our first parallel implementation in MPI, and various results obtained with it, are discussed in detail. By measuring APEX-Map performance with parameter sweeps over a whole range of temporal and spatial localities, performance surfaces can be generated. These surfaces are ideally suited to studying the characteristics of computational platforms and are useful for performance comparison. Results on a global-memory vector platform and on distributed-memory superscalar platforms clearly reflect the design differences between these architectures. Published in 2007 by John Wiley & Sons, Ltd. [source]

Parallel four-dimensional Haralick texture analysis for disk-resident image datasets
CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 1 2007
Brent Woods
Abstract Texture analysis is one possible method of detecting features in biomedical images. During texture analysis, texture-related information is found by examining local variations in image brightness. Four-dimensional (4D) Haralick texture analysis extracts local variations along the space and time dimensions and represents them as a collection of 14 statistical parameters. However, application of the 4D Haralick method to large time-dependent image datasets is hindered by data retrieval, computation, and memory requirements. This paper describes a parallel implementation of 4D Haralick texture analysis on PC clusters using a distributed component-based framework.
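For readers unfamiliar with Haralick statistics, a minimal 2D sketch of the underlying co-occurrence idea may help (the 4D method extends the co-occurrence computation across space and time, and computes 14 parameters rather than the two shown here); the function names and the tiny test image are illustrative, not the authors' implementation:

```python
import numpy as np

def glcm(img, levels, offset=(0, 1)):
    """Normalized gray-level co-occurrence matrix for one pixel offset."""
    dr, dc = offset
    m = np.zeros((levels, levels))
    rows, cols = img.shape
    for r in range(rows - dr):
        for c in range(cols - dc):
            m[img[r, c], img[r + dr, c + dc]] += 1
    return m / m.sum()

def haralick_contrast(p):
    """Weights co-occurrences by squared gray-level difference."""
    i, j = np.indices(p.shape)
    return ((i - j) ** 2 * p).sum()

def haralick_energy(p):
    """Sum of squared co-occurrence probabilities (angular second moment)."""
    return (p ** 2).sum()

img = np.array([[0, 0, 1],
                [0, 0, 1],
                [0, 2, 2]])
p = glcm(img, levels=3)
print(haralick_contrast(p), haralick_energy(p))
```

In the 4D setting the co-occurrence counts are accumulated over neighborhoods spanning x, y, z and time, which is what drives the data retrieval and memory costs the paper addresses.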
The experimental performance results show that good performance can be achieved for this application through the combined use of task- and data-parallelism. In addition, we show that our 4D texture analysis implementation can be used to classify imaged tissues. Copyright © 2006 John Wiley & Sons, Ltd. [source]

Full waveform seismic inversion using a distributed system of computers
CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 11 2005
Indrajit G. Roy
Abstract The aim of seismic waveform inversion is to estimate the elastic properties of the Earth's subsurface layers from recordings of seismic waveform data. This is usually accomplished by using constrained optimization, often based on very simplistic assumptions. Full waveform inversion uses a more accurate wave propagation model but is extremely difficult to use for routine analysis and interpretation, because computational difficulties arise due to: (1) the strong nonlinearity of the inverse problem; (2) its extreme ill-posedness; and (3) the large dimensions of the data and model spaces. We show that some of these difficulties can be overcome by using: (1) an improved forward-problem solver and an efficient technique for generating the sensitivity matrix; (2) an iteration-adaptive regularized truncated Gauss–Newton technique; (3) an efficient technique for matrix–matrix and matrix–vector multiplication; and (4) a parallel programming implementation on a distributed system of processors. We use a message-passing interface in the parallel programming environment. We present inversion results for synthetic and field data, and a performance analysis of our parallel implementation. Copyright © 2005 John Wiley & Sons, Ltd. [source]

User transparency: a fully sequential programming model for efficient data parallel image processing
CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 6 2004
F. J. Seinstra
Abstract Although many image processing applications are ideally suited for parallel implementation, most researchers in imaging do not benefit from high-performance computing on a daily basis. Essentially, this is because no parallelization tools exist that truly match the image processing researcher's frame of reference. As it is unrealistic to expect imaging researchers to become experts in parallel computing, tools must be provided that allow them to develop high-performance applications in a highly familiar manner. In an attempt to provide such a tool, we have designed a software architecture that allows transparent (i.e. sequential) implementation of data parallel imaging applications for execution on homogeneous distributed-memory MIMD-style multicomputers. This paper presents an extensive overview of the design rationale behind the software architecture, and gives an assessment of the architecture's effectiveness in providing significant performance gains. In particular, we describe the implementation and automatic parallelization of three well-known example applications that contain many fundamental imaging operations: (1) template matching; (2) multi-baseline stereo vision; and (3) line detection. Based on experimental results, we conclude that our software architecture constitutes a powerful and user-friendly tool for obtaining high performance in many important image processing research areas. Copyright © 2004 John Wiley & Sons, Ltd. [source]

Surgical correction of scoliosis: Numerical analysis and optimization of the procedure
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING, Issue 9 2010
J. F. Aguilar Madeira
Abstract A previously developed model is used to numerically simulate real clinical cases of the surgical correction of scoliosis.
This model consists of one-dimensional finite elements with spatial deformation in which (i) the column is represented by its axis; (ii) the vertebrae are assumed to be rigid; and (iii) the deformability of the column is concentrated in springs that connect the successive rigid elements. The metallic rods used for the surgical correction are modeled by beam elements with linear elastic behavior. To obtain the forces at the connections between the metallic rods and the vertebrae, geometrically non-linear finite element analyses are performed. The tightening sequence determines the magnitude of the forces applied to the patient's column, and it is desirable to keep those forces as small as possible. In this study, a Genetic Algorithm optimization is applied to this model in order to determine the sequence that minimizes the corrective forces applied during the surgery. This amounts to finding the optimal permutation of the integers 1, …, n, n being the number of vertebrae involved; as such, we are faced with a combinatorial optimization problem isomorphic to the Traveling Salesman Problem. The fitness evaluation requires one computationally intensive Finite Element Analysis per candidate solution and, thus, a parallel implementation of the Genetic Algorithm is developed. Copyright © 2010 John Wiley & Sons, Ltd. [source]

A distributed memory parallel implementation of the multigrid method for solving three-dimensional implicit solid mechanics problems
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 8 2004
A. Namazifard
Abstract We describe the parallel implementation of a multigrid method for unstructured finite element discretizations of solid mechanics problems. We focus on a distributed memory programming model and use the MPI library to perform the required interprocessor communications. We present an algebraic framework for our parallel computations, and describe an object-based programming methodology using Fortran90.
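As a generic illustration of the multigrid idea, the sketch below implements a two-grid correction scheme for a structured 1D Poisson problem; the paper's solver targets unstructured 3D solid mechanics, so this shares only the smooth/restrict/solve/prolong/smooth structure, and all names and parameter choices are illustrative:

```python
import numpy as np

def smooth(u, f, h, sweeps, omega=2/3):
    """Damped Jacobi sweeps for -u'' = f with homogeneous Dirichlet BCs."""
    for _ in range(sweeps):
        u[1:-1] += omega * 0.5 * (u[:-2] + u[2:] - 2 * u[1:-1] + h * h * f[1:-1])
    return u

def coarse_solve(rc, hc):
    """Direct solve of the coarse-grid correction equation -e'' = r."""
    m = len(rc) - 2
    A = (2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / hc**2
    e = np.zeros_like(rc)
    e[1:-1] = np.linalg.solve(A, rc[1:-1])
    return e

def two_grid_cycle(u, f, h):
    u = smooth(u, f, h, sweeps=3)                                   # pre-smooth
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / h**2       # residual
    ec = coarse_solve(r[::2], 2 * h)                                # restrict + solve
    u += np.interp(np.arange(len(u)), np.arange(0, len(u), 2), ec)  # prolong + correct
    return smooth(u, f, h, sweeps=3)                                # post-smooth

n = 65
h = 1.0 / (n - 1)
x = np.linspace(0.0, 1.0, n)
f = np.pi**2 * np.sin(np.pi * x)      # exact solution of -u'' = f is sin(pi*x)
u = np.zeros(n)
for _ in range(10):
    u = two_grid_cycle(u, f, h)
print(np.max(np.abs(u - np.sin(np.pi * x))))
```

Recursing on the coarse solve instead of factoring it directly turns this into a V-cycle; in a distributed-memory setting each grid level is partitioned across processes with halo exchanges around the smoothing sweeps.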
The performance of the implementation is measured by solving both fixed- and scaled-size problems on three different parallel computers (an SGI Origin2000, an IBM SP2 and a Cray T3E). The code performs well in terms of speedup, parallel efficiency and scalability; however, the floating-point performance is considerably below the peak values attributed to these machines. "Lazy" processors that produce reduced performance statistics are documented on the Origin. The solution of two problems on an SGI Origin2000, an IBM PowerPC SMP and a Linux cluster demonstrates that the algorithm performs well when applied to the unstructured meshes required for practical engineering analysis. Copyright © 2004 John Wiley & Sons, Ltd. [source]

Coupled Navier–Stokes–Molecular dynamics simulations using a multi-physics flow simulation framework
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 10 2010
R. Steijl
Abstract Simulation of nano-scale channel flows using a coupled Navier–Stokes/Molecular Dynamics (MD) method is presented. The flow cases serve as examples of the application of a multi-physics computational framework put forward in this work. The framework employs a set of (partially) overlapping sub-domains in which different levels of physical modelling are used to describe the flow. In this way, numerical simulations based on the Navier–Stokes equations can be extended to flows in which the continuum and/or Newtonian flow assumptions break down in regions of the domain, by locally increasing the level of detail in the model. The use of multiple levels of physical modelling can then reduce the overall computational cost for a given level of fidelity. The present work describes the structure of a parallel computational framework for such simulations, including details of the Navier–Stokes/MD coupling, the convergence behaviour of coupled simulations, and the parallel implementation.
For the cases considered here, micro-scale MD problems are constructed to provide viscous stresses for the Navier–Stokes equations. The first problem is planar Poiseuille flow, for which the viscous fluxes on each cell face of the finite-volume discretization are evaluated using MD. The second example deals with fully developed three-dimensional channel flow, with molecular-level modelling of the shear stresses in a group of cells in the domain corners. An important aspect of using shear stresses evaluated with MD in Navier–Stokes simulations is the scatter in the data due to the sampling of a finite ensemble over a limited interval. In the coupled simulations, this prevents convergence of the system in terms of the reduction of the norm of the residual vector of the finite-volume discretization of the macro-domain. Solutions to this problem are discussed in the present work, along with an analysis of the effect of the number of realizations and the sample duration. Averaging the apparent viscosity for each cell face, i.e. the ratio of the shear stress predicted from MD to the imposed velocity gradient, over a number of macro-scale time steps is shown to be a simple but effective method of reaching a good level of convergence of the coupled system. Finally, the parallel efficiency of the developed method is demonstrated. Copyright © 2009 John Wiley & Sons, Ltd. [source]

Multiple semi-coarsened multigrid method with application to large eddy simulation
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 5 2006
F. E. Ham
Abstract The Multiple Semi-coarsened Grid (MSG) multigrid method of Mulder (J. Comput. Phys. 1989; 83:303–323) is developed as a solver for fully implicit discretizations of the time-dependent incompressible Navier–Stokes equations. The method is combined with the Symmetric Coupled Gauss–Seidel (SCGS) smoother of Vanka (Comput. Methods Appl. Mech. Eng. 1986; 55:321–338), and its robustness is demonstrated by performing a number of large-eddy simulations, including bypass transition on a flat plate and the turbulent thermally-driven cavity flow. The method is consistently able to reduce the non-linear residual by 5 orders of magnitude in 40–80 work units for problems with significant and varying coefficient anisotropy. Some discussion of the parallel implementation of the method is also included. Copyright © 2005 John Wiley & Sons, Ltd. [source]

A 2D implicit time-marching algorithm for shallow water models based on the generalized wave continuity equation
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 3 2004
Kendra M. Dresback
Abstract This paper builds upon earlier work that developed and evaluated a 1D predictor–corrector time-marching algorithm for wave equation models and extends it to 2D. Typically, the generalized wave continuity equation (GWCE) utilizes a three-time-level semi-implicit scheme centred at k, and the momentum equation uses a two-time-level scheme centred at k+1/2. It has been shown that in highly non-linear applications the algorithm becomes unstable at even moderate Courant numbers. This work implements and analyses an implicit treatment of the non-linear terms through the use of an iterative time-marching algorithm in the two-dimensional framework. Stability results show at least an eight-fold increase in the maximum time step, depending on the domain. Studies also examined the sensitivity of the G parameter (a numerical weighting parameter in the GWCE), with results showing that the greatest increase in stability occurs when 1 ≤ G/τmax ≤ 10, a range that coincides with the range recommended to minimize errors. Convergence studies indicate an increase in temporal accuracy from first order to second order, while the overall error is less than that of the original algorithm, even at larger time steps. Finally, a parallel implementation of the new algorithm shows that it scales well.
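The benefit of iterating an implicit corrector can be seen on a scalar model problem; this sketch applies a trapezoidal scheme with fixed-point corrector iterations to du/dt = -λu, and is only a cartoon of the predictor-corrector machinery used in the GWCE algorithm (all names and parameter values here are illustrative):

```python
import math

lam = 4.0    # decay rate of the model problem du/dt = -lam * u
dt = 0.1
steps = 50

def march(n_corrector):
    """Trapezoidal scheme solved by explicit predictor plus fixed-point corrector passes."""
    u = 1.0
    for _ in range(steps):
        rhs_old = -lam * u
        u_new = u + dt * rhs_old                       # explicit predictor
        for _ in range(n_corrector):                   # corrector: u_new = u + dt/2*(f(u)+f(u_new))
            u_new = u + 0.5 * dt * (rhs_old - lam * u_new)
        u = u_new
    return u

exact = math.exp(-lam * dt * steps)
for k in (1, 4):
    print(k, abs(march(k) - exact))
```

More corrector passes drive the update toward the fully implicit trapezoidal value, which is what buys the larger stable time steps reported in the paper; the fixed-point iteration itself converges only while the product of dt and the problem's stiffness stays moderate.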
Copyright © 2004 John Wiley & Sons, Ltd. [source]

Performance of a parallel implementation of the FMM for electromagnetics applications
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 8 2003
G. Sylvand
Abstract This paper describes the parallel fast multipole method implemented in the EADS integral equations code. We focus on electromagnetics applications such as CEM and RCS computation. We solve the Maxwell equations in the frequency domain by a finite boundary-element method. The complex dense system of equations obtained cannot be solved using classical methods when the number of unknowns exceeds approximately 10^5. The use of iterative solvers (such as GMRES) and fast methods (such as the fast multipole method (FMM)) to speed up the matrix–vector product allows us to break this limit. We present the parallel out-of-core implementation of this method developed at CERMICS/INRIA and integrated into EADS industrial software. We were able to solve unprecedented industrial applications containing up to 25 million unknowns. Copyright © 2003 John Wiley & Sons, Ltd. [source]

A batch-type time-true ATM-network simulator: design for parallel processing
INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS, Issue 8 2002
Michael Logothetis
Abstract This paper presents a new type of network simulator for simulating the call-level operations of telecom networks, and especially ATM networks. The simulator is a pure time-true type, as opposed to a call-by-call type, and is also characterized as a batch-type simulator. The entire simulation duration is divided into short time intervals of equal duration, t. During each interval t, a batch of call origination and termination events is processed and the time-points of these events are sorted. The number of sorting executions is drastically reduced compared with a call-by-call simulator, resulting in considerable time savings.
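The batch idea can be sketched in a few lines: events are bucketed into intervals of length t and each short bucket is sorted independently, instead of maintaining one globally ordered event list; the event generation below is invented purely for illustration:

```python
import random

random.seed(0)
duration = 100.0
t = 1.0                                  # batch interval length
# Call origination/termination time-points, arriving in arbitrary order
events = [random.uniform(0.0, duration) for _ in range(10_000)]

# Batch-type processing: assign each event to its interval, then sort per interval.
n_batches = int(duration / t)
buckets = [[] for _ in range(n_batches)]
for e in events:
    buckets[min(int(e / t), n_batches - 1)].append(e)

processed = []
for bucket in buckets:                   # each short bucket is sorted independently
    processed.extend(sorted(bucket))

assert processed == sorted(events)       # same global order, many cheap sorts
print(len(processed))
```

Sorting many short lists costs less than repeatedly inserting into one long ordered list, and the per-bucket work is independent, which is what makes the scheme amenable to distribution across processors (e.g. by virtual path, as the abstract goes on to describe).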
The proposed data structures of the simulator can be implemented in a general-purpose programming language and are well suited to parallel processing techniques, for implementation on parallel computers and further savings in execution time. We first implemented the simulator on a sequential computer and then applied parallelization techniques to implement it on a parallel computer. To simplify the parallelization procedure, we dissociate the core simulation from the built-in call-level functions (e.g. bandwidth control or dynamic routing) of the network. The key point for a parallel implementation is to organize data by virtual paths (VPs) and distribute them among processors, which all execute the same set of instructions on this data. The performance of the proposed batch-type, time-true ATM-network simulator is compared with that of a call-by-call simulator to reveal its superiority in terms of sequential execution time (when both simulators run on conventional computers). Finally, a measure of the accuracy of the simulation results is given. Copyright © 2002 John Wiley & Sons, Ltd. [source]

Image reconstructions from two orthogonal projections
INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, Issue 2 2003
Yuanmei Wang
Abstract A vector entropy optimization-based neural network approach is presented to handle image reconstruction from two orthogonal projections. The method attains accurate reconstruction and allows parallel implementation. It is an attempt to extract image information from only two projections, which is especially meaningful for clinical applications and three-dimensional modeling of the coronary arteries. © 2003 Wiley Periodicals, Inc. Int J Imaging Syst Technol 13, 141–145, 2003; Published online in Wiley InterScience (www.interscience.wiley.com).
DOI 10.1002/ima.10036 [source]

MDLab: A molecular dynamics simulation prototyping environment
JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 7 2010
Trevor Cickovski
Abstract Molecular dynamics (MD) simulation involves solving Newton's equations of motion for a system of atoms by calculating forces and updating atomic positions and velocities over a timestep Δt. Despite the large amount of computing power currently available, the timescale of MD simulations is limited both by the small timestep required for propagation and by the expensive algorithm for computing pairwise forces. These issues are currently addressed through the development of efficient simulation methods, some of which make acceptable approximations and as a result can afford larger timesteps. We present MDLab, a development environment for MD simulations built with Python, which facilitates prototyping, testing, and debugging of these methods. MDLab provides constructs that allow the development of propagators, force calculators, and high-level sampling protocols that run several instances of molecular dynamics. For computationally demanding sampling protocols which require testing on large biomolecules, MDLab includes an interface to the OpenMM libraries of Friedrichs et al., which execute on graphical processing units (GPUs) and achieve considerable speedup over execution on the CPU. As an example of an interesting high-level method developed in MDLab, we present a parallel implementation of the On-The-Fly string method of Maragliano and Vanden-Eijnden. MDLab is available at http://mdlab.sourceforge.net. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2010 [source]
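The propagation loop at the heart of any MD code, and of the propagators one would prototype in a tool like MDLab, can be illustrated with a velocity-Verlet integrator for a single 1D harmonic "atom"; this is a generic sketch, not MDLab's API, and the force model and parameters are invented for illustration:

```python
import math

def force(x, k=1.0):
    """Harmonic restoring force, F = -k*x (stand-in for a pairwise force sum)."""
    return -k * x

def velocity_verlet(x, v, dt, steps, m=1.0):
    """Propagate position x and velocity v with the velocity-Verlet scheme."""
    f = force(x)
    for _ in range(steps):
        v += 0.5 * dt * f / m    # half kick with old force
        x += dt * v              # drift
        f = force(x)             # recompute force at new position
        v += 0.5 * dt * f / m    # half kick with new force
    return x, v

x, v = velocity_verlet(1.0, 0.0, dt=0.01, steps=1000)
energy = 0.5 * v * v + 0.5 * x * x
print(x, v, energy)              # total energy stays near the initial 0.5
```

Velocity Verlet is popular precisely because of the trade-off the abstract describes: its symplectic structure keeps the energy bounded over long runs, so the timestep, not drift, is the limiting factor; the expensive part in real systems is the pairwise force evaluation inside the loop.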