Large Problems (large + problem)

Distribution by Scientific Domains


Selected Abstracts


A fast implementation of the FETI-DP method: FETI-DP-RBS-LNA and applications on large scale problems with localized non-linearities

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 6 2005
Jun Sun
Abstract As parallel and distributed computing gradually becomes the computing standard for large scale problems, the domain decomposition method (DD) has received growing attention since it provides a natural basis for splitting a large problem into many small problems, which can be submitted to individual computing nodes and processed in a parallel fashion. This approach not only provides a method to solve large scale problems that are not solvable on a single computer by using direct sparse solvers but also gives a flexible solution to deal with large scale problems with localized non-linearities. When some parts of the structure are modified, only the corresponding subdomains and the interface equation that connects all the subdomains need to be recomputed. In this paper, the dual,primal finite element tearing and interconnecting method (FETI-DP) is carefully investigated, and a reduced back-substitution (RBS) algorithm is proposed to accelerate the time-consuming preconditioned conjugate gradient (PCG) iterations involved in the interface problems. Linear,non-linear analysis (LNA) is also adopted for large scale problems with localized non-linearities based on subdomain linear,non-linear identification criteria. This combined approach is named as the FETI-DP-RBS-LNA algorithm and demonstrated on the mechanical analyses of a welding problem. Serial CPU costs of this algorithm are measured at each solution stage and compared with that from the IBM Watson direct sparse solver and the FETI-DP method. The results demonstrate the effectiveness of the proposed computational approach for simulating welding problems, which is representative of a large class of three-dimensional large scale problems with localized non-linearities. Copyright © 2005 John Wiley & Sons, Ltd. [source]


Sequence alignment on the Cray MTA-2,

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 9 2004
Shahid H. Bokhari
Abstract Several variants of standard algorithms for DNA sequence alignment have been implemented on the Cray Multithreaded Architecture-2 (MTA-2). We describe the architecture of the MTA-2 and discuss how its hardware and software enable efficient implementation of parallel algorithms with little or no regard for issues of partitioning, mapping or scheduling. We describe how we ported variants of the naive algorithm for exact alignment and the dynamic programming algorithm for approximate alignment to the MTA-2 and provide detailed performance measurements. It is shown that, for the dynamic programming algorithm, the use of the MTA's ,Full/Empty' synchronization bits leads to almost perfect speedup for large problems on one to eight processors. These results illustrate the versatility of the MTA's architecture and demonstrate its potential for providing a high-productivity platform for parallel processing. Copyright © 2004 John Wiley & Sons, Ltd. [source]


Multilevel hybrid spectral element ordering algorithms

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING, Issue 5 2005
Jennifer A. Scott
Abstract For frontal solvers to perform well on finite-element problems it is essential that the elements are ordered for a small wavefront. Multilevel element ordering algorithms have their origins in the profile reduction algorithm of Sloan but for large problems often give significantly smaller wavefronts. We examine a number of multilevel variants with the aim of finding the best methods to include within a new state-of-the-art frontal solver for finite-element applications that we are currently developing. Numerical experiments are performed using a range of problems arising from real applications and comparisons are made with existing element ordering algorithms. Copyright © 2005 John Wiley & Sons, Ltd. [source]


An efficient out-of-core multifrontal solver for large-scale unsymmetric element problems

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 7 2009
J. K. Reid
Abstract In many applications where the efficient solution of large sparse linear systems of equations is required, a direct method is frequently the method of choice. Unfortunately, direct methods have a potentially severe limitation: as the problem size grows, the memory needed generally increases rapidly. However, the in-core memory requirements can be limited by storing the matrix and its factors externally, allowing the solver to be used for very large problems. We have designed a new out-of-core package for the large sparse unsymmetric systems that arise from finite-element problems. The code, which is called HSL_MA78, implements a multifrontal algorithm and achieves efficiency through the use of specially designed code for handling the input/output operations and efficient dense linear algebra kernels. These kernels, which are available as a separate package called HSL_MA74, use high-level BLAS to perform the partial factorization of the frontal matrices and offer both threshold partial and rook pivoting. In this paper, we describe the design of HSL_MA78 and explain its user interface and the options it offers. We also describe the algorithms used by HSL_MA74 and illustrate the performance of our new codes using problems from a range of practical applications. Copyright © 2008 John Wiley & Sons, Ltd. [source]


Parallel eigenanalysis of multiaquifer systems

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 15 2005
L. Bergamaschi
Abstract Finite element discretizations of flow problems involving multiaquifer systems deliver large, sparse, unstructured matrices, whose partial eigenanalysis is important for both solving the flow problem and analysing its main characteristics. We studied and implemented an effective preconditioning of the Jacobi,Davidson algorithm by FSAI-type preconditioners. We developed efficient parallelization strategies in order to solve very large problems, which could not fit into the storage available to a single processor. We report our results about the solution of multiaquifer flow problems on an SP4 machine and a Linux Cluster. We analyse the sequential and parallel efficiency of our algorithm, also compared with standard packages. Questions regarding the parallel solution of finite element eigenproblems are addressed and discussed. Copyright © 2005 John Wiley & Sons, Ltd. [source]


Parallel DSMC method using dynamic domain decomposition

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 1 2005
J.-S. Wu
Abstract A general parallel direct simulation Monte Carlo method using unstructured mesh is introduced, which incorporates a multi-level graph-partitioning technique to dynamically decompose the computational domain. The current DSMC method is implemented on an unstructured mesh using particle ray-tracing technique, which takes the advantages of the cell connectivity information. In addition, various strategies applying the stop at rise (SAR) (IEEE Trans Comput 1988; 39:1073,1087) scheme is studied to determine how frequent the domain should be re-decomposed. A high-speed, bottom-driven cavity flow, including small, medium and large problems, based on the number of particles and cells, are simulated. Corresponding analysis of parallel performance is reported on IBM-SP2 parallel machine up to 64 processors. Analysis shows that degree of imbalance among processors with dynamic load balancing is about ,,½ of that without dynamic load balancing. Detailed time analysis shows that degree of imbalance levels off very rapidly at a relatively low value with increasing number of processors when applying dynamic load balancing, which makes the large problem size fairly scalable for processors more than 64. In general, optimal frequency of activating SAR scheme decreases with problem size. At the end, the method is applied to compute two two-dimensional hypersonic flows, a three-dimensional hypersonic flow and a three-dimensional near-continuum twin-jet gas flow to demonstrate its superior computational capability and compare with experimental data and previous simulation data wherever available. Copyright © 2005 John Wiley & Sons, Ltd. [source]


A variational multiscale Newton,Schur approach for the incompressible Navier,Stokes equations

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 2 2010
D. Z. Turner
Abstract In the following paper, we present a consistent Newton,Schur (NS) solution approach for variational multiscale formulations of the time-dependent Navier,Stokes equations in three dimensions. The main contributions of this work are a systematic study of the variational multiscale method for three-dimensional problems and an implementation of a consistent formulation suitable for large problems with high nonlinearity, unstructured meshes, and non-symmetric matrices. In addition to the quadratic convergence characteristics of a Newton,Raphson-based scheme, the NS approach increases computational efficiency and parallel scalability by implementing the tangent stiffness matrix in Schur complement form. As a result, more computations are performed at the element level. Using a variational multiscale framework, we construct a two-level approach to stabilizing the incompressible Navier,Stokes equations based on a coarse and fine-scale subproblem. We then derive the Schur complement form of the consistent tangent matrix. We demonstrate the performance of the method for a number of three-dimensional problems for Reynolds number up to 1000 including steady and time-dependent flows. Copyright © 2009 John Wiley & Sons, Ltd. [source]


Defining and optimizing algorithms for neighbouring particle identification in SPH fluid simulations

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 6 2008
G. Viccione
Abstract Lagrangian particle methods such as smoothed particle hydrodynamics (SPH) are very demanding in terms of computing time for large domains. Since the numerical integration of the governing equations is only carried out for each particle on a restricted number of neighbouring ones located inside a cut-off radius rc, a substantial part of the computational burden depends on the actual search procedure; it is therefore vital that efficient methods are adopted for such a search. The cut-off radius is indeed much lower than the typical domain's size; hence, the number of neighbouring particles is only a little fraction of the total number. Straightforward determination of which particles are inside the interaction range requires the computation of all pair-wise distances, a procedure whose computational time would be unpractical or totally impossible for large problems. Two main strategies have been developed in the past in order to reduce the unnecessary computation of distances: the first based on dynamically storing each particle's neighbourhood list (Verlet list) and the second based on a framework of fixed cells. The paper presents the results of a numerical sensitivity study on the efficiency of the two procedures as a function of such parameters as the Verlet size and the cell dimensions. An insight is given into the relative computational burden; a discussion of the relative merits of the different approaches is also given and some suggestions are provided on the computational and data structure of the neighbourhood search part of SPH codes. Copyright © 2008 John Wiley & Sons, Ltd. [source]


Numerical solutions of fully non-linear and highly dispersive Boussinesq equations in two horizontal dimensions

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 3 2004
David R. Fuhrman
Abstract This paper investigates preconditioned iterative techniques for finite difference solutions of a high-order Boussinesq method for modelling water waves in two horizontal dimensions. The Boussinesq method solves simultaneously for all three components of velocity at an arbitrary z -level, removing any practical limitations based on the relative water depth. High-order finite difference approximations are shown to be more efficient than low-order approximations (for a given accuracy), despite the additional overhead. The resultant system of equations requires that a sparse, unsymmetric, and often ill-conditioned matrix be solved at each stage evaluation within a simulation. Various preconditioning strategies are investigated, including full factorizations of the linearized matrix, ILU factorizations, a matrix-free (Fourier space) method, and an approximate Schur complement approach. A detailed comparison of the methods is given for both rotational and irrotational formulations, and the strengths and limitations of each are discussed. Mesh-independent convergence is demonstrated with many of the preconditioners for solutions of the irrotational formulation, and solutions using the Fourier space and approximate Schur complement preconditioners are shown to require an overall computational effort that scales linearly with problem size (for large problems). Calculations on a variable depth problem are also compared to experimental data, highlighting the accuracy of the model. Through combined physical and mathematical insight effective preconditioned iterative solutions are achieved for the full physical application range of the model. Copyright © 2004 John Wiley & Sons, Ltd. [source]


Analysis of algorithms for two-stage flowshops with multi-processor task flexibility

NAVAL RESEARCH LOGISTICS: AN INTERNATIONAL JOURNAL, Issue 1 2004
George L. Vairaktarakis
Abstract In this article we introduce a 2-machine flowshop with processing flexibility. Two processing modes are available for each task: namely, processing by the designated processor, and processing simultaneously by both processors. The objective studied is makespan minimization. This production environment is encountered in repetitive manufacturing shops equipped with processors that have the flexibility to execute orders either individually or in coordination. In the latter case, the product designer exploits processing synergies between two processors so as to execute a particular task much faster than a dedicated processor. This type of flowshop environment is also encountered in labor-intensive assembly lines where products moving downstream can be processed either in the designated assembly stations or by pulling together the work teams of adjacent stations. This scheduling problem requires determining the mode of operation of each task, and the subsequent scheduling that preserves the flowshop constraints. We show that the problem is ordinary NP-complete and obtain an optimal solution using a dynamic programming algorithm with considerable computational requirements for medium and large problems. Then, we present a number of dynamic programming relaxations and analyze their worst-case error performance. Finally, we present a polynomial time heuristic with worst-case error performance comparable to that of the dynamic programming relaxations. © 2003 Wiley Periodicals, Inc. Naval Research Logistics, 2004. [source]


Modifying CLJP to select grid hierarchies with lower operator complexities and better performance

NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS, Issue 2-3 2006
David M. Alber
Abstract Algebraic multigrid (AMG) is an efficient algorithm for solving certain types of large, sparse linear systems. For solving very large problems with AMG it becomes necessary to use parallel algorithms. Coarse grid selection algorithms such as CLJP were created to parallelize the setup phase of AMG. For some problems, such as those discretized on structured meshes, CLJP tends to select coarse grids with more nodes than alternative coarsening algorithms. In this paper, the cause for the selection of too many coarse nodes by CLJP is examined, and a new technique which lowers the operator complexities generated by CLJP is introduced. To validate the new method, the modified CLJP is compared to other coarsening algorithms for large-scale problems. Copyright © 2006 John Wiley & Sons, Ltd. [source]