Problem Size


Selected Abstracts


Exploring the performance of massively multithreaded architectures

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 5 2010
Shahid Bokhari
Abstract We present a new scheme for evaluating the performance of multithreaded computers and demonstrate its application to the Cray MTA-2 and XMT supercomputers. Our scheme is based on the concept of clock cycles per element, plotted against both problem size and the number of processors. This scheme clearly shows if an implementation has achieved its asymptotic efficiency and is more general than (but includes) the commonly used speedup metric. It permits the discovery of any imperfections in both the software and the hardware, and is expected to permit a unified comparison of many different parallel architectures. Measurements on a number of well-known parallel algorithms, ranging from matrix multiply to quicksort, are presented for the MTA-2 and XMT and highlight some interesting differences between these machines. The performance of sequence alignment using dynamic programming is evaluated on the MTA-2, XMT, IBM x3755 and SGI Altix 350 and provides a useful comparison of the capabilities of the Cray machines with more conventional shared memory architectures. Copyright © 2009 John Wiley & Sons, Ltd. [source]
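One plausible reading of this cycles-per-element metric can be sketched as follows (our own interpretation and code, not the authors' definition; the clock rate and timings are hypothetical placeholders): convert a measured wall-clock time into cycles issued per data element.

# Sketch: clock cycles per element from a wall-clock measurement.
# Machine parameters and timings are hypothetical placeholders.

def cycles_per_element(wall_time_s, clock_hz, n_processors, n_elements):
    """Total cycles available across all processors during the run, per element."""
    return wall_time_s * clock_hz * n_processors / n_elements

# Example: a 10-million-element problem on 4 processors of a 500 MHz machine.
print(cycles_per_element(wall_time_s=2.4, clock_hz=500e6,
                         n_processors=4, n_elements=10_000_000))

Plotting this quantity against problem size and processor count, as the abstract describes, makes asymptotic efficiency visible as a flattening of the curves.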


Coordinated Capacitated Lot-Sizing Problem with Dynamic Demand: A Lagrangian Heuristic

DECISION SCIENCES, Issue 1 2004
E. Powell Robinson Jr.
ABSTRACT Coordinated replenishment problems are common in manufacturing and distribution when a family of items shares a common production line, supplier, or a mode of transportation. In these situations the coordination of shared, and often limited, resources across items is economically attractive. This paper describes a mixed-integer programming formulation and Lagrangian relaxation solution procedure for the single-family coordinated capacitated lot-sizing problem with dynamic demand. The problem extends both the multi-item capacitated dynamic demand lot-sizing problem and the uncapacitated coordinated dynamic demand lot-sizing problem. We provide the results of computational experiments investigating the mathematical properties of the formulation and the performance of the Lagrangian procedures. The results indicate the superiority of the dual-based heuristic over linear programming-based approaches to the problem. The quality of the Lagrangian heuristic solution improved in most instances with increases in problem size. Heuristic solutions averaged 2.52% above optimal. The procedures were applied to an industry test problem yielding a 22.5% reduction in total costs. [source]
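For readers unfamiliar with the problem class, a generic coordinated capacitated lot-sizing formulation looks roughly as follows (our notation, not the authors' exact model): a family setup Y_t and item setups y_{it} gate production x_{it}, with inventory I_{it} carrying demand d_{it} across periods under capacity C_t.

\[
\min \sum_{t=1}^{T}\Big( S\,Y_t + \sum_{i=1}^{N}\big( s_i\,y_{it} + h_i\,I_{it} \big)\Big)
\quad\text{s.t.}\quad
I_{i,t-1} + x_{it} - I_{it} = d_{it},\qquad
\sum_{i=1}^{N} a_i\,x_{it} \le C_t,\qquad
x_{it} \le M\,y_{it},\qquad
y_{it} \le Y_t,\qquad
Y_t,\,y_{it}\in\{0,1\},\; x_{it},\,I_{it}\ge 0 .
\]

Relaxing the coupling constraints (capacity and/or family setups) with Lagrange multipliers decomposes such a model into single-item subproblems, which is the kind of structure a Lagrangian heuristic typically exploits.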


The impact of problem size on decision processes: an experimental investigation on very large choice problems with support of decision support systems

EXPERT SYSTEMS, Issue 2 2004
H. Wang
Abstract: Choice problems as a class of decision problems have attracted great attention for the last couple of decades. Among the frameworks and supporting theories used in their study, two have had the greatest impact: bounded rationality and cost–benefit. Both theories could find support from past empirical studies under different conditions or problem environments. In past studies, problem size has been shown to play an important role in decision-making. As problem size increases, a decision process may be detoured and the decision outcome may be different. In this paper we investigate the impact of problem size on three important aspects of the computer-aided decision process (strategy selection, decision time/effort, and decision quality) through very large choice problems. [source]


Migration velocity analysis and waveform inversion

GEOPHYSICAL PROSPECTING, Issue 6 2008
William W. Symes
ABSTRACT Least-squares inversion of seismic reflection waveform data can reconstruct remarkably detailed models of subsurface structure and take into account essentially any physics of seismic wave propagation that can be modelled. However, the waveform inversion objective has many spurious local minima, hence convergence of descent methods (mandatory because of problem size) to useful Earth models requires accurate initial estimates of long-scale velocity structure. Migration velocity analysis, on the other hand, is capable of correcting substantially erroneous initial estimates of velocity at long scales. Migration velocity analysis is based on prestack depth migration, which is in turn based on linearized acoustic modelling (Born or single-scattering approximation). Two major variants of prestack depth migration, using binning of surface data and Claerbout's survey-sinking concept respectively, are in widespread use. Each type of prestack migration produces an image volume depending on redundant parameters and supplies a condition on the image volume, which expresses consistency between data and velocity model and is hence a basis for velocity analysis. The survey-sinking (depth-oriented) approach to prestack migration is less subject to kinematic artefacts than is the binning-based (surface-oriented) approach. Because kinematic artefacts strongly violate the consistency or semblance conditions, this observation suggests that velocity analysis based on depth-oriented prestack migration may be more appropriate in kinematically complex areas. Appropriate choice of objective (differential semblance) turns either form of migration velocity analysis into an optimization problem, for which Newton-like methods exhibit little tendency to stagnate at nonglobal minima. The extended modelling concept links migration velocity analysis to the apparently unrelated waveform inversion approach to estimation of Earth structure: from this point of view, migration velocity analysis is a solution method for the linearized waveform inversion problem. Extended modelling also provides a basis for a nonlinear generalization of migration velocity analysis. Preliminary numerical evidence suggests a new approach to nonlinear waveform inversion, which may combine the global convergence of velocity analysis with the physical fidelity of model-based data fitting. [source]


Computation of a few smallest eigenvalues of elliptic operators using fast elliptic solvers

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING, Issue 8 2001
Janne Martikainen
Abstract The computation of a few smallest eigenvalues of generalized algebraic eigenvalue problems is studied. The considered problems are obtained by discretizing self-adjoint second-order elliptic partial differential eigenvalue problems in two- or three-dimensional domains. The standard Lanczos algorithm with complete orthogonalization is used to compute some eigenvalues of the inverted eigenvalue problem. Under suitable assumptions, the number of Lanczos iterations is shown to be independent of the problem size. The arising linear problems are solved using a standard fast elliptic solver. Numerical experiments demonstrate that the inverted problem is much easier to solve with the Lanczos algorithm than the original problem. In these experiments, the underlying Poisson and elasticity problems are solved using a standard multigrid method. Copyright © 2001 John Wiley & Sons, Ltd. [source]
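The effect of working with the inverted problem can be sketched with standard sparse tools (an illustration only; the paper pairs Lanczos with a fast elliptic solver for the inner solves, whereas scipy's shift-invert mode uses a sparse LU factorization internally):

# Sketch: a few smallest eigenvalues of K u = lambda M u via shift-invert Lanczos.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 100                                           # grid points per direction
h = 1.0 / (n + 1)
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n)) / h**2
I = sp.identity(n)
K = (sp.kron(I, T) + sp.kron(T, I)).tocsc()       # 2-D Dirichlet Laplacian (FD)
M = sp.identity(n * n, format="csc")              # identity "mass" matrix for FD

# sigma=0 selects shift-invert about zero, i.e. the eigenvalues nearest zero.
vals, vecs = eigsh(K, k=4, M=M, sigma=0.0, which="LM")
print(np.sort(vals))   # should approach (i^2 + j^2) * pi^2 for small i, j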


An efficient out-of-core multifrontal solver for large-scale unsymmetric element problems

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 7 2009
J. K. Reid
Abstract In many applications where the efficient solution of large sparse linear systems of equations is required, a direct method is frequently the method of choice. Unfortunately, direct methods have a potentially severe limitation: as the problem size grows, the memory needed generally increases rapidly. However, the in-core memory requirements can be limited by storing the matrix and its factors externally, allowing the solver to be used for very large problems. We have designed a new out-of-core package for the large sparse unsymmetric systems that arise from finite-element problems. The code, which is called HSL_MA78, implements a multifrontal algorithm and achieves efficiency through the use of specially designed code for handling the input/output operations and efficient dense linear algebra kernels. These kernels, which are available as a separate package called HSL_MA74, use high-level BLAS to perform the partial factorization of the frontal matrices and offer both threshold partial and rook pivoting. In this paper, we describe the design of HSL_MA78 and explain its user interface and the options it offers. We also describe the algorithms used by HSL_MA74 and illustrate the performance of our new codes using problems from a range of practical applications. Copyright © 2008 John Wiley & Sons, Ltd. [source]


Parallel DSMC method using dynamic domain decomposition

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 1 2005
J.-S. Wu
Abstract A general parallel direct simulation Monte Carlo method using an unstructured mesh is introduced, which incorporates a multi-level graph-partitioning technique to dynamically decompose the computational domain. The current DSMC method is implemented on an unstructured mesh using a particle ray-tracing technique, which takes advantage of the cell connectivity information. In addition, various strategies for applying the stop at rise (SAR) (IEEE Trans Comput 1988; 39:1073–1087) scheme are studied to determine how frequently the domain should be re-decomposed. A high-speed, bottom-driven cavity flow, including small, medium and large problems based on the number of particles and cells, is simulated. A corresponding analysis of parallel performance is reported on an IBM-SP2 parallel machine with up to 64 processors. The analysis shows that the degree of imbalance among processors with dynamic load balancing is about half of that without dynamic load balancing. Detailed time analysis shows that the degree of imbalance levels off very rapidly at a relatively low value with increasing number of processors when dynamic load balancing is applied, which makes the large problem size fairly scalable beyond 64 processors. In general, the optimal frequency of activating the SAR scheme decreases with problem size. Finally, the method is applied to compute two two-dimensional hypersonic flows, a three-dimensional hypersonic flow and a three-dimensional near-continuum twin-jet gas flow to demonstrate its superior computational capability and to compare with experimental data and previous simulation data wherever available. Copyright © 2005 John Wiley & Sons, Ltd. [source]
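The load-balancing logic discussed above can be sketched generically (this is not the authors' code; the cost model and example numbers are assumptions): the stop-at-rise idea is to re-decompose only once the accumulated cost of imbalance since the last repartition outweighs the cost of repartitioning.

# Sketch of a stop-at-rise (SAR)-style trigger for dynamic domain decomposition.
# Illustrative only: a common textbook form of the criterion, not necessarily
# the exact variant used in the paper.

def degree_of_imbalance(loads):
    """Max load / average load over all processors (1.0 = perfectly balanced)."""
    avg = sum(loads) / len(loads)
    return max(loads) / avg

def should_repartition(per_step_loads, repartition_cost):
    """per_step_loads: list of per-processor load lists since the last repartition.
    Returns True once the average cost per step (idle time plus amortized
    repartition cost) starts to rise, i.e. waiting longer no longer helps."""
    costs, lost = [], 0.0
    for step, loads in enumerate(per_step_loads, start=1):
        lost += max(loads) - sum(loads) / len(loads)     # idle time this step
        costs.append((lost + repartition_cost) / step)
    return len(costs) >= 2 and costs[-1] > costs[-2]

# Example with 4 processors whose imbalance grows over time.
history = [[10, 10, 10, 10], [12, 10, 9, 9], [15, 10, 8, 7], [19, 10, 7, 6]]
print(degree_of_imbalance(history[-1]))
print(should_repartition(history, repartition_cost=5.0))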


Efficient implicit finite element analysis of sheet forming processes

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 8 2003
A. H. van den Boogaard
Abstract The computation time for implicit finite element analyses tends to increase disproportionately with increasing problem size. This is due to the repeated solution of linear sets of equations, if direct solvers are used. By using iterative linear equation solvers the total analysis time can be reduced for large systems. For plate or shell element models, however, the matrix is so ill-conditioned that iterative solvers do not achieve the huge time savings that are realized with solid elements. By introducing inertial effects into the implicit finite element code the condition number can be improved and iterative solvers perform much better. An additional advantage is that the inertial effects stabilize the Newton–Raphson iterations. This also applies to quasi-static processes, for which the inertial effects ultimately do not affect the results. The presented method can readily be implemented in existing implicit finite element codes. Industrial-size deep drawing simulations are executed to investigate the performance of the recommended strategy. It is concluded that the computation time is decreased by a factor of 5 to 10. Copyright © 2003 John Wiley & Sons, Ltd. [source]
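The mechanism can be summarized schematically (our notation, not taken from the paper): a quasi-static Newton–Raphson step solves K_T Δu = r, whereas the corresponding implicit dynamic step (e.g. with a Newmark integrator) solves

\[
\Big(\frac{1}{\beta\,\Delta t^{2}}\,M + K_T\Big)\,\Delta u = \tilde r ,
\]

and the added, well-conditioned mass contribution shifts the spectrum of the system matrix away from zero, so the matrix handed to the iterative solver has a much smaller condition number. Choosing Δt large enough keeps the inertial term small relative to the physics, which is consistent with the abstract's point that quasi-static results are ultimately unaffected.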


A general non-linear optimization algorithm for lower bound limit analysis

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 2 2003
Kristian Krabbenhoft
Abstract The non-linear programming problem associated with the discrete lower bound limit analysis problem is treated by means of an algorithm in which the need to linearize the yield criteria is avoided. The algorithm is an interior point method and is completely general in the sense that no particular finite element discretization or yield criterion is required. As with interior point methods for linear programming, the number of iterations is only slightly affected by the problem size. Some practical implementation issues are discussed with reference to the special structure of the common lower bound load optimization problem, and finally the efficiency and accuracy of the method are demonstrated by means of examples of plate and slab structures obeying different non-linear yield criteria. Copyright © 2002 John Wiley & Sons, Ltd. [source]


A numerically scalable domain decomposition method for the solution of frictionless contact problems

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 12 2001
D. Dureisseix
Abstract We present a domain decomposition method with Lagrange multipliers for the iterative solution of frictionless contact problems. This method, which is based on the FETI method and is therefore named here the FETI-C method, incorporates a coarse contact system that guides the iterative prediction of the active zone of contact. We demonstrate numerically that this method is numerically scalable with respect to both the problem size and the number of subdomains. Copyright © 2001 John Wiley & Sons, Ltd. [source]


Numerical solutions of fully non-linear and highly dispersive Boussinesq equations in two horizontal dimensions

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 3 2004
David R. Fuhrman
Abstract This paper investigates preconditioned iterative techniques for finite difference solutions of a high-order Boussinesq method for modelling water waves in two horizontal dimensions. The Boussinesq method solves simultaneously for all three components of velocity at an arbitrary z-level, removing any practical limitations based on the relative water depth. High-order finite difference approximations are shown to be more efficient than low-order approximations (for a given accuracy), despite the additional overhead. The resultant system of equations requires that a sparse, unsymmetric, and often ill-conditioned matrix be solved at each stage evaluation within a simulation. Various preconditioning strategies are investigated, including full factorizations of the linearized matrix, ILU factorizations, a matrix-free (Fourier space) method, and an approximate Schur complement approach. A detailed comparison of the methods is given for both rotational and irrotational formulations, and the strengths and limitations of each are discussed. Mesh-independent convergence is demonstrated with many of the preconditioners for solutions of the irrotational formulation, and solutions using the Fourier space and approximate Schur complement preconditioners are shown to require an overall computational effort that scales linearly with problem size (for large problems). Calculations on a variable depth problem are also compared to experimental data, highlighting the accuracy of the model. Through combined physical and mathematical insight, effective preconditioned iterative solutions are achieved for the full physical application range of the model. Copyright © 2004 John Wiley & Sons, Ltd. [source]


Algebraic multigrid Laplace solver for the extraction of capacitances of conductors in multi-layer dielectrics

INTERNATIONAL JOURNAL OF NUMERICAL MODELLING: ELECTRONIC NETWORKS, DEVICES AND FIELDS, Issue 5 2007
Prasad S. Sumant
Abstract This paper describes the development of a robust multigrid, finite element-based, Laplace solver for accurate capacitance extraction of conductors embedded in multi-layer dielectric domains. An algebraic multigrid based on element interpolation is adopted and streamlined for the development of the proposed solver. In particular, a new, node-based agglomeration scheme is proposed to speed up the process of agglomeration. Several attributes of this new method are investigated through the application of the Laplace solver to the calculation of the per-unit-length capacitance of configurations of parallel, uniform conductors embedded in multi-layer dielectric substrates. These two-dimensional configurations are commonly encountered as high-speed interconnect structures for integrated electronic circuits. The proposed method is shown to be particularly robust and accurate for structures with very thin dielectric layers characterized by large variation in their electric permittivities. More specifically, it is demonstrated that for such geometries the proposed node-based agglomeration systematically reduces the problem size and speeds up the iterative solution of the finite element matrix. Copyright © 2007 John Wiley & Sons, Ltd. [source]


Higher order explicit time integration schemes for Maxwell's equations

INTERNATIONAL JOURNAL OF NUMERICAL MODELLING: ELECTRONIC NETWORKS, DEVICES AND FIELDS, Issue 5-6 2002
Holger Spachmann
Abstract The finite integration technique (FIT) is an efficient and universal method for solving a wide range of problems in computational electrodynamics. The conventional formulation in the time domain (FITD) has second-order accuracy with respect to spatial and temporal discretization and is computationally equivalent to the well-known finite difference time-domain (FDTD) scheme. The dispersive character of the second-order spatial operators and temporal integration schemes limits the problem size to electrically small structures. In contrast, higher-order approaches result not only in low-dispersion schemes with modified stability conditions but also in higher computational costs. In this paper, a general framework of explicit Runge–Kutta and leap-frog integrators of arbitrary order N is derived. The powerful root-locus method derived from general system theory forms the basis of the theoretical framework for analysing the convergence, stability and dispersion characteristics of the proposed integrators. As is clearly shown, the second- and fourth-order leap-frog schemes are highly preferable to any other higher-order Runge–Kutta or leap-frog scheme concerning stability, efficiency and energy conservation. Copyright © 2002 John Wiley & Sons, Ltd. [source]
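As background (standard FDTD/FIT notation, not reproduced from the paper), the second-order leap-frog scheme referred to above staggers the field updates in time:

\[
\mathbf{H}^{\,n+1/2} = \mathbf{H}^{\,n-1/2} - \frac{\Delta t}{\mu}\,\nabla_h\times\mathbf{E}^{\,n},
\qquad
\mathbf{E}^{\,n+1} = \mathbf{E}^{\,n} + \frac{\Delta t}{\varepsilon}\,\nabla_h\times\mathbf{H}^{\,n+1/2},
\]

where ∇_h× denotes the discrete curl operators; the higher-order Runge–Kutta and leap-frog integrators analysed in the paper generalize this two-stage update.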


Rotamer optimization for protein design through MAP estimation and problem-size reduction

JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 12 2009
Eun-Jong Hong
Abstract The search for the global minimum energy conformation (GMEC) of protein side chains is an important computational challenge in protein structure prediction and design. Using rotamer models, the problem is formulated as an NP-hard optimization problem. Dead-end elimination (DEE) methods combined with systematic A* search (DEE/A*) have proven useful, but may not be strong enough as we attempt to solve protein design problems where a large number of similar rotamers is eligible and the network of interactions between residues is dense. In this work, we present an exact solution method, named BroMAP (branch-and-bound rotamer optimization using MAP estimation), for such protein design problems. The design goal of BroMAP is to be able to expand smaller search trees than conventional branch-and-bound methods while performing only a moderate amount of computation in each node, thereby reducing the total running time. To achieve that, BroMAP attempts reduction of the problem size within each node through DEE and elimination by lower bounds from approximate maximum-a-posteriori (MAP) estimation. The lower bounds are also exploited in branching and subproblem selection for fast discovery of strong upper bounds. Our computational results show that BroMAP tends to be faster than DEE/A* for large protein design cases. BroMAP also solved cases that were not solved by DEE/A* within the maximum allowed time, and did not incur a significant disadvantage for cases where DEE/A* performed well. Therefore, BroMAP is particularly applicable to large protein design problems where DEE/A* struggles and can also substitute for DEE/A* in general GMEC search. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2009 [source]
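The branch-and-bound skeleton that BroMAP builds on can be sketched generically (the bounding, splitting and leaf-solving callbacks below are placeholders standing in for the paper's DEE reduction, MAP-based lower bounds and subproblem solver):

# Generic best-first branch-and-bound sketch for a discrete minimization
# problem such as GMEC search. lower_bound / split / solve_leaf / is_leaf are
# user-supplied callbacks; here they stand in for the paper's actual machinery.
import heapq

def branch_and_bound(root, lower_bound, split, solve_leaf, is_leaf):
    best_value, best_solution = float("inf"), None
    heap = [(lower_bound(root), 0, root)]        # best-first on the lower bound
    counter = 1
    while heap:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_value:                  # prune: cannot beat the incumbent
            continue
        if is_leaf(node):
            value, solution = solve_leaf(node)   # exact solve of a small subproblem
            if value < best_value:
                best_value, best_solution = value, solution
            continue
        for child in split(node):                # branch on a residue/rotamer domain
            child_bound = lower_bound(child)
            if child_bound < best_value:
                heapq.heappush(heap, (child_bound, counter, child))
                counter += 1
    return best_value, best_solution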


New approach to refinery process simulation with adaptive composition representation

AICHE JOURNAL, Issue 3 2004
Heiko Briesen
Abstract The established technique for the simulation of refinery processes is the use of pseudocomponents. However, in order to increase the economic benefit of plant operation, it seems inevitable that molecular information will have to be included in the characterization of petroleum mixtures. This leads to a strong increase in problem size. For this new class of models, there currently seem to be no special algorithms available. The classic pseudocomponent approach is compared with a newly developed solution strategy, which is explicitly designed to efficiently solve simulation problems with a high level of detail in the composition representation. The new solution strategy is an adaptive multigrid method based on a wavelet–Galerkin discretization. With the wavelet–Galerkin discretization the model can easily be formulated on various levels of detail. In an iterative procedure the multigrid concept exploits these different formulations to construct correction-term approximations to the true solution. The discretization of these correction-term models is then done with a detail in composition representation that is determined by a residual-based adaptation strategy. The proposed method has been implemented for a simple 9-stage distillation column and tested for a variety of feed mixtures. In all investigated tests the proposed method proved to be superior to the conventional pseudocomponent approach in terms of accuracy and efficiency. © 2004 American Institute of Chemical Engineers AIChE J, 50: 633–645, 2004 [source]


A selective newsvendor approach to order management

NAVAL RESEARCH LOGISTICS: AN INTERNATIONAL JOURNAL, Issue 8 2008
Kevin Taaffe
Abstract Consider a supplier offering a product to several potential demand sources, each with a unique revenue, size, and probability that it will materialize. Given a long procurement lead time, the supplier must choose the orders to pursue and the total quantity to procure prior to the selling season. We model this as a selective newsvendor problem of maximizing profits where the total (random) demand is given by the set of pursued orders. Given that the dimensionality of a mixed-integer linear programming formulation of the problem increases exponentially with the number of potential orders, we develop both a tailored exact algorithm based on the L-shaped method for two-stage stochastic programming as well as a heuristic method. We also extend our solution approach to account for piecewise-linear cost and revenue functions as well as a multiperiod setting. Extensive experimentation indicates that our exact approach rapidly finds optimal solutions with three times as many orders as a state-of-the-art commercial solver. In addition, our heuristic approach provides average gaps of less than 1% for the largest problems that can be solved exactly. Observing that the gaps decrease as problem size grows, we expect the heuristic approach to work well for large problem instances. © 2008 Wiley Periodicals, Inc. Naval Research Logistics 2008 [source]
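In generic newsvendor notation (ours, not the authors' exact model, which allows order-specific revenues), the joint selection and quantity decision can be written roughly as

\[
\max_{y\in\{0,1\}^{n},\;Q\ge 0}\;
r\,\mathbb{E}\big[\min(D(y),\,Q)\big] \;-\; c\,Q \;+\; v\,\mathbb{E}\big[(Q - D(y))^{+}\big] \;-\; \sum_{i=1}^{n} s_i\,y_i,
\qquad
D(y)=\sum_{i=1}^{n} y_i\,D_i ,
\]

where D_i is the random demand of order i (including the chance that it does not materialize), r, c and v are unit revenue, procurement cost and salvage value, and s_i is the cost of pursuing order i; for any fixed selection y the inner problem in Q reduces to a classical newsvendor in D(y).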


An algebraic generalization of local Fourier analysis for grid transfer operators in multigrid based on Toeplitz matrices

NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS, Issue 2-3 2010
M. Donatelli
Abstract Local Fourier analysis (LFA) is a classical tool for proving convergence theorems for multigrid methods (MGMs). In particular, we are interested in optimal convergence, i.e. convergence rates that are independent of the problem size. For elliptic partial differential equations (PDEs), a well-known optimality result requires that the sum of the orders of the grid transfer operators is not lower than the order of the PDE being approximated. Analogously, when dealing with MGMs for Toeplitz matrices, a well-known optimality condition concerns the position and the order of the zeros of the symbols of the grid transfer operators. In this work we show that in the case of elliptic PDEs with constant coefficients, the two different approaches lead to an equivalent condition. We argue that the analysis for Toeplitz matrices is an algebraic generalization of the LFA, which makes it possible to deal not only with differential problems but also, for instance, with integral problems. The equivalence of the two approaches gives the possibility of using grid transfer operators with different orders also for MGMs for Toeplitz matrices. We also give a class of grid transfer operators related to the B-spline refinement equation and study their geometric properties. Numerical experiments confirm the correctness of the proposed analysis. Copyright © 2010 John Wiley & Sons, Ltd. [source]


Algebraic multigrid, mixed-order interpolation, and incompressible fluid flow

NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS, Issue 1 2010
R. Webster
Abstract This paper presents the results of numerical experiments on the use of equal-order and mixed-order interpolations in algebraic multigrid (AMG) solvers for the fully coupled equations of incompressible fluid flow. Several standard test problems are addressed for Reynolds numbers spanning the laminar range. The range of unstructured meshes spans over two orders of magnitude in problem size (over one order of magnitude in mesh bandwidth). Deficiencies in performance are identified for AMG based on equal-order interpolations (both zero-order and first-order). They take the form of poor, fragile, mesh-dependent convergence rates. The evidence suggests that a degraded representation of the inter-field coupling in the coarse-grid approximation is the cause. Mixed-order interpolation (first-order for the vectors, zero-order for the scalars) is shown to address these deficiencies. Convergence is then robust, independent of the number of coarse grids and (almost) of the mesh bandwidth. The AMG algorithms used are reviewed. Copyright © 2009 John Wiley & Sons, Ltd. [source]


A hybrid domain decomposition method based on aggregation

NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS, Issue 4 2004
Yu. Vassilevski
Abstract A new two-level black-box preconditioner based on the hybrid domain decomposition technique is proposed and studied. The preconditioner is a combination of an additive Schwarz preconditioner and a special smoother. The smoother removes the dependence of the condition number on the number of subdomains and on variations of the diffusion coefficient, and leaves only a minor sensitivity to the problem size. The algorithm is parallel and purely algebraic, which makes it a convenient framework for the construction of parallel black-box preconditioners on unstructured meshes. Copyright © 2004 John Wiley & Sons, Ltd. [source]


Comparing arithmetic and semantic fact retrieval: Effects of problem size and sentence constraint on event-related brain potentials

PSYCHOPHYSIOLOGY, Issue 6 2003
Kerstin Jost
Abstract Event-related potentials were recorded with 61 electrodes from 16 students who verified either the correctness of single-digit multiplication problems or the semantic congruency of sentences. Multiplication problems varied in size and sentence fragments in constraint. Both semantic and arithmetic incongruencies evoked a typical N400 with a clear parieto-central maximum. In addition, numerically larger problems (8×7), in comparison to smaller problems (3×2), evoked a negativity starting at about 360 ms whose maximum was located over the right temporal-parietal scalp. These results indicate that the arithmetic incongruency and the problem-size effect are functionally distinct. It is suggested that the arithmetic and the semantic incongruency effects are both functionally related to a context-dependent spread of activation in specialized associative networks, whereas the arithmetic problem-size effect is due to rechecking routines that go beyond basic fact retrieval. [source]


Developmental Change and Individual Differences in Children's Multiplication

CHILD DEVELOPMENT, Issue 4 2003
Donald J. Mabbott
Age-related change and patterns of individual differences in children's knowledge and skill in multiplication were investigated for students in Grades 4 and 6 (approximately ages 9 and 11, respectively) by examining multiple measures of computational skill, conceptual knowledge, and working memory. Regression analyses revealed that indexes reflecting probability of retrieval and special problem characteristics overshadow other, more general indexes (problem size and frequency of presentation) in predicting solution latencies. Some improvement in the use of conceptual knowledge was evident between Grades 4 and 6, but this change was neither strong nor uniform across tasks. Finally, patterns of individual differences across tasks differed as a function of grade level. The findings have implications for understanding developmental change and individual differences in mathematical cognition. [source]


GridBLAST: a Globus-based high-throughput implementation of BLAST in a Grid computing framework

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 13 2005
Arun Krishnan
Abstract Improvements in the performance of processors and networks have made it feasible to treat collections of workstations, servers, clusters and supercomputers as integrated computing resources or Grids. However, the very heterogeneity that is the strength of computational and data Grids can also make application development for such an environment extremely difficult. Application development in a Grid computing environment faces significant challenges in the form of problem granularity, latency and bandwidth issues as well as job scheduling. Currently existing Grid technologies limit the development of Grid applications to certain classes, namely, embarrassingly parallel, hierarchical parallelism, work flow and database applications. Of all these classes, embarrassingly parallel applications are the easiest to develop in a Grid computing framework. The work presented here deals with creating a Grid-enabled, high-throughput, standalone version of a bioinformatics application, BLAST, using Globus as the Grid middleware. BLAST is a sequence alignment and search technique that is embarrassingly parallel in nature and thus amenable to adaptation to a Grid environment. A detailed methodology for creating the Grid-enabled application is presented, which can be used as a template for the development of similar applications. The application has been tested on a 'mini-Grid' testbed and the results presented here show that for large problem sizes, a distributed, Grid-enabled version can help in significantly reducing execution times. Copyright © 2005 John Wiley & Sons, Ltd. [source]
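The embarrassingly parallel structure exploited by GridBLAST can be illustrated without any Grid middleware (this sketch only shows the split-and-dispatch idea on a single machine; the blastp command line, flags and file names are assumptions and are not taken from the paper, which used Globus for scheduling across Grid resources):

# Sketch: split a FASTA query file into chunks and run them concurrently,
# mimicking the embarrassingly parallel structure of a high-throughput BLAST run.
# Paths, the executable name and its flags are illustrative assumptions.
import subprocess
from concurrent.futures import ProcessPoolExecutor

def split_fasta(path, n_chunks):
    records = open(path).read().split(">")[1:]          # naive FASTA split
    chunk_paths = []
    for i in range(n_chunks):
        p = f"chunk_{i}.fasta"
        with open(p, "w") as f:
            f.write("".join(">" + r for r in records[i::n_chunks]))
        chunk_paths.append(p)
    return chunk_paths

def run_blast(chunk_path):
    out = chunk_path + ".out"
    subprocess.run(["blastp", "-query", chunk_path, "-db", "nr", "-out", out],
                   check=True)
    return out

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(run_blast, split_fasta("queries.fasta", 8)))
    print(results)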


A cache-efficient implementation of the lattice Boltzmann method for the two-dimensional diffusion equation

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 14 2004
A. C. Velivelli
Abstract The lattice Boltzmann method is an important technique for the numerical solution of partial differential equations because it has nearly ideal scalability on parallel computers for many applications. However, to achieve the scalability and speed potential of the lattice Boltzmann technique, the issue of data reusability in cache-based computer architectures must be addressed. Utilizing the two-dimensional diffusion equation, ∂u/∂t = D(∂²u/∂x² + ∂²u/∂y²), this paper examines cache optimization for the lattice Boltzmann method in both serial and parallel implementations. In this study, speedups due to cache optimization were found to be 1.9–2.5 for the serial implementation and 3.6–3.8 for the parallel case in which the domain decomposition was optimized for stride-one access. In the parallel non-cached implementation, the method of domain decomposition (horizontal or vertical) used for parallelization did not significantly affect the compute time. In contrast, the cache-based implementation of the lattice Boltzmann method was significantly faster when the domain decomposition was optimized for stride-one access. Additionally, the cache-optimized lattice Boltzmann method in which the domain decomposition was optimized for stride-one access displayed superlinear scalability on all problem sizes as the number of processors was increased. Copyright © 2004 John Wiley & Sons, Ltd. [source]
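The stride-one point above can be illustrated with a minimal, non-LBM sketch (our example, not the authors' code): for a C-ordered array, sweeping along the last index touches memory contiguously, which is what the cache-optimized domain decomposition arranges for.

# Sketch: contiguous (stride-one) vs strided traversal of a C-ordered array.
# Illustrates only why the orientation of the decomposition matters for cache reuse.
import time
import numpy as np

a = np.zeros((4000, 4000))

def sweep_rows(a):          # stride-one for C-ordered (row-major) storage
    for i in range(a.shape[0]):
        a[i, :] += 1.0

def sweep_cols(a):          # one full row stride between consecutive accesses
    for j in range(a.shape[1]):
        a[:, j] += 1.0

for sweep in (sweep_rows, sweep_cols):
    t0 = time.perf_counter()
    sweep(a)
    print(sweep.__name__, time.perf_counter() - t0, "s")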


A parallel Broyden approach to the Toeplitz inverse eigenproblem

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 6 2004
Jesús Peinado
Abstract In this work we present a portable sequential and a portable parallel algorithm for solving the inverse eigenproblem for real symmetric Toeplitz matrices. Both algorithms are based on Broyden's method for solving nonlinear systems. We reduced the computational cost for some problem sizes, and furthermore we managed to reduce the spatial cost considerably, compared in both cases with parallel algorithms proposed by other authors and by ourselves, although quasi-Newton methods (such as Broyden's) sometimes do not reach convergence in all the test cases. We have implemented the parallel algorithm using the parallel numerical linear algebra library ScaLAPACK based on the MPI environment. Experimental results have been obtained using two different architectures: a shared memory multiprocessor, the SGI PowerChallenge, and a cluster of Pentium II PCs connected through a Myrinet network. The algorithms obtained are scalable in all cases. Copyright © 2004 John Wiley & Sons, Ltd. [source]
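Broyden's method itself (independent of both the Toeplitz inverse eigenproblem and the parallel implementation) is available in standard libraries; a minimal sketch on a toy two-equation system follows, with the caveat noted in the abstract that quasi-Newton iterations are not guaranteed to converge:

# Sketch: Broyden's method for a small nonlinear system F(x) = 0 using scipy.
import numpy as np
from scipy.optimize import broyden1

def F(x):
    # Toy stand-in for the residual; in the paper the residual involves the
    # eigenvalues of a symmetric Toeplitz matrix built from the unknowns.
    return np.array([x[0]**2 + x[1]**2 - 4.0,
                     np.exp(x[0]) + x[1] - 1.0])

x0 = np.array([1.0, -1.7])          # starting guess near a root
sol = broyden1(F, x0, f_tol=1e-10)
print(sol, F(sol))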


Explicit coupled thermo-mechanical finite element model of steel solidification

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 1 2009
Seid Koric
Abstract The explicit finite element method is applied in this work to simulate the coupled and highly non-linear thermo-mechanical phenomena that occur during steel solidification in continuous casting processes. Variable mass scaling is used to efficiently model these processes in their natural time scale using a Lagrangian formulation. An efficient and robust local–global viscoplastic integration scheme (Int. J. Numer. Meth. Engng 2006; 66:1955–1989) to solve the highly temperature- and rate-dependent elastic–viscoplastic constitutive equations of solidifying steel has been implemented into the commercial software ABAQUS/Explicit (ABAQUS User Manuals v6.7. Simulia Inc., 2007) using a VUMAT subroutine. The model is first verified with a known semi-analytical solution from Weiner and Boley (J. Mech. Phys. Solids 1963; 11:145–154). It is then applied to simulate temperature and stress development in solidifying shell sections in continuous casting molds using realistic temperature-dependent properties and including the effects of ferrostatic pressure, narrow face taper, and mechanical contact. Example simulations include a fully coupled thermo-mechanical analysis of a billet-casting and thin-slab casting in a funnel mold. Explicit temperature and stress results are compared with the results of an implicit formulation and computing times are benchmarked for different problem sizes and different numbers of processor cores. The explicit formulation exhibits significant advantages for this class of contact-solidification problems, especially with large domains on the latest parallel computing platforms. Copyright © 2008 John Wiley & Sons, Ltd. [source]


Implementation and evaluation of MPI-based parallel MD program

INTERNATIONAL JOURNAL OF QUANTUM CHEMISTRY, Issue 1 2001
R. Trobec
Abstract The message-passing interface (MPI)-based object-oriented particle–particle interactions (PPI) library is implemented and evaluated. The library can be used in the n-particle simulation algorithm designed for a ring of p interconnected processors. The parallel simulation is scalable with the number of processors, and has a time requirement proportional to n²/p if n/p is large enough, which guarantees optimal speedup. In a certain range of problem sizes, the speedup becomes superlinear because enough cache memory is available in the system. The library can be used in a simple way by any potential user, even one with no deep programming knowledge. Different simulations using particles can be implemented on a wide spectrum of computer platforms. The main purpose of this article is to test the PPI library on well-known methods, e.g., the parallel molecular dynamics (MD) simulation of a monoatomic system by the second-order leapfrog Verlet algorithm. The performance of the parallel simulation program implemented with the proposed library is competitive with a custom-designed simulation code. Also, the implementation of the split integration symplectic method, based on the analytical calculation of the harmonic part of the particle interactions, is shown, and its expected performance is predicted. © 2001 John Wiley & Sons, Inc. Int J Quant Chem 84: 23–31, 2001 [source]
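The integrator named above is easy to state; a minimal serial sketch follows (no MPI, no neighbour lists; the Lennard-Jones units and the lattice start are our own choices, not taken from the article):

# Sketch: second-order leapfrog Verlet for a monoatomic Lennard-Jones system.
# Serial and O(n^2); the paper's library distributes these particle-particle
# interactions over a ring of p processors.
import numpy as np

def lj_forces(pos, eps=1.0, sigma=1.0):
    """Pairwise Lennard-Jones forces on all particles."""
    f = np.zeros_like(pos)
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            r = pos[i] - pos[j]
            r2 = float(np.dot(r, r))
            inv6 = (sigma * sigma / r2) ** 3
            fij = 24.0 * eps * (2.0 * inv6 * inv6 - inv6) / r2 * r
            f[i] += fij
            f[j] -= fij
    return f

def leapfrog(pos, vel_half, dt, steps, mass=1.0):
    """Leapfrog: velocities live at half steps, positions at full steps."""
    for _ in range(steps):
        pos = pos + dt * vel_half                            # x(t + dt)
        vel_half = vel_half + dt * lj_forces(pos) / mass     # v(t + 3dt/2)
    return pos, vel_half

side = np.arange(4) * 1.2          # simple-cubic lattice, spacing 1.2 sigma
pos = np.array([[x, y, z] for x in side for y in side for z in side], dtype=float)
vel = np.zeros_like(pos)
pos, vel = leapfrog(pos, vel, dt=0.002, steps=10)
print(pos.shape, float(np.abs(vel).max()))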