Memory Requirements (memory + requirement)


Selected Abstracts


A fast multi-level convolution boundary element method for transient diffusion problems

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 14 2005
C.-H. Wang
Abstract A new algorithm is developed to evaluate the time convolution integrals that are associated with boundary element methods (BEM) for transient diffusion. This approach, which is based upon the multi-level multi-integration concepts of Brandt and Lubrecht, provides a fast, accurate and memory efficient time domain method for this entire class of problems. Conventional BEM approaches result in operation counts of order O(N^2) for the discrete time convolution over N time steps. Here we focus on the formulation for linear problems of transient heat diffusion and demonstrate reduced computational complexity to order O(N^(3/2)) for three two-dimensional model problems using the multi-level convolution BEM. Memory requirements are also significantly reduced, while maintaining the same level of accuracy as the conventional time domain BEM approach. Copyright © 2005 John Wiley & Sons, Ltd. [source]
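To make the quadratic cost concrete, the following minimal Python sketch (not the authors' implementation; kernel, data and sizes are illustrative) evaluates the conventional discrete time convolution over N steps with an explicit double loop, and checks it against an FFT-based evaluation:

    import numpy as np

    # Naive O(N^2) evaluation of the discrete time convolution that conventional
    # time-domain BEM performs: every time step sums over the full history.
    rng = np.random.default_rng(0)
    N = 512                      # number of time steps (illustrative)
    g = rng.standard_normal(N)   # stand-in for time-domain kernel samples
    q = rng.standard_normal(N)   # stand-in for the boundary data history

    conv = np.zeros(N)
    for n in range(N):                  # current time step
        for k in range(n + 1):          # full history up to step n
            conv[n] += g[n - k] * q[k]  # O(N^2) work in total

    # Cross-check with an FFT-based convolution (not the paper's multi-level
    # scheme, which instead reaches O(N^(3/2)) by multi-level multi-integration).
    assert np.allclose(conv, np.convolve(g, q)[:N])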


An Adaptive Sampling Scheme for Out-of-Core Simplification

COMPUTER GRAPHICS FORUM, Issue 2 2002
Guangzheng Fei
Current out-of-core simplification algorithms can efficiently simplify large models that are too complex to be loaded into main memory at one time. However, these algorithms do not preserve surface details well, since adaptive sampling, a typical strategy for detail preservation, remains an open issue for out-of-core simplification. In this paper, we present an adaptive sampling scheme, called balanced retriangulation (BR), for out-of-core simplification. A key idea behind BR is that Garland's quadric error matrix can be used to analyze the global distribution of surface details. Based on this analysis, a local retriangulation achieves adaptive sampling by restoring detailed areas with cell split operations while further simplifying smooth areas with edge collapse operations. For a given triangle budget, BR preserves surface details significantly better than uniform sampling algorithms such as uniform clustering. Like uniform clustering, our algorithm has linear running time and a small memory requirement. [source]
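For reference, the uniform clustering baseline mentioned above can be sketched in a few lines (function name and cell size are illustrative, not from the paper): vertices falling in the same grid cell are collapsed to their centroid and degenerate triangles are dropped.

    import numpy as np

    def uniform_cluster_simplify(vertices, triangles, cell_size):
        """Uniform (non-adaptive) clustering: collapse all vertices that fall in
        the same grid cell to their centroid, then drop degenerate triangles."""
        keys = map(tuple, np.floor(vertices / cell_size).astype(np.int64))
        cell_to_new, sums, counts = {}, [], []
        new_index = np.empty(len(vertices), dtype=np.int64)
        for i, key in enumerate(keys):
            if key not in cell_to_new:
                cell_to_new[key] = len(sums)
                sums.append(np.zeros(3))
                counts.append(0)
            j = cell_to_new[key]
            new_index[i] = j
            sums[j] += vertices[i]
            counts[j] += 1
        new_vertices = np.array(sums) / np.array(counts)[:, None]
        tri = new_index[np.asarray(triangles)]
        keep = (tri[:, 0] != tri[:, 1]) & (tri[:, 1] != tri[:, 2]) & (tri[:, 0] != tri[:, 2])
        return new_vertices, tri[keep]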


Parsimonious finite-volume frequency-domain method for 2-D P–SV-wave modelling

GEOPHYSICAL JOURNAL INTERNATIONAL, Issue 2 2008
R. Brossier
SUMMARY A new numerical technique for solving 2-D elastodynamic equations based on a finite-volume frequency-domain approach is proposed. This method has been developed as a tool to perform 2-D elastic frequency-domain full-waveform inversion. In this context, the system of linear equations that results from the discretization of the elastodynamic equations is solved with a direct solver, allowing efficient multiple-source simulations at the partial expense of the memory requirement. The discretization of the finite-volume approach is based on triangles. Only fluxes of the required quantities are shared between the cells, relaxing the meshing conditions compared with finite-element methods. The free surface is described along the edges of the triangles, which can have different slopes. By applying a parsimonious strategy, the stress components are eliminated from the discrete equations and only the velocities are left as unknowns in the triangles. Together with the local support of the P0 finite-volume stencil, the parsimonious approach allows the core memory requirements of the simulation to be minimized. Efficient perfectly matched layer absorbing conditions have been designed for damping the waves around the grid. The numerical dispersion of this FV formulation is similar to that of O(Δx^2) staggered-grid finite-difference (FD) formulations when considering structured triangular meshes. The validation has been performed with analytical solutions of several canonical problems and with numerical solutions computed with a well-established FD time-domain method in heterogeneous media. In the presence of a free surface, the finite-volume method requires 10 triangles per wavelength for a flat topography and 15 triangles per wavelength for more complex shapes, well below the criteria required by the staircase approximation of O(Δx^2) FD methods. Comparisons between the frequency-domain finite-volume and the O(Δx^2) rotated FD methods also show that the former is faster and less memory demanding for a given accuracy level, an attractive feature for frequency-domain seismic inversion. We have thus developed an efficient method for 2-D P–SV-wave modelling on structured triangular meshes as a tool for frequency-domain full-waveform inversion. Further work is required to improve the accuracy of the method on unstructured meshes. [source]


Fast multipole boundary element analysis of two-dimensional elastoplastic problems

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING, Issue 10 2007
P. B. Wang
Abstract This paper presents a fast multipole boundary element method (BEM) for the analysis of two-dimensional elastoplastic problems. An incremental iterative technique based on the initial strain approach is employed to solve the nonlinear equations, and the fast multipole method (FMM) is introduced to achieve higher run-time and memory storage efficiency. Both the boundary integrals and the domain integrals are calculated by recursive operations on a quad-tree structure without explicitly forming the coefficient matrix. By combining multipole expansions with local expansions, the computational complexity and memory requirement of the matrix–vector multiplication are both reduced to O(N), where N is the number of degrees of freedom (DOFs). The accuracy and efficiency of the proposed scheme are demonstrated by several numerical examples. Copyright © 2006 John Wiley & Sons, Ltd. [source]


Fifth-order Hermitian schemes for computational linear aeroacoustics

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 9 2007
G. Capdeville
Abstract We develop a class of fifth-order methods to solve linear acoustics and/or aeroacoustics. Based on local Hermite polynomials, we investigate three competing strategies for solving hyperbolic linear problems with fifth-order accuracy. A one-dimensional (1D) Fourier analysis makes it possible to classify these possibilities. Then, numerical computations based on the 1D scalar advection equation support two possibilities for updating the discrete variable and its first and second derivatives: the first uses a procedure similar to that of Cauchy–Kovalevskaya (the ,-P5 scheme); the second relies on a semi-discrete form and advances the discrete unknowns in time with a five-stage Runge–Kutta method (the RGK-P5 scheme). Although the RGK-P5 scheme shares the same local spatial interpolator as the ,-P5 scheme, it is algebraically simpler. However, it is shown numerically that its loss of compactness reduces its domain of stability. Both schemes are then extended to two-dimensional acoustics and aeroacoustics. Following the methodology validated in (J. Comput. Phys. 2005; 210:133–170; J. Comput. Phys. 2006; 217:530–562), we build an algorithm in three stages in order to optimize the discretization procedure. In the 'reconstruction stage', we define a fifth-order local spatial interpolator based on an upwind stencil. In the 'decomposition stage', we decompose the time derivatives into simple wave contributions. In the 'evolution stage', we use these fluctuations to update the discrete variable and its derivatives, either by a Cauchy–Kovalevskaya procedure or by a five-stage Runge–Kutta algorithm. In this way, depending on the configuration of the 'evolution stage', two fifth-order upwind Hermitian schemes are constructed. The effectiveness and accuracy of both schemes are checked by applying them to several 2D problems in acoustics and aeroacoustics, for which we compare the computational cost and memory requirement of each solution. The RGK-P5 scheme appears to be the best compromise between simplicity and accuracy, while the ,-P5 scheme is more accurate and less CPU-time consuming, despite its greater algebraic complexity. Copyright © 2007 John Wiley & Sons, Ltd. [source]


A two-scale domain decomposition method for computing the flow through a porous layer limited by a perforated plate

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 6 2003
J. Dufrêche
Abstract A two-scale domain decomposition method is developed in order to study situations where the macroscopic description of a given transport process in porous media does not represent a sufficiently good approximation near singularities (holes, wells, etc.). The method is based on a domain decomposition technique with overlapping. The governing equations at the scale of the microstructure are solved in the vicinity of the singularities, whereas the volume-averaged transport equations are solved at some distance from the singularities. The transfer of information from one domain to the other is performed using results of the method of volume averaging. The method is illustrated through the computation of the overall permeability of a porous layer limited by a perforated plate. As shown in the example treated, the method allows one to estimate the useful size of the microscopic region near the singularities. As illustrated in the paper, the method can lead to a considerable gain in memory requirement compared with a full direct simulation. Copyright © 2003 John Wiley & Sons, Ltd. [source]


Adaptive integral method combined with the loose GMRES algorithm for planar structures analysis

INTERNATIONAL JOURNAL OF RF AND MICROWAVE COMPUTER-AIDED ENGINEERING, Issue 1 2009
W. Zhuang
Abstract In this article, the adaptive integral method (AIM) is used to analyze large-scale planar structures. Discretization of the corresponding integral equations by the method of moments (MoM) with Rao-Wilton-Glisson (RWG) basis functions can model arbitrarily shaped planar structures, but usually leads to a fully populated matrix. AIM maps these basis functions onto a rectangular grid, where the Toeplitz property of the Green's function can be exploited, enabling the matrix–vector multiplication to be computed with the fast Fourier transform (FFT). This reduces the memory requirement from O(N^2) to O(N) and the operation complexity from O(N^2) to O(N log N), where N is the number of unknowns. The resultant equations are then solved by the loose generalized minimal residual method (LGMRES), which converges much faster than the conventional conjugate gradient (CG) method. Furthermore, several preconditioning techniques are employed to enhance the computational efficiency of the LGMRES. Some typical microstrip circuits and a microstrip antenna array are analyzed, and numerical results show that the preconditioned LGMRES can converge much faster than the conventional LGMRES. © 2008 Wiley Periodicals, Inc. Int J RF and Microwave CAE, 2009. [source]
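The saving rests on the Toeplitz structure itself: a Toeplitz matrix can be embedded in a circulant one, whose matrix–vector product is a pair of FFTs. A minimal numpy sketch (illustrative sizes; not the AIM/MoM implementation), checked against the dense O(N^2) product:

    import numpy as np

    def toeplitz_matvec_fft(first_col, first_row, x):
        """O(N log N) product of an N x N Toeplitz matrix (given by its first
        column and first row) with x, via circulant embedding and the FFT."""
        n = len(x)
        c = np.concatenate([first_col, first_row[:0:-1]])   # circulant first column
        y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x, len(c)))
        return y[:n].real

    rng = np.random.default_rng(1)
    n = 256
    col = rng.standard_normal(n)
    row = np.concatenate([col[:1], rng.standard_normal(n - 1)])
    T = np.array([[col[i - j] if i >= j else row[j - i] for j in range(n)]
                  for i in range(n)])
    x = rng.standard_normal(n)
    assert np.allclose(T @ x, toeplitz_matvec_fft(col, row, x))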


Estimation of breeding values from large-sized routine carcass data in Japanese Black cattle using Bayesian analysis

ANIMAL SCIENCE JOURNAL, Issue 6 2009
Aisaku ARAKAWA
ABSTRACT The volume of official data sets used in genetic evaluation based on the Japanese Black routine carcass field data has been increasing rapidly. Therefore, an alternative approach with a smaller memory requirement than the current one, which uses restricted maximum likelihood (REML) and empirical best linear unbiased prediction (EBLUP), is desired. This study applied a Bayesian analysis using Gibbs sampling (GS) to a large data set of the routine carcass field data and practically verified its validity for the estimation of breeding values. A Bayesian analysis analogous to REML-EBLUP was implemented, and the posterior means were calculated using every 10th sample from 90 000 samples retained after the first 10 000 samples were discarded. Moment and rank correlations between breeding values estimated by GS and REML-EBLUP were very close to one, and the linear regression coefficients and the intercepts of the GS on the REML-EBLUP estimates were essentially one and zero, respectively, showing very good agreement between breeding value estimation by the current GS and REML-EBLUP. The current GS required only one-sixth of the memory space needed by REML-EBLUP. It is confirmed that the current GS approach, with its relatively small memory requirement, is valid as a genetic evaluation procedure using large routine carcass data sets. [source]
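The post-processing of the Gibbs chain reduces to burn-in, thinning and averaging; the sketch below uses the same figures as above (100 000 draws, the first 10 000 discarded, every 10th kept) on synthetic data, since the routine carcass records themselves are not public:

    import numpy as np

    def posterior_means(samples, burn_in=10_000, thin=10):
        """Posterior means of breeding values from an MCMC chain of shape
        (n_draws, n_animals): drop the burn-in, keep every thin-th draw, average."""
        return samples[burn_in::thin].mean(axis=0)

    rng = np.random.default_rng(2)
    true_ebv = np.array([0.1, -0.3, 0.0, 0.5, -0.2])        # illustrative values
    chain = rng.normal(true_ebv, 0.4, size=(100_000, 5))     # synthetic Gibbs draws
    print(posterior_means(chain))                            # close to true_ebv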


Shallow Bounding Volume Hierarchies for Fast SIMD Ray Tracing of Incoherent Rays

COMPUTER GRAPHICS FORUM, Issue 4 2008
H. Dammertz
Abstract Photorealistic image synthesis is a computationally demanding task that relies on ray tracing for the evaluation of integrals. Rendering time is dominated by tracing long paths that are very incoherent by construction. We therefore investigate the use of SIMD instructions to accelerate incoherent rays. SIMD is used in the hierarchy construction, the tree traversal and the leaf intersection. This is achieved by increasing the arity of acceleration structures, which also reduces memory requirements. We show that the resulting hierarchies can be built quickly and are smaller than previously known acceleration structures, while at the same time outperforming them for incoherent rays. Our new acceleration structure speeds up ray tracing by a factor of 1.6 to 2.0 compared to a highly optimized bounding interval hierarchy implementation, and 1.3 to 1.6 compared to an efficient kd-tree. At the same time, the memory requirements are reduced by 10–50%. Additionally, we show how a caching mechanism in conjunction with this memory-efficient hierarchy can be used to speed up shadow rays in a global illumination algorithm without increasing the memory footprint. This optimization decreases the number of traversal steps by up to 50%. [source]


Clock synchronization in Cell/B.E. traces

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 14 2009
M. Biberstein
Abstract Cell/B.E. is a heterogeneous multicore processor that was designed for the efficient execution of parallel and vectorizable applications with high computation and memory requirements. The transition to multicores introduces the challenge of providing tools that help programmers tune the code running on these architectures. Tracing tools, in particular, often help locate performance problems related to thread and process communication. A major impediment to implementing tracing on Cell is the absence of a common clock that can be accessed at low cost from all cores. The OS clock is costly to access from the auxiliary cores and the hardware timers cannot be simultaneously set on all the cores. In this paper, we describe an offline trace analysis algorithm that assigns wall-clock time to trace records based on their thread-local time stamps and event order. Our experiments on several Cell SDK workloads show that the indeterminism in assigning wall-clock time to events is low, on average 20–40 clock ticks (translating into 1.4–2.8 µs on the system used in our experiments). We also show how various practical problems, such as the imprecision of time measurement, can be overcome. Copyright © 2009 John Wiley & Sons, Ltd. [source]
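The core post-processing step, stripped of the event-order reasoning used in the paper, is fitting a mapping from each core's local clock to a common time base. A minimal sketch assuming (unlike the paper) that some events have known reference times, so that an affine drift/offset model can be fitted by least squares:

    import numpy as np

    def fit_clock_mapping(local_ts, reference_ts):
        """Least-squares affine fit: reference ~ drift * local + offset."""
        drift, offset = np.polyfit(local_ts, reference_ts, deg=1)
        return drift, offset

    # Illustrative: a local clock running 1.0002x too fast with an unknown offset.
    rng = np.random.default_rng(3)
    wall = np.sort(rng.uniform(0.0, 1e6, size=200))
    local = 1.0002 * wall + 12_345 + rng.normal(0.0, 5.0, size=200)
    drift, offset = fit_clock_mapping(local, wall)
    aligned = drift * local + offset
    print(np.max(np.abs(aligned - wall)))   # residual misalignment, in ticks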


Parallel four-dimensional Haralick texture analysis for disk-resident image datasets

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 1 2007
Brent Woods
Abstract Texture analysis is one possible method of detecting features in biomedical images. During texture analysis, texture-related information is found by examining local variations in image brightness. Four-dimensional (4D) Haralick texture analysis is a method that extracts local variations along the space and time dimensions and represents them as a collection of 14 statistical parameters. However, application of the 4D Haralick method to large time-dependent image datasets is hindered by data retrieval, computation, and memory requirements. This paper describes a parallel implementation of 4D Haralick texture analysis on PC clusters, using a distributed component-based framework. The experimental performance results show that good performance can be achieved for this application via the combined use of task- and data-parallelism. In addition, we show that our 4D texture analysis implementation can be used to classify imaged tissues. Copyright © 2006 John Wiley & Sons, Ltd. [source]
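The 14 Haralick parameters are statistics of a gray-level co-occurrence matrix. A minimal 2D, single-offset sketch computing two of them (the 4D method described above additionally accumulates co-occurrences over temporal neighbours and runs out-of-core in parallel):

    import numpy as np

    def cooccurrence_matrix(image, levels, offset=(0, 1)):
        """Normalized gray-level co-occurrence matrix for one offset (right neighbour)."""
        glcm = np.zeros((levels, levels))
        dy, dx = offset
        h, w = image.shape
        for y in range(h - dy):
            for x in range(w - dx):
                glcm[image[y, x], image[y + dy, x + dx]] += 1
        return glcm / glcm.sum()

    def haralick_subset(p):
        """Two of the 14 Haralick statistics: angular second moment and contrast."""
        i, j = np.indices(p.shape)
        return np.sum(p ** 2), np.sum((i - j) ** 2 * p)

    img = np.random.default_rng(4).integers(0, 8, size=(32, 32))
    print(haralick_subset(cooccurrence_matrix(img, levels=8)))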


Greater hippocampal neuronal recruitment in food-storing than in non-food-storing birds

DEVELOPMENTAL NEUROBIOLOGY, Issue 4 2007
Jennifer S. Hoshooley
Abstract Previous research has shown heightened recruitment of new neurons to the chickadee hippocampus in the fall. The present study was conducted to determine whether heightened fall recruitment is associated with the seasonal onset of food-storing by comparing neurogenesis in chickadees and a non-food-storing species, the house sparrow. Chickadees and house sparrows were captured in the wild in fall and spring and received multiple injections of the cell birth marker bromodeoxyuridine (BrdU). Birds were held in captivity and the level of hippocampal neuron recruitment was assessed after 6 weeks. Chickadees showed significantly more hippocampal neuronal recruitment than house sparrows. We found no seasonal differences in hippocampal neuronal recruitment in either species. In chickadees and in house sparrows, one-third of new cells labeled for BrdU also expressed the mature neuronal protein, NeuN. In a region adjacent to the hippocampus, the hyperpallium apicale, we observed no significant differences in neuronal recruitment between species or between seasons. Hippocampal volume and total neuron number both were greater in spring than in fall in chickadees, but no seasonal differences were observed in house sparrows. Enhanced neuronal recruitment in the hippocampus of food-storing chickadees suggests a degree of neurogenic specialization that may be associated with the spatial memory requirements of food-storing behavior. © 2007 Wiley Periodicals, Inc. Develop Neurobiol, 2007. [source]


A frontal solver for the 21st century

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING, Issue 10 2006
Jennifer A. Scott
Abstract In recent years there have been a number of important developments in frontal algorithms for solving the large sparse linear systems of equations that arise from finite-element problems. We report on the design of a new fully portable and efficient frontal solver for large-scale real and complex unsymmetric linear systems from finite-element problems that incorporates these developments. The new package offers both a flexible reverse communication interface and a simple-to-use all-in-one interface, which is designed to make the package more accessible to new users. Other key features include automatic element ordering using a state-of-the-art hybrid multilevel spectral algorithm, minimal main memory requirements, the use of high-level BLAS, and facilities to allow the solver to be used as part of a parallel multiple front solver. The performance of the new solver, which is written in Fortran 95, is illustrated using a range of problems from practical applications. The solver is available as package HSL_MA42_ELEMENT within the HSL mathematical software library and, for element problems, supersedes the well-known MA42 package. Copyright © 2006 John Wiley & Sons, Ltd. [source]


An efficient out-of-core multifrontal solver for large-scale unsymmetric element problems

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 7 2009
J. K. Reid
Abstract In many applications where the efficient solution of large sparse linear systems of equations is required, a direct method is frequently the method of choice. Unfortunately, direct methods have a potentially severe limitation: as the problem size grows, the memory needed generally increases rapidly. However, the in-core memory requirements can be limited by storing the matrix and its factors externally, allowing the solver to be used for very large problems. We have designed a new out-of-core package for the large sparse unsymmetric systems that arise from finite-element problems. The code, which is called HSL_MA78, implements a multifrontal algorithm and achieves efficiency through the use of specially designed code for handling the input/output operations and efficient dense linear algebra kernels. These kernels, which are available as a separate package called HSL_MA74, use high-level BLAS to perform the partial factorization of the frontal matrices and offer both threshold partial pivoting and rook pivoting. In this paper, we describe the design of HSL_MA78 and explain its user interface and the options it offers. We also describe the algorithms used by HSL_MA74 and illustrate the performance of our new codes using problems from a range of practical applications. Copyright © 2008 John Wiley & Sons, Ltd. [source]


A study on the lumped preconditioner and memory requirements of FETI and related primal domain decomposition methods

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 13 2008
Yannis Fragakis
Abstract In recent years, domain decomposition methods (DDMs) have emerged as advanced solvers in several areas of computational mechanics. In particular, during the last decade, in the area of solid and structural mechanics, they reached a considerable level of advancement and were shown to be more efficient than popular solvers, like advanced sparse direct solvers. The present contribution follows the lines of a series of recent publications on the relationship between primal and dual formulations of DDMs. In some of these papers, the effort to unify primal and dual methods led to a family of DDMs that was shown to be more efficient than the previous methods. The present paper extends this work by presenting a new family of related DDMs, thus enriching the theory of the relations between primal and dual methods with the primal methods that correspond to the dual DDM using the lumped preconditioner. The paper also compares the numerical performance of the new methods with that of the previous ones and focuses particularly on memory requirement issues related to the use of the lumped preconditioner, suggesting a particularly memory-efficient formulation. Copyright © 2007 John Wiley & Sons, Ltd. [source]


MR linear contact detection algorithm

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 1 2006
A. Munjiza
Abstract Large-scale discrete element simulations, as well as a whole range of related problems, involve contact of a large number of separate bodies, and an efficient and robust contact detection algorithm is necessary. A number of contact detection algorithms with total detection time proportional to N ln(N) (where N is the total number of separate bodies) have been reported in the past. In more recent years, algorithms with total CPU time proportional to N have been developed. In this work, a novel contact detection algorithm with total detection time proportional to N is proposed. The performance of the algorithm is not influenced by packing density, while memory requirements are insignificant. The algorithm is applicable to systems comprising bodies of a similar size. The algorithm is named MR (Munjiza–Rougier: Munjiza devised the algorithm, Rougier implemented it). In the second part of the paper the algorithm is extended to particles of different sizes. The new algorithm is called the MMR (multi-step MR) algorithm. Copyright © 2005 John Wiley & Sons, Ltd. [source]
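The linear scaling for similar-sized bodies comes from hashing each body into a cell comparable to the body diameter, so that only a bounded number of neighbours need testing. The sketch below illustrates that generic binning idea only; it is not the MR algorithm itself (names, sizes and particle counts are illustrative):

    import numpy as np
    from itertools import product
    from collections import defaultdict

    def detect_contacts(centres, radius):
        """Cell-based contact detection for equal spheres: bodies are hashed into
        cells of edge 2*radius, so only same-cell and neighbouring-cell pairs are
        tested; the work is proportional to N for bounded cell occupancy."""
        cell = 2.0 * radius
        grid = defaultdict(list)
        for i, c in enumerate(centres):
            grid[tuple(np.floor(c / cell).astype(int))].append(i)
        contacts = []
        for key, members in grid.items():
            for offset in product((-1, 0, 1), repeat=3):
                neighbour = tuple(k + o for k, o in zip(key, offset))
                for j in grid.get(neighbour, ()):
                    for i in members:
                        if i < j and np.linalg.norm(centres[i] - centres[j]) < 2 * radius:
                            contacts.append((i, j))
        return contacts

    pts = np.random.default_rng(5).uniform(0.0, 10.0, size=(500, 3))
    print(len(detect_contacts(pts, radius=0.2)))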


Parallel multipole implementation of the generalized Helmholtz decomposition for solving viscous flow problems

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 11 2003
Mary J. Brown
Abstract The evaluation of a domain integral is the dominant bottleneck in the numerical solution of viscous flow problems by vorticity methods, which otherwise demonstrate distinct advantages over primitive variable methods. By applying a Barnes–Hut multipole acceleration technique, the operation count for the integration is reduced from O(N^2) to O(N log N), while the memory requirements are reduced from O(N^2) to O(N). The algorithmic parameters that are necessary to achieve such scaling are described. The parallelization of the algorithm is crucial if the method is to be applied to realistic problems. A parallelization procedure which achieves almost perfect scaling is shown. Finally, numerical experiments on a driven cavity benchmark problem are performed. The actual increase in performance and reduction in storage requirements match theoretical predictions well, and the scalability of the procedure is very good. Copyright © 2003 John Wiley & Sons, Ltd. [source]


A practical determination strategy of optimal threshold parameter for matrix compression in wavelet BEM

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 2 2003
Kazuhiro Koro
Abstract A practical strategy is developed to determine the optimal threshold parameter for wavelet-based boundary element (BE) analysis. The optimal parameter is determined so that the amount of storage (and computational work) is minimized without reducing the accuracy of the BE solution. In the present study, the Beylkin-type truncation scheme is used in the matrix assembly. To avoid unnecessary integration for the truncated entries of the coefficient matrix, a priori estimation of the matrix entries is introduced, and thus the truncated entries are determined twice: before and after matrix assembly. The optimal threshold parameter is set based on the equilibrium of the truncation and discretization errors. These errors are estimated in the residual sense. For Laplace problems the discretization error is, in particular, indicated by the potential's contribution ||c|| to the residual norm ||R|| used in error estimation for mesh adaptation. Since the normalized residual norm ||c||/||u|| (u: the potential components of the BE solution) cannot be computed without the main BE analysis, the discretization error is estimated by an approximate expression constructed through a subsidiary BE calculation with a smaller number of degrees of freedom (DOF). The matrix compression using the proposed optimal threshold parameter enables us to generate a sparse matrix with O(N^(1+α)) (0 ≤ α < 1) non-zero entries. Although quasi-optimal memory requirements and complexity are not attained, a compression rate of a few per cent can be achieved for N of the order of 1000. Copyright © 2003 John Wiley & Sons, Ltd. [source]
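The storage/accuracy trade-off controlled by the threshold is easy to see on a matrix whose entries decay away from the diagonal, as wavelet BEM matrices do. The sketch below illustrates only that trade-off (illustrative matrix and thresholds; it does not implement the paper's optimal-parameter rule, which balances truncation error against an estimated discretization error):

    import numpy as np
    import scipy.sparse as sp

    def compress_by_threshold(A, delta):
        """Drop entries below delta * max|A| and report the retained fraction."""
        cutoff = delta * np.abs(A).max()
        A_sparse = sp.csr_matrix(np.where(np.abs(A) >= cutoff, A, 0.0))
        return A_sparse, A_sparse.nnz / A.size

    n = 400
    i, j = np.indices((n, n))
    A = 1.0 / (1.0 + np.abs(i - j)) ** 2          # entries decaying off-diagonal
    x = np.ones(n)
    for delta in (1e-3, 1e-2):
        A_c, kept = compress_by_threshold(A, delta)
        err = np.linalg.norm(A @ x - A_c @ x) / np.linalg.norm(A @ x)
        print(f"delta={delta:.0e}: kept {kept:.1%} of entries, residual error {err:.2e}")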


Numerical simulation of three-dimensional free surface flows

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 7 2003
V. Maronnier
Abstract A numerical model is presented for the simulation of complex fluid flows with free surfaces in three space dimensions. The model described in Maronnier et al. (J. Comput. Phys. 1999; 155(2): 439) is extended to three-dimensional situations. The mathematical formulation of the model is similar to that of the volume of fluid (VOF) method, but the numerical procedures are different. A splitting method is used for the time discretization. At each time step, two advection problems, one for the predicted velocity field and the other for the volume fraction of liquid, are to be solved. Then, a generalized Stokes problem is solved and the velocity field is corrected. Two different grids are used for the space discretization. The two advection problems are solved on a fixed, structured grid made out of small cubic cells, using a forward characteristic method. The generalized Stokes problem is solved using continuous, piecewise linear stabilized finite elements on a fixed, unstructured mesh of tetrahedrons. The three-dimensional implementation is discussed. Efficient postprocessing algorithms enhance the quality of the numerical solution. A hierarchical data structure reduces memory requirements. Numerical results are presented for complex geometries arising in mold filling. Copyright © 2003 John Wiley & Sons, Ltd. [source]


Local block refinement with a multigrid flow solver

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 1 2002
C. F. Lange
Abstract A local block refinement procedure for the efficient computation of transient incompressible flows with heat transfer is presented. The procedure uses patched structured grids for the blockwise refinement and a parallel multigrid finite volume method with colocated primitive variables to solve the Navier-Stokes equations. No restriction is imposed on the value of the refinement rate and non-integer rates may also be used. The procedure is analysed with respect to its sensitivity to the refinement rate and to the corresponding accuracy. Several applications exemplify the advantages of the method in comparison with a common block structured grid approach. The results show that it is possible to achieve an improvement in accuracy with simultaneous significant savings in computing time and memory requirements. Copyright © 2002 John Wiley & Sons, Ltd. [source]


Error resilient data transport in sensor network applications: A generic perspective

INTERNATIONAL JOURNAL OF CIRCUIT THEORY AND APPLICATIONS, Issue 2 2009
Rachit Agarwal
Abstract The error recovery problem in wireless sensor networks is studied from a generic resource-constrained energy-optimization perspective. To characterize the features of error recovery schemes that suit the majority of applications, an energy model is developed and inferences are drawn based on a suitable performance metric. For applications that require error control coding, an efficient scheme is proposed based on an interesting observation related to shortened Reed–Solomon (RS) codes for packet reliability. It is shown that multiple instances of RS codes defined on a smaller alphabet, combined with interleaving, result in smaller resource usage, while the performance exceeds the benefits of a shortened RS code defined over a larger alphabet. In particular, the proposed scheme can have an error correction capability larger than that of the conventional RS scheme by a factor of up to the number of interleaved instances, without changing the rate of the code and with much lower power, timing and memory requirements. Implementation results show that such a scheme is 43% more power efficient than the RS scheme with the same code rate. In addition, such an approach results in 46% faster computations and a 53% reduction in memory requirements. Copyright © 2008 John Wiley & Sons, Ltd. [source]
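The benefit of interleaving can be seen without any RS arithmetic: symbols of several short codewords are transmitted column-wise, so a burst of consecutive channel errors is spread across the codewords and each one sees only a few symbol errors, within its own correction capability. A minimal sketch of that structure (purely illustrative; no actual encoding or decoding):

    def interleave(codewords):
        """Interleave m equal-length codewords symbol by symbol (column-wise)."""
        return [sym for column in zip(*codewords) for sym in column]

    def deinterleave(stream, m):
        return [stream[k::m] for k in range(m)]

    m, length = 4, 8
    codewords = [[(w, s) for s in range(length)] for w in range(m)]   # tagged symbols
    stream = interleave(codewords)

    burst = set(range(10, 14))              # 4 consecutive transmitted symbols hit
    hits = [sum(1 for s in range(length) if s * m + w in burst) for w in range(m)]
    print(hits)                             # -> [1, 1, 1, 1]: one error per codeword
    assert deinterleave(stream, m) == codewords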


Efficient packet classification on network processors

INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS, Issue 1 2008
Koert Vlaeminck
Abstract Always-on networking and a growing interest in multimedia- and conversational-IP services offer network providers an opportunity to participate in the service layer, if they increase the functional intelligence in their networks. An important prerequisite for providing advanced services in IP access networks is the availability of a high-speed packet classification module in the network nodes, necessary for supporting any IP service imaginable. Often, access nodes are installed in remote offices, where they terminate a large number of subscriber lines. As such, technology adding processing power in this environment should be energy-efficient, whilst maintaining the flexibility to cope with changing service requirements. Network processor units (NPUs) are designed to overcome these operational restrictions, and in this context this paper investigates their suitability for wire-speed and robust packet classification in a firewalling application. State-of-the-art packet classification algorithms are examined, after which the performance and memory requirements of a binary decision diagram (BDD) approach and a sequential search approach are compared. Several space optimizations for implementing BDD classifiers on NPU hardware are discussed, and it is shown that the optimized BDD classifier is able to operate at gigabit wirespeed, independent of the ruleset size, which is a major advantage over a sequential search classifier. Copyright © 2007 John Wiley & Sons, Ltd. [source]


On estimation of the number of image principal colors and color reduction through self-organized neural networks

INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, Issue 3 2002
A. Atsalakis
A new technique suitable for reduction of the number of colors in a color image is presented in this article. It is based on the use of the image Principal Color Components (PCC), which consist of the image color components and additional image components extracted with the use of proper spatial features. The additional spatial features are used to enhance the quality of the final image. First, the principal colors of the image and the principal colors of each PCC are extracted. Three algorithms were developed and tested for this purpose. Using Kohonen self-organizing feature maps (SOFM) as classifiers, the principal color components of each PCC are obtained and a look-up table, containing the principal colors of the PCC, is constructed. The final colors are extracted from the look-up table entries through a SOFM by setting the number of output neurons equal to the number of the principal colors obtained for the original image. To speed up the entire algorithm and reduce memory requirements, a fractal scanning subsampling technique is employed. The method is independent of the color scheme; it is applicable to any type of color images and can be easily modified to accommodate any type of spatial features. Several experimental and comparative results exhibiting the performance of the proposed technique are presented. © 2002 Wiley Periodicals, Inc. Int J Imaging Syst Technol 12, 117–127, 2002; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ima.10019 [source]
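The final quantization step can be sketched with a plain 1-D Kohonen SOFM whose neuron weights become the reduced palette (raw RGB only; the article additionally uses principal color components, spatial features, a look-up table and fractal-scan subsampling, and all hyperparameters below are illustrative):

    import numpy as np

    def sofm_palette(pixels, n_colors=8, epochs=5, lr0=0.5):
        """Train a 1-D Kohonen SOFM on (N, 3) pixels in [0, 1]; the neuron
        weights are returned as the reduced color palette."""
        rng = np.random.default_rng(0)
        weights = rng.uniform(0.0, 1.0, size=(n_colors, 3))
        sigma0 = n_colors / 2.0
        steps, t = epochs * len(pixels), 0
        for _ in range(epochs):
            for p in pixels[rng.permutation(len(pixels))]:
                frac = 1.0 - t / steps                      # decaying learning rate
                winner = np.argmin(np.sum((weights - p) ** 2, axis=1))
                dist = np.abs(np.arange(n_colors) - winner)
                h = np.exp(-dist ** 2 / (2.0 * max(sigma0 * frac, 0.5) ** 2))
                weights += lr0 * frac * h[:, None] * (p - weights)
                t += 1
        return weights

    def quantize(pixels, palette):
        idx = np.argmin(((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(-1), axis=1)
        return palette[idx]

    img = np.random.default_rng(6).uniform(0.0, 1.0, size=(4096, 3))   # stand-in image
    reduced = quantize(img, sofm_palette(img))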


Development of 3-D equivalent-circuit modelling with decoupled L-ILU factorization in semiconductor-device simulation

INTERNATIONAL JOURNAL OF NUMERICAL MODELLING: ELECTRONIC NETWORKS, DEVICES AND FIELDS, Issue 3 2007
Szu-Ju Li
Abstract In this paper, we develop a three-dimensional (3-D) device simulator, which combines a simplified, decoupled Gummel-like method equivalent-circuit model (DM) with levelized incomplete LU (L-ILU) factorization. These complementary techniques are successfully combined to yield an efficient and robust method for semiconductor-device simulation. The memory requirements are reduced significantly compared to the conventionally used Newton-like method. Furthermore, the complex voltage-controlled current source (VCCS) is simplified as a nonlinear resistor. Hence, the programming and debugging for the nonlinear resistor model is much easier than that for the VCCS model. In addition, we create a connection table to arrange the scattered non-zero fill-ins in the sparse matrix and so increase the efficiency of the L-ILU factorization. The low memory requirements may pave the way for widespread application in 3-D semiconductor-device simulation. We use a body-tied silicon-on-insulator MOSFET structure to illustrate the capability and efficiency of the 3-D DM equivalent-circuit model with L-ILU factorization. Copyright © 2007 John Wiley & Sons, Ltd. [source]


An improved direct labeling method for the max,flow min,cut computation in large hypergraphs and applications

INTERNATIONAL TRANSACTIONS IN OPERATIONAL RESEARCH, Issue 1 2003
Joachim Pistorius
Algorithms described so far for solving the maximum flow problem on hypergraphs first require the transformation of these hypergraphs into directed graphs. The resulting maximum flow problem is then solved by standard algorithms. This paper describes a new method that solves the maximum flow problem directly on hypergraphs, leading to both reduced run time and lower memory requirements. We compare our approach with a state-of-the-art algorithm that uses a transformation of the hypergraph into a directed graph and an augmenting path algorithm to compute the maximum flow on this directed graph: the run-time complexity as well as the memory space complexity are reduced by a constant factor. Experimental results on large hypergraphs from VLSI applications show that the run time is reduced, on average, by a factor of approximately 2, while memory occupation is reduced, on average, by a factor of 10. This improvement is particularly interesting for the very large instances that must be solved in practical applications. [source]


On the impact of the solution representation for the Internet Protocol Network Design Problem with max-hop constraints

NETWORKS: AN INTERNATIONAL JOURNAL, Issue 2 2004
L. De Giovanni
Abstract The IP (Internet Protocol) Network Design Problem can be stated briefly as follows. Given a set of nodes and a set of traffic demands, we want to determine the minimum-cost capacity installation such that all the traffic is routed. Capacity is provided by means of links of a given capacity, and traffic must be loaded onto the network according to the OSPF-ECM (Open Shortest Path First–Equal Commodity Multiflow) protocol, with additional constraints on the maximum number of hops. The problem is strongly NP-hard, and the literature proposes local search-based heuristics that do not take max-hop constraints into account, or that assume a simplified OSPF routing. The core of a local search approach is the network loading algorithm used to evaluate the costs of neighbor solutions. It presents critical aspects concerning both computational efficiency and memory requirements. Starting from a tabu search prototype, we show how these aspects deeply impact the design of a local search procedure, even at the logical level. We present several properties of the related network loading problem that allow these critical issues to be overcome and lead to efficient solution evaluation. © 2004 Wiley Periodicals, Inc. NETWORKS, Vol. 44(2), 73–83, 2004 [source]


Advances in collision detection and non-linear finite mixed element modelling for improved soft tissue simulation in craniomaxillofacial surgical planning

THE INTERNATIONAL JOURNAL OF MEDICAL ROBOTICS AND COMPUTER ASSISTED SURGERY, Issue 1 2010
Shengzheng Wang
Abstract Background There is a huge demand for a method to assist surgeons by automatically predicting soft tissue deformation resulting from a bone-remodelling plan. Methods This paper introduces several novel elements into a system for the simulation of postoperative facial appearances with respect to prespecified bone-remodelling plans. First, a new algorithm for the efficient detection of collisions, using the signed distance field, is described. Next, the penalty method is applied to determine the contact load of bone on facial soft tissue. Finally, a non-linear finite mixed element model is developed to estimate the tissue deformation induced by the prescribed bone-remodelling plan. Results The proposed collision detection algorithm improves on conventional methods in both memory requirements and computational efficiency. In addition, the methodology is evaluated on both synthetic and real data, with simulation performance averaging <0.5 mm pointwise error over the facial surface in six mid-face distraction osteogenesis procedures. Conclusions The experimental results support the novel methodological advancements in collision detection and biomechanical modelling proposed in this work. Copyright © 2009 John Wiley & Sons, Ltd. [source]
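A minimal sketch of the first two ingredients, collision detection against a signed distance field and a penalty contact load (a sphere stands in for the remodelled bone surface; stiffness and geometry are illustrative, and in the full model the load would be applied along the surface normal):

    import numpy as np

    def sphere_sdf(points, centre, radius):
        """Signed distance to a sphere: negative inside, positive outside."""
        return np.linalg.norm(points - centre, axis=1) - radius

    def penalty_contact(surface_points, sdf, stiffness):
        """Points with negative signed distance are in collision; the penalty
        method makes the contact load proportional to the penetration depth."""
        d = sdf(surface_points)
        in_contact = d < 0.0
        load = stiffness * np.where(in_contact, -d, 0.0)
        return in_contact, load

    pts = np.random.default_rng(7).uniform(-2.0, 2.0, size=(1000, 3))   # soft-tissue nodes
    mask, load = penalty_contact(pts, lambda p: sphere_sdf(p, np.zeros(3), 1.0), 1e3)
    print(mask.sum(), load.max())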