Parallel Performance (parallel + performance)

Distribution by Scientific Domains

Engineering	66%

Selected Abstracts

Parallel processing of remotely sensed hyperspectral imagery: full-pixel versus mixed-pixel classification

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 13 2008
Antonio J. Plaza
Abstract The rapid development of space and computer technologies makes it possible to store huge amounts of remotely sensed image data, collected using airborne and satellite instruments. In particular, NASA is continuously gathering high-dimensional image data with Earth observing hyperspectral sensors such as the Jet Propulsion Laboratory's airborne visible/infrared imaging spectrometer (AVIRIS), which measures reflected radiation in hundreds of narrow spectral bands at different wavelength channels for the same area on the surface of the Earth. The development of fast techniques for transforming massive amounts of hyperspectral data into scientific understanding is critical for space-based Earth science and planetary exploration. Despite the growing interest in hyperspectral imaging research, only a few efforts have been devoted to the design of parallel implementations in the literature, and detailed comparisons of standardized parallel hyperspectral algorithms are currently unavailable. This paper compares several existing and new parallel processing techniques for pure and mixed-pixel classification in hyperspectral imagery. The distinction of pure versus mixed-pixel analysis is linked to the considered application domain, and results from the very rich spectral information available from hyperspectral instruments. In some cases, such information allows image analysts to overcome the constraints imposed by limited spatial resolution. In most cases, however, the spectral bands collected by hyperspectral instruments have high statistical correlation, and efficient parallel techniques are required to reduce the dimensionality of the data while retaining the spectral information that allows for the separation of the classes. In order to address this issue, this paper also develops a new parallel feature extraction algorithm that integrates the spatial and spectral information.
The proposed technique is evaluated (from the viewpoint of both classification accuracy and parallel performance) and compared with other parallel techniques for dimensionality reduction and classification in the context of three representative application case studies: urban characterization, land-cover classification in agriculture, and mapping of geological features, using AVIRIS data sets with detailed ground-truth. Parallel performance is assessed using Thunderhead, a massively parallel Beowulf cluster at NASA's Goddard Space Flight Center. The detailed cross-validation of parallel algorithms conducted in this work may specifically help image analysts in selecting parallel algorithms for specific applications. Copyright © 2008 John Wiley & Sons, Ltd.

Parallel solution of lifting rotors in hover and forward flight

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 1 2007
C. B. Allen
Abstract An implicit unsteady, multiblock, multigrid, upwind solver including mesh deformation capability, and a structured multiblock grid generator, are presented and applied to lifting rotors in both hover and forward flight. To allow the use of very fine meshes and, hence, better representation of the flow physics, a parallel version of the code has been developed. It is demonstrated that once the grid density is sufficient to capture enough turns of the tip vortices, hover exhibits oscillatory behaviour of the wake, even using a steady formulation. An unsteady simulation is then presented, and detailed analysis of the time-accurate wake history is performed and compared to theoretical predictions. Forward flight simulations are also presented and, again, grid density effects on the wake formation are investigated. Parallel performance of the code using up to 1024 CPUs is also presented. Copyright © 2006 John Wiley & Sons, Ltd.
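
The abstract reports parallel performance on up to 1024 CPUs without quoting figures here. As a minimal sketch of how such measurements are usually summarized, the Python function below computes speedup and parallel efficiency from wall-clock times. The function name and any timings passed to it are illustrative assumptions, not values from the paper.

```python
def speedup_and_efficiency(serial_time, parallel_time, n_procs):
    """Return (speedup, efficiency) for a run on n_procs processors.

    speedup    = serial_time / parallel_time
    efficiency = speedup / n_procs   (1.0 would be ideal linear scaling)
    """
    if serial_time <= 0 or parallel_time <= 0 or n_procs < 1:
        raise ValueError("times must be positive and n_procs at least 1")
    speedup = serial_time / parallel_time
    return speedup, speedup / n_procs
```

For example, a hypothetical run that takes 1024 s on one CPU and 2 s on 1024 CPUs would have a speedup of 512 and an efficiency of 0.5.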

Parallel operation of CartaBlanca on shared and distributed memory computers

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 1 2004
N. T. Padial-Collins
Abstract We describe the parallel performance of the pure Java CartaBlanca code on heat transfer and multiphase fluid flow problems. CartaBlanca is designed for parallel computations on partitioned unstructured meshes. It uses Java's thread facility to manage computations on each of the mesh partitions. Inter-partition communications are handled by two compact objects for node-by-node communication along partition boundaries and for global reduction calculations across the entire mesh. For distributed calculations, the JavaParty package from the University of Karlsruhe is demonstrated to work with CartaBlanca. Copyright © 2004 John Wiley & Sons, Ltd.
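
The pattern described above, one thread per mesh partition followed by a global reduction, can be sketched in a few lines. The example below is written in Python rather than Java and is not CartaBlanca code: `partition_work` is a hypothetical stand-in kernel, and the partitions are plain lists of numbers rather than unstructured mesh data.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_work(cells):
    """Hypothetical per-partition kernel: local sum of squares of cell values."""
    return sum(value * value for value in cells)


def global_sum_of_squares(partitions):
    """Run the kernel on each partition in its own thread, then reduce globally."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        # One task per partition; map() returns results in partition order.
        local_results = list(pool.map(partition_work, partitions))
    # Global reduction across all partitions.
    return sum(local_results)
```

In CPython the global interpreter lock limits the speedup of CPU-bound threads, so this sketch shows the structure of the computation rather than its performance.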

ParCYCLIC: finite element modelling of earthquake liquefaction response on parallel computers

INTERNATIONAL JOURNAL FOR NUMERICAL AND ANALYTICAL METHODS IN GEOMECHANICS, Issue 12 2004
Jun Peng
Abstract This paper presents the computational procedures and solution strategy employed in ParCYCLIC, a parallel non-linear finite element program developed based on an existing serial code CYCLIC for the analysis of cyclic seismically-induced liquefaction problems. In ParCYCLIC, finite elements are employed within an incremental plasticity, coupled solid-fluid formulation. A constitutive model developed for simulating liquefaction-induced deformations is a main component of this analysis framework. The elements of the computational strategy, designed for distributed-memory message-passing parallel computer systems, include: (a) an automatic domain decomposer to partition the finite element mesh; (b) nodal ordering strategies to minimize storage space for the matrix coefficients; (c) an efficient scheme for the allocation of sparse matrix coefficients among the processors; and (d) a parallel sparse direct solver. Application of ParCYCLIC to simulate 3-D geotechnical experimental models is demonstrated. The computational results show excellent parallel performance and scalability of ParCYCLIC on parallel computers with a large number of processors. Copyright © 2004 John Wiley & Sons, Ltd.
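
Item (b) above concerns node orderings that reduce matrix storage. The abstract does not say which ordering ParCYCLIC uses, so the sketch below swaps in the classic Cuthill-McKee breadth-first ordering purely as an illustration of how renumbering nodes can shrink the matrix bandwidth. The adjacency dictionary format and both function names are assumptions made for this example.

```python
from collections import deque


def cuthill_mckee(adjacency):
    """Order nodes by a breadth-first sweep that visits low-degree nodes first."""
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    visited, order = set(), []
    # Restart from the lowest-degree unvisited node to cover disconnected meshes.
    for start in sorted(adjacency, key=lambda node: (degree[node], node)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in sorted(adjacency[node], key=lambda m: (degree[m], m)):
                if nbr not in visited:
                    visited.add(nbr)
                    queue.append(nbr)
    return order


def bandwidth(adjacency, order):
    """Largest distance between the positions of two connected nodes."""
    position = {node: i for i, node in enumerate(order)}
    return max((abs(position[u] - position[v])
                for u, nbrs in adjacency.items() for v in nbrs), default=0)
```

On a chain of five nodes connected as 3-0-4-1-2, the natural numbering has bandwidth 4, while the ordering returned by `cuthill_mckee` follows the chain from one end and has bandwidth 1.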

Application of the additive Schwarz method to large scale Poisson problems

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING, Issue 3 2004
K. M. Singh
Abstract This paper presents an application of the additive Schwarz method to large scale Poisson problems on parallel computers. Domain decomposition in rectangular blocks with matching grids on a structured rectangular mesh has been used together with a stepwise approximation to approximate sloping sides and complicated geometric features. A seven-point stencil based on a central difference scheme has been used for the discretization of the Laplacian for both interior and boundary grid points, and this results in a symmetric linear algebraic system for any type of boundary conditions. The preconditioned conjugate gradient method has been used as an accelerator for the additive Schwarz method, and three different methods have been assessed for the solution of subdomain problems. Numerical experiments have been performed to determine the most suitable set of subdomain solvers and the optimal accuracy of subdomain solutions; to assess the effect of different decompositions of the problem domain; and to evaluate the parallel performance of the additive Schwarz preconditioner. Application to a practical problem involving complicated geometry is presented, which establishes the efficiency and robustness of the method. Copyright © 2004 John Wiley & Sons, Ltd.
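
To make the combination of additive Schwarz and conjugate gradients concrete, here is a minimal serial sketch under simplifying assumptions that differ from the paper: a 1D Poisson matrix tridiag(-1, 2, -1) stands in for the 3D seven-point stencil, subdomains are overlapping index ranges, and each subdomain problem is solved exactly with the Thomas algorithm. All function names are illustrative.

```python
def apply_poisson(u):
    """Matrix-free product with the 1D Poisson matrix tridiag(-1, 2, -1)."""
    n = len(u)
    return [2.0 * u[i] - (u[i - 1] if i > 0 else 0.0)
            - (u[i + 1] if i < n - 1 else 0.0) for i in range(n)]


def solve_local(rhs):
    """Thomas algorithm for tridiag(-1, 2, -1) x = rhs (exact subdomain solve)."""
    m = len(rhs)
    c, d = [0.0] * m, [0.0] * m
    c[0], d[0] = -0.5, rhs[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    x = [0.0] * m
    x[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def subdomain_ranges(n, n_sub, overlap):
    """Contiguous blocks of indices, each widened by `overlap` points per side."""
    size = n // n_sub
    ranges = []
    for s in range(n_sub):
        lo = s * size
        hi = n if s == n_sub - 1 else (s + 1) * size
        ranges.append((max(lo - overlap, 0), min(hi + overlap, n)))
    return ranges


def additive_schwarz(r, ranges):
    """Apply z = sum_i R_i^T A_i^{-1} R_i r; the local solves are independent."""
    z = [0.0] * len(r)
    for lo, hi in ranges:
        for k, value in enumerate(solve_local(r[lo:hi])):
            z[lo + k] += value
    return z


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcg(b, ranges, tol=1e-10, max_iter=500):
    """Conjugate gradients preconditioned by additive Schwarz."""
    x, r = [0.0] * len(b), list(b)
    z = additive_schwarz(r, ranges)
    p, rz = list(z), dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for iteration in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, iteration
        Ap = apply_poisson(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = additive_schwarz(r, ranges)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

A call such as `pcg([1.0] * 64, subdomain_ranges(64, 4, 2))` returns the solution and the iteration count. In a parallel setting each subdomain solve inside `additive_schwarz` would run on its own processor, which is what makes the preconditioner attractive.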

Parallel simulation of unsteady hovering rotor wakes

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 6 2006
C. B. Allen
Abstract Numerical simulation using low diffusion schemes, for example free-vortex or vorticity transport methods, and theoretical stability analyses have shown the wakes of rotors in hover to be unsteady. This has also been observed in experiments, although the instabilities are not always repeatable. Hovering rotor wake stability is considered here using a finite-volume compressible CFD code. An implicit unsteady, multiblock, multigrid, upwind solver, and structured multiblock grid generator are presented, and applied to lifting rotors in hover. To allow the use of very fine meshes and, hence, better representation of the flow physics, a parallel version of the code has been developed, and parallel performance using up to 1024 CPUs is presented. A four-bladed rotor is considered, and it is demonstrated that once the grid density is sufficient to capture enough turns of the tip vortices, hover exhibits oscillatory behaviour of the wake, even using a steady formulation. An unsteady simulation is then performed, and also shows an unsteady wake. Detailed analysis of the time-accurate wake history shows that three dominant unsteady modes are captured, for this four-bladed case, with frequencies of one, four, and eight times the rotational frequency. A comparison with theoretical stability analysis is also presented. Copyright © 2006 John Wiley & Sons, Ltd.

Parallel DSMC method using dynamic domain decomposition

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 1 2005
J.-S. Wu
Abstract A general parallel direct simulation Monte Carlo (DSMC) method using an unstructured mesh is introduced, which incorporates a multi-level graph-partitioning technique to dynamically decompose the computational domain. The current DSMC method is implemented on an unstructured mesh using a particle ray-tracing technique, which takes advantage of the cell connectivity information. In addition, various strategies applying the stop at rise (SAR) (IEEE Trans Comput 1988; 39:1073-1087) scheme are studied to determine how frequently the domain should be re-decomposed. A high-speed, bottom-driven cavity flow, including small, medium and large problems, based on the number of particles and cells, is simulated. Corresponding analysis of parallel performance is reported on an IBM-SP2 parallel machine with up to 64 processors. Analysis shows that the degree of imbalance among processors with dynamic load balancing is about ,,½ of that without dynamic load balancing. Detailed time analysis shows that the degree of imbalance levels off very rapidly at a relatively low value with increasing number of processors when applying dynamic load balancing, which makes the large problem size fairly scalable for more than 64 processors. In general, the optimal frequency of activating the SAR scheme decreases with problem size. Finally, the method is applied to compute two two-dimensional hypersonic flows, a three-dimensional hypersonic flow and a three-dimensional near-continuum twin-jet gas flow to demonstrate its superior computational capability and to compare with experimental data and previous simulation data wherever available. Copyright © 2005 John Wiley & Sons, Ltd.
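
The abstract does not spell out the stop at rise rule, so the following is a sketch based on how the scheme is commonly described: after each step n since the last repartition, compute W(n) = (accumulated idle time + repartition cost) / n, where idle time per step is the slowest processor's time minus the average, and repartition as soon as W(n) rises above W(n-1). The function name, argument names, and this form of W(n) are assumptions and may differ in detail from what the paper implements.

```python
def sar_repartition_step(max_times, avg_times, remap_cost):
    """Return the 1-based step at which W(n) first rises, or None if it never does.

    max_times[i]: time of the slowest processor at step i since the last remap
    avg_times[i]: average processor time at step i
    remap_cost:   assumed cost of one repartition, in the same time units
    """
    degradation = 0.0   # accumulated idle time caused by load imbalance
    previous_w = None
    for n, (t_max, t_avg) in enumerate(zip(max_times, avg_times), start=1):
        degradation += t_max - t_avg
        w = (degradation + remap_cost) / n
        if previous_w is not None and w > previous_w:
            return n    # the average cost per step has started to rise
        previous_w = w
    return None
```

With a perfectly balanced load the idle time stays at zero, W(n) keeps falling as the repartition cost is amortized, and the function returns None; a growing imbalance makes W(n) turn upward and triggers a repartition.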

A collocated, iterative fractional-step method for incompressible large eddy simulation

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 4 2008
Giridhar Jothiprasad
Abstract Fractional-step methods are commonly used for the time-accurate solution of incompressible Navier-Stokes (NS) equations. In this paper, a popular fractional-step method that uses pressure corrections in the projection step and its iterative variants are investigated using block-matrix analysis, and an improved algorithm with reduced computational cost is developed. Since the governing equations for large eddy simulation (LES) using linear eddy-viscosity-based sub-grid models are similar in form to the incompressible NS equations, the improved algorithm is implemented in a parallel LES solver. A collocated grid layout is preferred for ease of extension to curvilinear grids. The analyzed fractional-step methods are viewed as an iterative approximation to a temporally second-order discretization. At each iteration, a linear system that has an easier block-LU decomposition compared with the original system is inverted. In order to improve the numerical efficiency and parallel performance, modified ADI sub-iterations are used in the velocity step of each iteration. Block-matrix analysis is first used to determine the number of iterations required to reduce the iterative error to the discretization error of. Next, the computational cost is reduced through the use of a reduced stencil for the pressure Poisson equation (PPE). Energy-conserving, spatially fourth-order discretizations result in a 7-point stencil in each direction for the PPE. A smaller 5-point stencil is achieved by using a second-order spatial discretization for the pressure gradient operator correcting the volume fluxes. This is shown not to reduce the spatial accuracy of the scheme, and a fourth-order continuity equation is still satisfied to machine precision. The above results are verified in three flow problems including LES of a temporal mixing layer. Copyright © 2008 John Wiley & Sons, Ltd.