Parallel Algorithms (parallel + algorithms)

Distribution by Scientific Domains


Selected Abstracts


Parallel Algorithms for Dynamic Shortest Path Problems

INTERNATIONAL TRANSACTIONS IN OPERATIONAL RESEARCH, Issue 3 2002
Ismail Chabini
The development of intelligent transportation systems (ITS) and the resulting need for the solution of a variety of dynamic traffic network models and management problems require faster-than-real-time computation of shortest path problems in dynamic networks. Recently, a sequential algorithm was developed to compute shortest paths in discrete time dynamic networks from all nodes and all departure times to one destination node. The algorithm is known as algorithm DOT and has an optimal worst-case running-time complexity. This implies that no algorithm with a better worst-case computational complexity can be discovered. Consequently, in order to derive algorithms to solve all-to-one shortest path problems in dynamic networks, one would need to explore avenues other than the design of sequential solution algorithms only. The use of commercially-available high-performance computing platforms to develop parallel implementations of sequential algorithms is an example of such avenue. This paper reports on the design, implementation, and computational testing of parallel dynamic shortest path algorithms. We develop two shared-memory and two message-passing dynamic shortest path algorithm implementations, which are derived from algorithm DOT using the following parallelization strategies: decomposition by destination and decomposition by transportation network topology. The algorithms are coded using two types of parallel computing environments: a message-passing environment based on the parallel virtual machine (PVM) library and a multi-threading environment based on the SUN Microsystems Multi-Threads (MT) library. We also develop a time-based parallel version of algorithm DOT for the case of minimum time paths in FIFO networks, and a theoretical parallelization of algorithm DOT on an ,ideal' theoretical parallel machine. Performances of the implementations are analyzed and evaluated using large transportation networks, and two types of parallel computing platforms: a distributed network of Unix workstations and a SUN shared-memory machine containing eight processors. Satisfactory speed-ups in the running time of sequential algorithms are achieved, in particular for shared-memory machines. Numerical results indicate that shared-memory computers constitute the most appropriate type of parallel computing platforms for the computation of dynamic shortest paths for real-time ITS applications. [source]


Fast BVH Construction on GPUs

COMPUTER GRAPHICS FORUM, Issue 2 2009
C. Lauterbach
We present two novel parallel algorithms for rapidly constructing bounding volume hierarchies on manycore GPUs. The first uses a linear ordering derived from spatial Morton codes to build hierarchies extremely quickly and with high parallel scalability. The second is a top-down approach that uses the surface area heuristic (SAH) to build hierarchies optimized for fast ray tracing. Both algorithms are combined into a hybrid algorithm that removes existing bottlenecks in the algorithm for GPU construction performance and scalability leading to significantly decreased build time. The resulting hierarchies are close in to optimized SAH hierarchies, but the construction process is substantially faster, leading to a significant net benefit when both construction and traversal cost are accounted for. Our preliminary results show that current GPU architectures can compete with CPU implementations of hierarchy construction running on multicore systems. In practice, we can construct hierarchies of models with up to several million triangles and use them for fast ray tracing or other applications. [source]


Exploring the performance of massively multithreaded architectures

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 5 2010
Shahid Bokhari
Abstract We present a new scheme for evaluating the performance of multithreaded computers and demonstrate its application to the Cray MTA-2 and XMT supercomputers. Our scheme is based on the concept of clock cycles per element, , plotted against both problem size and the number of processors. This scheme clearly shows if an implementation has achieved its asymptotic efficiency and is more general than (but includes) the commonly used speedup metric. It permits the discovery of any imperfections in both the software as well as the hardware, and is expected to permit a unified comparison of many different parallel architectures. Measurements on a number of well-known parallel algorithms, ranging from matrix multiply to quicksort, are presented for the MTA-2 and XMT and highlight some interesting differences between these machines. The performance of sequence alignment using dynamic programming is evaluated on the MTA-2, XMT, IBM x3755 and SGI Altix 350 and provides a useful comparison of the capabilities of the Cray machines with more conventional shared memory architectures. Copyright © 2009 John Wiley & Sons, Ltd. [source]


Parallel processing of remotely sensed hyperspectral imagery: full-pixel versus mixed-pixel classification

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 13 2008
Antonio J. Plaza
Abstract The rapid development of space and computer technologies allows for the possibility to store huge amounts of remotely sensed image data, collected using airborne and satellite instruments. In particular, NASA is continuously gathering high-dimensional image data with Earth observing hyperspectral sensors such as the Jet Propulsion Laboratory's airborne visible,infrared imaging spectrometer (AVIRIS), which measures reflected radiation in hundreds of narrow spectral bands at different wavelength channels for the same area on the surface of the Earth. The development of fast techniques for transforming massive amounts of hyperspectral data into scientific understanding is critical for space-based Earth science and planetary exploration. Despite the growing interest in hyperspectral imaging research, only a few efforts have been devoted to the design of parallel implementations in the literature, and detailed comparisons of standardized parallel hyperspectral algorithms are currently unavailable. This paper compares several existing and new parallel processing techniques for pure and mixed-pixel classification in hyperspectral imagery. The distinction of pure versus mixed-pixel analysis is linked to the considered application domain, and results from the very rich spectral information available from hyperspectral instruments. In some cases, such information allows image analysts to overcome the constraints imposed by limited spatial resolution. In most cases, however, the spectral bands collected by hyperspectral instruments have high statistical correlation, and efficient parallel techniques are required to reduce the dimensionality of the data while retaining the spectral information that allows for the separation of the classes. In order to address this issue, this paper also develops a new parallel feature extraction algorithm that integrates the spatial and spectral information. The proposed technique is evaluated (from the viewpoint of both classification accuracy and parallel performance) and compared with other parallel techniques for dimensionality reduction and classification in the context of three representative application case studies: urban characterization, land-cover classification in agriculture, and mapping of geological features, using AVIRIS data sets with detailed ground-truth. Parallel performance is assessed using Thunderhead, a massively parallel Beowulf cluster at NASA's Goddard Space Flight Center. The detailed cross-validation of parallel algorithms conducted in this work may specifically help image analysts in selection of parallel algorithms for specific applications. Copyright © 2008 John Wiley & Sons, Ltd. [source]


Sequence alignment on the Cray MTA-2,

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 9 2004
Shahid H. Bokhari
Abstract Several variants of standard algorithms for DNA sequence alignment have been implemented on the Cray Multithreaded Architecture-2 (MTA-2). We describe the architecture of the MTA-2 and discuss how its hardware and software enable efficient implementation of parallel algorithms with little or no regard for issues of partitioning, mapping or scheduling. We describe how we ported variants of the naive algorithm for exact alignment and the dynamic programming algorithm for approximate alignment to the MTA-2 and provide detailed performance measurements. It is shown that, for the dynamic programming algorithm, the use of the MTA's ,Full/Empty' synchronization bits leads to almost perfect speedup for large problems on one to eight processors. These results illustrate the versatility of the MTA's architecture and demonstrate its potential for providing a high-productivity platform for parallel processing. Copyright © 2004 John Wiley & Sons, Ltd. [source]


A parallel Broyden approach to the Toeplitz inverse eigenproblem

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 6 2004
Jesús Peinado
Abstract In this work we show a portable sequential and a portable parallel algorithm for solving the inverse eigenproblem for real symmetric Toeplitz matrices. Both algorithms are based on Broyden's method for solving nonlinear systems. We reduced the computational cost for some problem sizes, and furthermore we managed to reduce spatial cost considerably, compared in both cases with parallel algorithms proposed by other authors and by us, although sometimes quasi-Newton methods (as Broyden) do not reach convergence in all the test cases. We have implemented the parallel algorithm using the parallel numerical linear algebra library SCALAPACK based on the MPI environment. Experimental results have been obtained using two different architectures: a shared memory multiprocessor, the SGI PowerChallenge, and a cluster of Pentium II PCs connected through a myrinet network. The algorithms obtained are scalable in all the cases. Copyright © 2004 John Wiley & Sons, Ltd. [source]


Third-order methods for first-order hyperbolic partial differential equations

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING, Issue 1 2004
T. A. Cheema
Abstract In this paper numerical methods for solving first-order hyperbolic partial differential equations are developed. These methods are developed by approximating the first-order spatial derivative by third-order finite-difference approximations and a matrix exponential function by a third-order rational approximation having distinct real poles. Then parallel algorithms are developed and tested on a sequential computer for an advection equation with constant coefficient and a non-linear problem. Copyright © 2003 John Wiley & Sons, Ltd. [source]


Distance-two interpolation for parallel algebraic multigrid

NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS, Issue 2-3 2008
Hans De Sterck
Abstract Algebraic multigrid (AMG) is one of the most efficient and scalable parallel algorithms for solving sparse linear systems on unstructured grids. However, for large 3D problems, the coarse grids that are normally used in AMG often lead to growing complexity in terms of memory use and execution time per AMG V-cycle. Sparser coarse grids, such as those obtained by the parallel modified independent set (PMIS) coarsening algorithm, remedy this complexity growth but lead to nonscalable AMG convergence factors when traditional distance-one interpolation methods are used. In this paper, we study the scalability of AMG methods that combine PMIS coarse grids with long-distance interpolation methods. AMG performance and scalability are compared for previously introduced interpolation methods as well as new variants of them for a variety of relevant test problems on parallel computers. It is shown that the increased interpolation accuracy largely restores the scalability of AMG convergence factors for PMIS-coarsened grids, and in combination with complexity reducing methods, such as interpolation truncation, one obtains a class of parallel AMG methods that enjoy excellent scalability properties on large parallel computers. Copyright © 2007 John Wiley & Sons, Ltd. [source]


Modifying CLJP to select grid hierarchies with lower operator complexities and better performance

NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS, Issue 2-3 2006
David M. Alber
Abstract Algebraic multigrid (AMG) is an efficient algorithm for solving certain types of large, sparse linear systems. For solving very large problems with AMG it becomes necessary to use parallel algorithms. Coarse grid selection algorithms such as CLJP were created to parallelize the setup phase of AMG. For some problems, such as those discretized on structured meshes, CLJP tends to select coarse grids with more nodes than alternative coarsening algorithms. In this paper, the cause for the selection of too many coarse nodes by CLJP is examined, and a new technique which lowers the operator complexities generated by CLJP is introduced. To validate the new method, the modified CLJP is compared to other coarsening algorithms for large-scale problems. Copyright © 2006 John Wiley & Sons, Ltd. [source]