Efficient Implementations (efficient + implementations)

Selected Abstracts


Data partitioning-based parallel irregular reductions

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 2-3 2004
Eladio Gutiérrez
Abstract Different parallelization methods for irregular reductions on shared memory multiprocessors have been proposed in the literature in recent years. We have classified all these methods and analyzed them in terms of a set of properties: data locality, memory overhead, exploited parallelism, and workload balancing. In this paper we propose several techniques to increase the amount of exploited parallelism and to introduce load balancing into an important class of these methods. Regarding parallelism, the proposed solution is based on the partial expansion of the reduction array. Load balancing is discussed in terms of two techniques. The first technique is a generic one, as it deals with any kind of load imbalance present in the problem domain. The second technique handles a special case of load imbalance which occurs whenever a large number of write operations are concentrated on small regions of the reduction arrays. Efficient implementations of the proposed optimizing solutions for a particular method are presented, experimentally tested on static and dynamic kernel codes, and compared with other parallel reduction methods. Copyright © 2004 John Wiley & Sons, Ltd.
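
To make the terminology concrete, the following is a minimal Python sketch of an irregular reduction and of a data-partitioned ("owner computes") variant. It is an illustration under assumptions, not the method of the paper: the function names, the block layout, and the sequential loop standing in for worker threads are all hypothetical.

```python
def irregular_reduction_serial(n, idx, val):
    """Reference loop: a[idx[i]] += val[i], with write targets known only at run time."""
    a = [0.0] * n
    for i, v in zip(idx, val):
        a[i] += v
    return a


def irregular_reduction_partitioned(n, idx, val, num_blocks):
    """Owner-computes sketch: each block of the reduction array has a single writer.

    Returns the reduced array and the number of iterations assigned to each block.
    """
    block = -(-n // num_blocks)  # ceiling division: elements per block
    # Inspector phase: bucket iterations by the block that owns their target element.
    buckets = [[] for _ in range(num_blocks)]
    for it, i in enumerate(idx):
        buckets[i // block].append(it)
    a = [0.0] * n
    # Executor phase: buckets write to disjoint blocks, so each one could be
    # handed to a separate thread without locks (run sequentially here).
    for bucket in buckets:
        for it in bucket:
            a[idx[it]] += val[it]
    return a, [len(bucket) for bucket in buckets]


idx = [0, 5, 5, 2, 7, 5, 1]
val = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
serial = irregular_reduction_serial(8, idx, val)
partitioned, loads = irregular_reduction_partitioned(8, idx, val, num_blocks=4)
```

In this toy input three of the seven writes hit element 5, so the block that owns it receives the largest share of the work. This is the kind of imbalance, caused by writes concentrated on a small region of the reduction array, that the abstract refers to.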


Direct, partitioned and projected solution to finite element consolidation models

INTERNATIONAL JOURNAL FOR NUMERICAL AND ANALYTICAL METHODS IN GEOMECHANICS, Issue 14 2002
Giuseppe Gambolati
Abstract Direct, partitioned, and projected (conjugate gradient-like) solution approaches are compared on unsymmetric indefinite systems arising from the finite element integration of coupled consolidation equations. The direct method is used in its most recent and computationally efficient implementations from the Harwell Software Library. The partitioned approach designed for coupled problems is especially attractive, as it addresses two separate positive definite problems of a smaller size that can be solved by symmetric conjugate gradients. However, it may stagnate, and when it does converge it does not prove competitive with a global projection method such as Bi-CGSTAB, which may take full advantage of its flexibility in working on scaled and reordered equations and thus may greatly improve its computational performance in terms of both robustness and convergence rate. The superiority of Bi-CGSTAB over the other approaches is discussed and demonstrated with a few representative examples of two-dimensional (2-D) and three-dimensional (3-D) coupled consolidation problems. Copyright © 2002 John Wiley & Sons, Ltd.
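
For readers unfamiliar with Bi-CGSTAB, the following is a minimal, unpreconditioned sketch in plain Python, applied to a small unsymmetric toy system. It is an assumption-laden illustration only: the dense list-of-lists matrix, the function names, and the test system are hypothetical, the scaling and reordering mentioned in the abstract are omitted, and breakdown cases (zero denominators) are not handled.

```python
import math


def matvec(a, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def bicgstab(a, b, tol=1e-10, max_iter=100):
    """Unpreconditioned Bi-CGSTAB with a zero initial guess.

    Returns the approximate solution and the number of iterations used.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)          # residual b - A x for x = 0
    r_hat = list(r)      # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    target = tol * (math.sqrt(dot(b, b)) or 1.0)
    if math.sqrt(dot(r, r)) <= target:
        return x, 0
    for k in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [r_i + beta * (p_i - omega * v_i) for r_i, p_i, v_i in zip(r, p, v)]
        v = matvec(a, p)
        alpha = rho_new / dot(r_hat, v)
        s = [r_i - alpha * v_i for r_i, v_i in zip(r, v)]
        if math.sqrt(dot(s, s)) <= target:
            # Converged after the first half-step of this iteration.
            x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
            return x, k
        t = matvec(a, s)
        omega = dot(t, s) / dot(t, t)
        x = [x_i + alpha * p_i + omega * s_i for x_i, p_i, s_i in zip(x, p, s)]
        r = [s_i - omega * t_i for s_i, t_i in zip(s, t)]
        if math.sqrt(dot(r, r)) <= target:
            return x, k
        rho = rho_new
    raise RuntimeError("Bi-CGSTAB did not converge within max_iter iterations")


# Hypothetical unsymmetric test system; its exact solution is (1/13, 9/13).
A = [[4.0, 1.0], [-1.0, 3.0]]
b = [1.0, 2.0]
x, iterations = bicgstab(A, b)
residual = [b_i - ax_i for b_i, ax_i in zip(b, matvec(A, x))]
```

Unlike symmetric conjugate gradients, the iteration needs no symmetry or definiteness of the matrix, which is why a method of this type can be applied directly to the global coupled system.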


Minimal cycle basis of graph products for the force method of frame analysis

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING, Issue 8 2008
A. Kaveh
Abstract For an efficient force method of frame analysis, the formation of localized self-equilibrating systems is an important issue. Such systems can be constructed on a minimal cycle basis of the graph model of the structure. In this paper, algorithms are presented for the formation of minimal cycle bases of graph products corresponding to sparse cycle adjacency matrices, leading to the formation of highly sparse flexibility matrices. The algorithms employ concepts from three graph products, namely the Cartesian, strong Cartesian, and lexicographic products. Although formulations for the first two products already exist, efficient implementations are developed in this paper. The formulation for the generation of a minimal cycle basis is also extended to the lexicographic product. Copyright © 2007 John Wiley & Sons, Ltd.
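
As a small illustration of the objects involved, the Python sketch below builds the Cartesian product of two graphs and counts how many cycles any cycle basis of the product must contain (the cycle rank, edges minus nodes plus connected components). It does not construct a minimal cycle basis and is not the algorithm of the paper; the function names and the example graphs are hypothetical.

```python
from itertools import product


def cartesian_product(nodes_g, edges_g, nodes_h, edges_h):
    """Cartesian product G x H: (u, w) ~ (u', w') if one coordinate is equal
    and the other pair is an edge of its factor graph."""
    nodes = list(product(nodes_g, nodes_h))
    edges = set()
    for u1, u2 in edges_g:
        for w in nodes_h:
            edges.add(frozenset({(u1, w), (u2, w)}))
    for w1, w2 in edges_h:
        for u in nodes_g:
            edges.add(frozenset({(u, w1), (u, w2)}))
    return nodes, edges


def cycle_rank(nodes, edges):
    """Number of cycles in any cycle basis: |E| - |V| + (connected components)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    components = len(nodes)
    for edge in edges:
        a, b = tuple(edge)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1
    return len(edges) - len(nodes) + components


# Path on three nodes times itself gives a 3 x 3 grid (a simple planar frame model).
path_nodes = [0, 1, 2]
path_edges = [(0, 1), (1, 2)]
grid_nodes, grid_edges = cartesian_product(path_nodes, path_edges, path_nodes, path_edges)
grid_rank = cycle_rank(grid_nodes, grid_edges)
```

For the 3 x 3 grid the rank is 4, matching its four unit squares, which together form a cycle basis made of the shortest possible cycles.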


Performance of multi level-turbo coding with neural network-based channel estimation over WSSUS MIMO channels

INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS, Issue 3 2009
Ersin Gose
Abstract This paper presents the performance of transmit diversity-multi level turbo codes (TD-MLTC) over multiple-input multiple-output (MIMO) channels based on the wide-sense stationary uncorrelated scattering (WSSUS) model. The multi level-turbo code (MLTC) system contains more than one turbo encoder/decoder block in its structure. At the transmitter side, the MLTC uses the group partitioning technique, which partitions a signal set into several levels and encodes each level separately by a proper component of the encoder to improve error performance. The binary input sequence is passed through the MLTC encoder, mapped to 4-PSK, and then fed into the transmit diversity scheme for high data transmission over wireless fading channels. At the receiver side, distorted multi-path signals are received by multiple receiver antennae. WSSUS MIMO channel parameters are estimated by using an artificial neural network and an iterative combiner. The input sequence of the first level of the MLTC encoder is estimated at the first level of the MLTC decoder. Subsequently, the other input sequences are computed by using the estimated input bit streams of the previous levels. 4-PSK two-level turbo codes are simulated for 2Tx-1Rx and 2Tx-2Rx antenna configurations over WSSUS MIMO channels. Here, TD-MLTC and its efficient implementations are discussed and simulation results are given. Copyright © 2008 John Wiley & Sons, Ltd.
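
The idea of partitioning a signal set into levels can be sketched for 4-PSK as follows. The labeling used here is the standard set-partitioning one, in which the level-1 bit chooses a two-point subset and the level-2 bit chooses a point inside it; that this matches the mapping used in the paper is an assumption, and all names are hypothetical.

```python
import cmath
import math


def map_4psk(level1_bit, level2_bit):
    """Map one bit per level to a unit-energy 4-PSK point.

    level1_bit selects the subset {0, 2} or {1, 3} of symbol indices;
    level2_bit selects the point within that subset.
    """
    index = level1_bit + 2 * level2_bit
    return cmath.exp(1j * math.pi / 2 * index)


def min_distance(points):
    """Smallest Euclidean distance between any two distinct points."""
    return min(abs(p - q) for i, p in enumerate(points) for q in points[i + 1:])


full_set = [map_4psk(b1, b2) for b2 in (0, 1) for b1 in (0, 1)]
subset_0 = [map_4psk(0, b2) for b2 in (0, 1)]
subset_1 = [map_4psk(1, b2) for b2 in (0, 1)]

d_full = min_distance(full_set)      # distance seen by the level-1 bit
d_subset = min_distance(subset_0)    # distance seen by the level-2 bit
```

The minimum distance grows from sqrt(2) in the full constellation to 2 inside each subset, so once the first level has been decided the second level faces an easier decision. This is consistent with the staged decoding described in the abstract, where later levels use the estimates of earlier ones.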