Processors

Kinds of Processors

  • food processors
  • signal processors


Selected Abstracts


    The U. S. Navy's Shipboard Solid Waste Program: Managing a Highly Accelerated Fleet-wide Ship System Integration Program

    NAVAL ENGINEERS JOURNAL, Issue 3 2006
    Ye-ling Wang
    The FY-94 and FY-97 National Defense Authorization Acts amended the Act to Prevent Pollution from Ships (APPS) to require fleet-wide installation of Plastics Waste Processors by December 1998, and installation of Pulpers and Shredders by December 2000. These requirements translated into an acquisition program that had to rapidly develop, procure, and deploy processing equipment on about 200 surface ships (over 24 ship classes) within three years. To date, this program has successfully completed the ship integration of the Plastic Waste Processor and is now implementing the Pulpers and Shredders. A previous paper on this subject broadly described the development of the acquisition strategy, the development of the equipment, and the fleet integration plan. This paper focuses on the ownership issues of the program and how they affect program decisions. It reviews how the Navy selected its current compliance strategy, including consideration of its effects on the marine environment through the Environmental Impact Statement (EIS) studies. It details the various processes NAVSEA implemented to ensure the delivery of a quality, affordable "turn-key" system to the fleet. These processes included a comprehensive installation design and review; rapid incorporation of lessons learned; timely deployment of Integrated Logistics Support to meet fleet introduction, including the establishment of interim spares; and a comprehensive In-Service Engineering Agent assist and inspection program. [source]


    Producers, Processors and Unions: The Meat Producers Board and Labour Relations in the New Zealand Meat Industry, 1952–1971

    AUSTRALIAN ECONOMIC HISTORY REVIEW, Issue 2 2001
    Bruce Curtis
    In New Zealand, the historical trend towards the rational-capitalistic transformation of agriculture was forestalled in part by producer boards, institutions that were intended to operate in the collective interests of farmers. Recently, there has been renewed interest both in the economic effects of the boards and in the role of farmers themselves within New Zealand's unique arbitral system of industrial relations. This paper bridges these areas of research by examining the influence of the Meat Producers Board on management–labour relations within the export meat industry. Whereas the Board is generally regarded as having empowered family-labour farmers, we argue that its interventions also empowered meatworkers and simultaneously weakened meat-processing companies as employers. The power resources indirectly supplied to meatworkers by the Board were an important external source of union power in the industry. By examining these resources, we identify the neglected effects of a key institution that shaped New Zealand's path of development by preventing the subsumption of 'independent' farming. [source]


    Noise exposures aboard catcher/processor fishing vessels

    AMERICAN JOURNAL OF INDUSTRIAL MEDICINE, Issue 8 2006
    Richard L. Neitzel MS
    Abstract Background Commercial fishing workers have extended work shifts and potential for 24 hr exposures to high noise. However, exposures in this industry have not been adequately characterized. Methods Noise exposures aboard two catcher/processors (C/P) were assessed using dosimetry, sound-level mapping, and self-reported activities and hearing protection device (HPD) use. These data were combined to estimate work shift, non-work, and 24 hr overall exposure levels using several metrics. The length of time during which HPDs were worn was also used to calculate the effective protection received by crew members. Results Nearly all workers had work shift and 24 hr noise levels that exceeded the relevant limits. After HPD use was accounted for, half of the 24 hr exposures remained above relevant limits. Non-work-shift noise contributed little to 24 hr exposure levels. HPDs reduced the average exposure by about 10 dBA, but not all workers wore them consistently. Conclusions The primary risk of hearing loss aboard the monitored vessels comes from work shift noise. Smaller vessels or vessels with different layouts may present more risk of hearing damage from non-work periods. Additional efforts are needed to increase use of HPDs or implement noise controls. Am. J. Ind. Med. 2006. © 2006 Wiley-Liss, Inc. [source]
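
    The 24 hr metrics above combine segment levels by energy averaging. A minimal C sketch of that calculation, assuming the standard equal-energy (3 dB exchange rate) formula; the study's exact metrics and the example numbers below are illustrative, not taken from the paper:

        #include <math.h>
        #include <stdio.h>

        /* Energy-average segment noise levels (dBA) into a 24 hr equivalent
           level, assuming a 3 dB exchange rate. Illustrative sketch only. */
        double leq_24h(const double *level_dba, const double *hours, int nseg) {
            double energy = 0.0;
            for (int i = 0; i < nseg; i++)
                energy += hours[i] * pow(10.0, level_dba[i] / 10.0);
            return 10.0 * log10(energy / 24.0);
        }

        int main(void) {
            /* hypothetical crew member: 12 h work at 95 dBA with ~10 dB of
               effective HPD attenuation (85 dBA), then 12 h off at 70 dBA */
            double level[] = { 85.0, 70.0 }, hours[] = { 12.0, 12.0 };
            printf("24 hr Leq = %.1f dBA\n", leq_24h(level, hours, 2)); /* ~82.2 */
            return 0;
        }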


    Audio processors as a learning tool for basic acoustics

    COMPUTER APPLICATIONS IN ENGINEERING EDUCATION, Issue 4 2009
    Arcadi Pejuan
    Abstract Audio processors like Audacity provide a "hear-and-see" learning tool for basic acoustics, combining sound and image. Activities designed as laboratory experiments with a PC (even at home!) have already been implemented successfully, for example on the dependence of timbre on the acoustic spectrum, of pitch on frequency, and of loudness on amplitude. © 2009 Wiley Periodicals, Inc. Comput Appl Eng Educ 17: 379–388, 2009; Published online in Wiley InterScience (www.interscience.wiley.com); DOI 10.1002/cae.20207 [source]
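
    A quick way to reproduce the pitch-frequency and loudness-amplitude activities is to synthesize a tone with known parameters and open it in the audio processor. Below is a minimal C sketch of our own (not from the paper) that writes a 440 Hz sine as a mono 16-bit WAV file, assuming a little-endian host; doubling freq raises the pitch an octave, halving amp lowers loudness by about 6 dB:

        #include <math.h>
        #include <stdint.h>
        #include <stdio.h>

        int main(void) {
            const double PI = 3.14159265358979323846;
            const uint32_t rate = 44100, nsamp = 44100;   /* 1 second */
            const double freq = 440.0, amp = 0.5;
            uint32_t data_bytes = nsamp * 2, riff_size = 36 + data_bytes;
            uint32_t fmt_len = 16, byte_rate = rate * 2;
            uint16_t pcm = 1, mono = 1, align = 2, bits = 16;

            FILE *f = fopen("tone.wav", "wb");
            if (!f) return 1;
            fwrite("RIFF", 1, 4, f); fwrite(&riff_size, 4, 1, f);
            fwrite("WAVEfmt ", 1, 8, f); fwrite(&fmt_len, 4, 1, f);
            fwrite(&pcm, 2, 1, f); fwrite(&mono, 2, 1, f);
            fwrite(&rate, 4, 1, f); fwrite(&byte_rate, 4, 1, f);
            fwrite(&align, 2, 1, f); fwrite(&bits, 2, 1, f);
            fwrite("data", 1, 4, f); fwrite(&data_bytes, 4, 1, f);
            for (uint32_t i = 0; i < nsamp; i++) {
                int16_t s = (int16_t)(amp * 32767.0 * sin(2*PI*freq*i/rate));
                fwrite(&s, 2, 1, f);
            }
            fclose(f);
            return 0;
        }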


    SIMDE: An educational simulator of ILP architectures with dynamic and static scheduling

    COMPUTER APPLICATIONS IN ENGINEERING EDUCATION, Issue 3 2007
    I. Castilla
    Abstract This article presents SIMDE, a cycle-by-cycle simulator to support the teaching of Instruction-Level Parallelism (ILP) architectures. The simulator covers dynamic and static instruction scheduling using a shared structure for both approaches. Dynamic scheduling is illustrated by means of a simple superscalar processor based on Tomasulo's algorithm. A basic Very Long Instruction Word (VLIW) processor has been designed for static scheduling. The simulator is intended as an aid for teaching theoretical content in Computer Architecture and Organization courses. The students are provided with an easy-to-use common environment to perform different simulations and comparisons between superscalar and VLIW processors. Furthermore, the simulator has been tested by students in a Computer Architecture course in order to assess its real usefulness. © 2007 Wiley Periodicals, Inc. Comput Appl Eng Educ 14: 226–239, 2007; Published online in Wiley InterScience (www.interscience.wiley.com); DOI 10.1002/cae.20154 [source]
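
    For readers unfamiliar with Tomasulo's algorithm, the state such a simulator tracks cycle by cycle is essentially a table of reservation stations. A minimal C sketch with field names of our own choosing (the abstract does not describe SIMDE's internal data structures):

        /* One Tomasulo reservation-station entry (hypothetical layout). */
        typedef struct {
            int    busy;      /* entry currently in use?                     */
            int    op;        /* opcode to execute                           */
            double vj, vk;    /* operand values, once available              */
            int    qj, qk;    /* tags of stations producing them (0 = ready) */
            int    dest;      /* destination tag for the result broadcast    */
        } RS;

        /* An issued instruction may begin execution only after both operand
           tags have been cleared by earlier writebacks on the common bus. */
        int can_execute(const RS *r) {
            return r->busy && r->qj == 0 && r->qk == 0;
        }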


    Fast Inverse Reflector Design (FIRD)

    COMPUTER GRAPHICS FORUM, Issue 8 2009
    A. Mas
    I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism; I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling – Physically based modeling; I.3.1 [Hardware Architecture]: Graphics processors

    Abstract This paper presents a new inverse reflector design method using a GPU-based computation of the outgoing light distribution from reflectors. We propose a fast method to obtain the outgoing light distribution of a parametrized reflector, and then compare it with the desired illumination. The new method works completely on the GPU. We trace millions of rays using a hierarchical height-field representation of the reflector. Multiple reflections are taken into account. The parameters that define the reflector shape are optimized in an iterative procedure so that the resulting light distribution is as close as possible to the desired, user-provided one. We show that our method can calculate reflector lighting at least one order of magnitude faster than previous methods, even with millions of rays, complex geometries and light sources. [source]


    SIMD Optimization of Linear Expressions for Programmable Graphics Hardware

    COMPUTER GRAPHICS FORUM, Issue 4 2004
    Chandrajit Bajaj
    Abstract The increased programmability of graphics hardware allows efficient graphical processing unit (GPU) implementations of a wide range of general computations on commodity PCs. An important factor in such implementations is how to fully exploit the SIMD computing capacities offered by modern graphics processors. Linear expressions of the form Ax + b, where A is a matrix and x and b are vectors, constitute one of the most basic operations in many scientific computations. In this paper, we propose a SIMD code optimization technique that enables efficient shader codes to be generated for evaluating linear expressions. It is shown that performance can be improved considerably by efficiently packing arithmetic operations into four-wide SIMD instructions through reordering of the operations in linear expressions. We demonstrate that the presented technique can be used effectively for programming both vertex and pixel shaders for a variety of mathematical applications, including integrating differential equations and solving a sparse linear system of equations using iterative methods. [source]
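
    The packing idea can be made concrete on a small case: evaluating Ax + b for a 4x4 matrix by accumulating one column of A per four-wide multiply-add. A sketch using SSE intrinsics on the CPU; the paper targets GPU shader code, so this mirrors only the arithmetic reordering, not the shader implementation:

        #include <stdio.h>
        #include <xmmintrin.h>

        /* y = A*x + b with A stored column-major: each iteration adds
           x[j] times column j using one 4-wide multiply and one add. */
        void axpb4(const float A[4][4], const float x[4],
                   const float b[4], float y[4]) {
            __m128 acc = _mm_loadu_ps(b);
            for (int j = 0; j < 4; j++) {
                __m128 col = _mm_loadu_ps(A[j]);     /* column j of A  */
                __m128 xj  = _mm_set1_ps(x[j]);      /* broadcast x[j] */
                acc = _mm_add_ps(acc, _mm_mul_ps(col, xj));
            }
            _mm_storeu_ps(y, acc);
        }

        int main(void) {
            float A[4][4] = {{1,0,0,0},{0,1,0,0},{0,0,1,0},{0,0,0,1}};
            float x[4] = {1,2,3,4}, b[4] = {10,10,10,10}, y[4];
            axpb4(A, x, b, y);
            printf("%g %g %g %g\n", y[0], y[1], y[2], y[3]); /* 11 12 13 14 */
            return 0;
        }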


    Integration of General Sparse Matrix and Parallel Computing Technologies for Large-Scale Structural Analysis

    COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING, Issue 6 2002
    Shang-Hsien Hsieh
    Both general sparse matrix and parallel computing technologies are integrated in this study for the finite element solution of large-scale structural problems in a PC cluster environment. The general sparse matrix technique is first employed to reduce execution time and storage requirements for solving the simultaneous equilibrium equations in finite element analysis. To further reduce the time required for large-scale structural analyses, two parallel processing approaches for sharing computational workloads among collaborating processors are then investigated. One approach adopts a publicly available parallel equation solver, called SPOOLES, to directly solve the sparse finite element equations, while the other employs a parallel substructure method for the finite element solution. This work focuses more on integrating the general sparse matrix technique with the parallel substructure method for large-scale finite element solutions. Additionally, numerical studies have been conducted on several large-scale structural analyses using a PC cluster to investigate the effectiveness of the general sparse matrix and parallel computing technologies in reducing time and storage requirements in large-scale finite element structural analyses. [source]
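
    The saving from the general sparse technique comes from storing and touching only nonzero entries. As a generic point of reference (not the SPOOLES code), the compressed sparse row (CSR) matrix-vector product at the heart of iterative equilibrium-equation solvers looks like this in C:

        /* y = M*x for an n-by-n matrix in CSR form: rowptr[i]..rowptr[i+1]
           index the nonzeros of row i, col[] their columns, val[] values. */
        void csr_matvec(int n, const int *rowptr, const int *col,
                        const double *val, const double *x, double *y) {
            for (int i = 0; i < n; i++) {
                double s = 0.0;
                for (int k = rowptr[i]; k < rowptr[i+1]; k++)
                    s += val[k] * x[col[k]];
                y[i] = s;
            }
        }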


    An MPI Parallel Implementation of Newmark's Method

    COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING, Issue 3 2000
    Ali Namazifard
    The standard message-passing interface (MPI) is used to parallelize Newmark's method. The linear matrix equation encountered at each time step is solved using a preconditioned conjugate gradient algorithm. Data are distributed over the processors of a given parallel computer on a degree-of-freedom basis; this produces effective load balance between the processors and leads to a highly parallelized code. The portability of the implementation of this scheme is tested by solving some simple problems on two different machines: an SGI Origin2000 and an IBM SP2. The measured times demonstrate the efficiency of the approach and highlight the maintenance advantages that arise from using a standard parallel library such as MPI. [source]
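
    For context, the matrix equation solved at each step follows from the textbook Newmark updates (parameters \beta and \gamma); the paper's formulation should match this up to notation:

        u_{n+1} = u_n + \Delta t\, v_n
                  + \Delta t^2 \left[ \left(\tfrac{1}{2} - \beta\right) a_n + \beta\, a_{n+1} \right]
        v_{n+1} = v_n + \Delta t \left[ (1 - \gamma)\, a_n + \gamma\, a_{n+1} \right]

    Substituting these into the equation of motion M a_{n+1} + C v_{n+1} + K u_{n+1} = F_{n+1} leaves one linear solve per time step,

        \left( M + \gamma \Delta t\, C + \beta \Delta t^2 K \right) a_{n+1}
            = F_{n+1} - C \tilde{v}_{n+1} - K \tilde{u}_{n+1},

    where \tilde{u}_{n+1} and \tilde{v}_{n+1} are the predictor terms above with the a_{n+1} contribution dropped. This is the system handed to the preconditioned conjugate gradient solver, with its rows distributed over the processors degree-of-freedom-wise.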


    Communicating process architecture for multicores

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 8 2010
    D. May
    Abstract Communicating process architecture can be used to build efficient multicore chips scaling to hundreds of processors. Concurrent processing, communications and input/output are supported directly by the instruction set of the cores and by the protocol used in the on-chip interconnect. Concurrent programs are compiled directly to the chip, exploiting novel compiler optimizations. The architecture supports a variety of programming techniques, ranging from statically configured process networks to dynamic reconfiguration and mobile processes. Copyright © 2007 D. May. [source]


    The Scalasca performance toolset architecture

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 6 2010
    Markus Geimer
    Abstract Scalasca is a performance toolset that has been specifically designed to analyze parallel application execution behavior on large-scale systems with many thousands of processors. It offers an incremental performance-analysis procedure that integrates runtime summaries with in-depth studies of concurrent behavior via event tracing, adopting a strategy of successively refined measurement configurations. Distinctive features are its ability to identify wait states in applications with very large numbers of processes and to combine these with efficiently summarized local measurements. In this article, we review the current toolset architecture, emphasizing its scalable design and the role of the different components in transforming raw measurement data into knowledge of application execution behavior. The scalability and effectiveness of Scalasca are then surveyed from experience measuring and analyzing real-world applications on a range of computer systems. Copyright © 2010 John Wiley & Sons, Ltd. [source]


    Complex version of high performance computing LINPACK benchmark (HPL)

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 5 2010
    R. F. Barrett
    Abstract This paper describes our effort to enhance the performance of the AORSA fusion energy simulation program through the use of the High-Performance LINPACK (HPL) benchmark, commonly used in ranking the Top500 supercomputers. The algorithm used by HPL, enhanced by a set of tuning options, is more effective than that found in the ScaLAPACK library. Retrofitting these algorithms, such as look-ahead processing of pivot elements, into ScaLAPACK would be a major undertaking. Moreover, HPL is configured as a benchmark, and only for real-valued coefficients. We therefore developed software to convert HPL for use within an application program that generates complex-coefficient linear systems. Although HPL is not normally perceived as part of an application, our results show that the modified HPL software brings a significant increase in the performance of the solver when simulating the highest-resolution experiments configured thus far, achieving 87.5 TFLOPS on over 20 000 processors of the Cray XT4. Copyright © 2009 John Wiley & Sons, Ltd. [source]


    Exploring the performance of massively multithreaded architectures

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 5 2010
    Shahid Bokhari
    Abstract We present a new scheme for evaluating the performance of multithreaded computers and demonstrate its application to the Cray MTA-2 and XMT supercomputers. Our scheme is based on the concept of clock cycles per element, plotted against both problem size and the number of processors. This scheme clearly shows whether an implementation has achieved its asymptotic efficiency and is more general than (but includes) the commonly used speedup metric. It permits the discovery of any imperfections in both the software and the hardware, and is expected to permit a unified comparison of many different parallel architectures. Measurements on a number of well-known parallel algorithms, ranging from matrix multiply to quicksort, are presented for the MTA-2 and XMT and highlight some interesting differences between these machines. The performance of sequence alignment using dynamic programming is evaluated on the MTA-2, XMT, IBM x3755 and SGI Altix 350 and provides a useful comparison of the capabilities of the Cray machines with more conventional shared memory architectures. Copyright © 2009 John Wiley & Sons, Ltd. [source]
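
    A hedged sketch of how such a metric might be computed from a timing run, assuming it counts total cycles across all processors; the paper's precise normalization may differ:

        /* clock cycles per element: wall-clock seconds times clock rate,
           summed over P processors, divided by problem size n. */
        double cycles_per_element(double seconds, double clock_hz,
                                  int P, long n) {
            return seconds * clock_hz * (double)P / (double)n;
        }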


    Implementation, performance, and science results from a 30.7 TFLOPS IBM BladeCenter cluster

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 2 2010
    Craig A. Stewart
    Abstract This paper describes Indiana University's implementation, performance testing, and use of a large high performance computing system. IU's Big Red, a 20.48 TFLOPS IBM e1350 BladeCenter cluster, appeared in the 27th Top500 list as the 23rd fastest supercomputer in the world in June 2006. In spring 2007, this computer was upgraded to 30.72 TFLOPS. The e1350 BladeCenter architecture, including two internal networks accessible to users and user applications and two networks used exclusively for system management, has enabled the system to provide good scalability on many important applications while remaining easy to manage. Implementing a system based on the JS21 Blade and PowerPC 970MP processor within the US TeraGrid presented certain challenges, given that Intel-compatible processors dominate the TeraGrid. However, the particular characteristics of the PowerPC have enabled it to be highly popular among certain application communities, particularly users of molecular dynamics and weather forecasting codes. A critical aspect of Big Red's implementation has been a focus on Science Gateways, which provide graphical interfaces to systems supporting end-to-end scientific workflows. Several Science Gateways have been implemented that access Big Red as a computational resource, some via the TeraGrid and some not affiliated with the TeraGrid. In summary, Big Red has been successfully integrated with the TeraGrid, and is used by many researchers locally at IU via grids and Science Gateways. It has been a success in terms of enabling scientific discoveries at IU and, via the TeraGrid, across the US. Copyright © 2009 John Wiley & Sons, Ltd. [source]


    Scheduling dense linear algebra operations on multicore processors

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 1 2010
    Jakub Kurzak
    Abstract State-of-the-art dense linear algebra software, such as the LAPACK and ScaLAPACK libraries, suffers performance losses on multicore processors due to their inability to fully exploit thread-level parallelism. At the same time, the coarse-grain dataflow model gains popularity as a paradigm for programming multicore architectures. This work looks at implementing classic dense linear algebra workloads, the Cholesky factorization, the QR factorization and the LU factorization, using dynamic data-driven execution. Two emerging approaches to implementing coarse-grain dataflow are examined: the model of nested parallelism, represented by the Cilk framework, and the model of parallelism expressed through an arbitrary directed acyclic graph (DAG), represented by the SMP Superscalar framework. Performance and coding effort are analyzed and compared against code manually parallelized at the thread level. Copyright © 2009 John Wiley & Sons, Ltd. [source]
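
    The dataflow view is easiest to see on one of the named workloads. Below is a serial tiled Cholesky factorization in C with naive stand-in kernels; each call in the main loop nest is one node of the task DAG, and the loop order spells out the dependencies a dynamic scheduler must respect (a sketch of the task structure, not the Cilk or SMP Superscalar code):

        #include <math.h>
        #include <stdio.h>

        /* Serial tiled Cholesky (lower triangle, right-looking). */
        #define N 8
        #define B 4                     /* tile size; T tiles per dimension */
        #define T (N / B)
        #define A(i,j) a[(i)*N + (j)]
        static double a[N*N];

        static void potrf(int k) {      /* factor diagonal tile (k,k) */
            for (int j = k*B; j < (k+1)*B; j++) {
                for (int p = k*B; p < j; p++) A(j,j) -= A(j,p)*A(j,p);
                A(j,j) = sqrt(A(j,j));
                for (int i = j+1; i < (k+1)*B; i++) {
                    for (int p = k*B; p < j; p++) A(i,j) -= A(i,p)*A(j,p);
                    A(i,j) /= A(j,j);
                }
            }
        }
        static void trsm(int m, int k) { /* tile (m,k) <- tile (m,k) L(k,k)^-T */
            for (int j = k*B; j < (k+1)*B; j++)
                for (int i = m*B; i < (m+1)*B; i++) {
                    for (int p = k*B; p < j; p++) A(i,j) -= A(i,p)*A(j,p);
                    A(i,j) /= A(j,j);
                }
        }
        static void gemm(int m, int n, int k) { /* tile (m,n) -= (m,k)(n,k)^T */
            for (int i = m*B; i < (m+1)*B; i++)
                for (int j = n*B; j < (n+1)*B; j++)
                    for (int p = k*B; p < (k+1)*B; p++)
                        A(i,j) -= A(i,p)*A(j,p);
        }

        int main(void) {
            for (int i = 0; i < N; i++)  /* diagonally dominant SPD matrix */
                for (int j = 0; j < N; j++)
                    A(i,j) = (i == j) ? N + 1.0 : 1.0;
            for (int k = 0; k < T; k++) {
                potrf(k);                                 /* POTRF task */
                for (int m = k+1; m < T; m++) trsm(m, k); /* TRSM tasks */
                for (int m = k+1; m < T; m++) {
                    gemm(m, m, k);                        /* SYRK task  */
                    for (int n = k+1; n < m; n++)
                        gemm(m, n, k);                    /* GEMM tasks */
                }
            }
            printf("L(0,0) = %f (expect 3)\n", A(0,0));
            return 0;
        }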


    First experience of compressible gas dynamics simulation on the Los Alamos roadrunner machine

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 17 2009
    Paul R. Woodward
    Abstract We report initial experience with gas dynamics simulation on the Los Alamos Roadrunner machine. In this initial work, we have restricted our attention to flows in which the flow Mach number is less than 2. This permits us to use a simplified version of the PPM gas dynamics algorithm that has been described in detail by Woodward (2006). We follow a multifluid volume fraction using the PPB moment-conserving advection scheme, enforcing both pressure and temperature equilibrium between two monatomic ideal gases within each grid cell. The resulting gas dynamics code has been extensively restructured for efficient multicore processing and implemented for scalable parallel execution on the Roadrunner system. The code restructuring and parallel implementation are described and performance results are discussed. For a modest grid size, sustained performance of 3.89 Gflop/s per CPU core is delivered by this code on 36 Cell processors in 9 triblade nodes of a single rack of Roadrunner hardware. Copyright © 2009 John Wiley & Sons, Ltd. [source]


    An architecture for exploiting multi-core processors to parallelize network intrusion prevention

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 10 2009
    Robin Sommer
    Abstract It is becoming increasingly difficult to implement effective systems for preventing network attacks, due to the combination of the rising sophistication of attacks requiring more complex analyses to detect; the relentless growth in the volume of network traffic that we must analyze; and, critically, the failure in recent years for uniprocessor performance to sustain the exponential gains that for so many years CPUs have enjoyed. For commodity hardware, tomorrow's performance gains will instead come from multi-core architectures in which a whole set of CPUs executes concurrently. Taking advantage of the full power of multi-core processors for network intrusion prevention requires an in-depth approach. In this work we frame an architecture customized for parallel execution of network attack analysis. At the lowest layer of the architecture is an 'Active Network Interface', a custom device based on an inexpensive FPGA platform. The analysis itself is structured as an event-based system, which allows us to find many opportunities for concurrent execution, since events introduce a natural asynchrony into the analysis while still maintaining good cache locality. A preliminary evaluation demonstrates the potential of this architecture. Copyright © 2009 John Wiley & Sons, Ltd. [source]


    Concurrent workload mapping for multicore security systems

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 10 2009
    Benfano Soewito
    Abstract Multicore based network processors are promising components to build real-time and scalable security systems to protect networks and systems. The parallel nature of the processing system makes it challenging for application developers to concurrently program security systems for high performance. In this paper we present an automatic programming methodology that considers application complexity, traffic variation, and attack signature updates. In particular, our mapping algorithm concurrently takes advantage of parallelism at the level of tasks, applications, and packets to achieve optimal performance. We present results that show the effectiveness of the analysis and mapping methodology and the performance of the resulting system. Copyright © 2009 John Wiley & Sons, Ltd. [source]


    Factors affecting the performance of parallel mining of minimal unique itemsets on diverse architectures

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 9 2009
    D. J. Haglin
    Abstract Three parallel implementations of a divide-and-conquer search algorithm (called SUDA2) for finding minimal unique itemsets (MUIs) are compared in this paper. The identification of MUIs is used by national statistics agencies for statistical disclosure assessment. The first parallel implementation adapts SUDA2 to a symmetric multi-processor cluster using the message passing interface (MPI), which we call an MPI cluster; the second optimizes the code for the Cray MTA2 (a shared-memory, multi-threaded architecture) and the third uses a heterogeneous 'group' of workstations connected by LAN. Each implementation considers the parallel structure of SUDA2, and how the subsearch computation times and sequence of subsearches affect load balancing. All three approaches scale with the number of processors, enabling SUDA2 to handle larger problems than before. For example, the MPI implementation is able to achieve nearly two orders of magnitude improvement with 132 processors. Performance results are given for a number of data sets. Copyright © 2009 John Wiley & Sons, Ltd. [source]


    Reparallelization techniques for migrating OpenMP codes in computational grids

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 3 2009
    Michael Klemm
    Typical computational grid users target only a single cluster and have to estimate the runtime of their jobs. Job schedulers prefer short-running jobs to maintain a high system utilization. If the user underestimates the runtime, premature termination causes computation loss; overestimation is penalized by long queue times. As a solution, we present an automatic reparallelization and migration of OpenMP applications. A reparallelization is dynamically computed for an OpenMP work distribution when the number of CPUs changes. The application can be migrated between clusters when an allocated time slice is exceeded. Migration is based on a coordinated, heterogeneous checkpointing algorithm. Both reparallelization and migration enable the user to freely use computing time at more than a single point of the grid. Our demo applications successfully adapt to the changed CPU setting and smoothly migrate between, for example, clusters in Erlangen, Germany, and Amsterdam, the Netherlands, that use different kinds and numbers of processors. Benchmarks show that reparallelization and migration impose average overheads of about 4 and 2%, respectively. Copyright © 2008 John Wiley & Sons, Ltd. [source]
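
    The core idea, recomputing a block distribution of loop iterations whenever the CPU count changes, can be sketched in a few lines of C/OpenMP. This is our illustration of the principle only; the paper's system performs the redistribution transparently inside the OpenMP runtime and adds heterogeneous checkpointing:

        #include <omp.h>
        #include <stdio.h>

        #define N 1000000
        static double x[N];

        int main(void) {
            int cpus_now = 4;  /* in the real system: queried after migration */
            omp_set_num_threads(cpus_now);
            #pragma omp parallel
            {
                int t = omp_get_thread_num(), nt = omp_get_num_threads();
                /* block bounds recomputed from the current thread count */
                long lo = (long long)N * t / nt;
                long hi = (long long)N * (t + 1) / nt;
                for (long i = lo; i < hi; i++) x[i] = 0.5 * i;
            }
            printf("%f\n", x[N - 1]);
            return 0;
        }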


    Parallel tiled QR factorization for multicore architectures

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 13 2008
    Alfredo Buttari
    Abstract As multicore systems continue to gain ground in the high-performance computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these new processors. Fine-grain parallelism becomes a major requirement and introduces the necessity of loose synchronization in the parallel execution of an operation. This paper presents an algorithm for the QR factorization where the operations can be represented as a sequence of small tasks that operate on square blocks of data (referred to as 'tiles'). These tasks can be dynamically scheduled for execution based on the dependencies among them and on the availability of computational resources. This may result in an out-of-order execution of the tasks that will completely hide the presence of intrinsically sequential tasks in the factorization. Performance comparisons are presented with the LAPACK algorithm for QR factorization where parallelism can be exploited only at the level of the BLAS operations and with vendor implementations. Copyright © 2008 John Wiley & Sons, Ltd. [source]


    An adaptive extension library for improving collective communication operations

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 10 2008
    O. Hartmann
    Abstract In this paper, we present an adaptive extension library that combines the advantage of using a portable MPI library with the ability to optimize the performance of specific collective communication operations. The extension library is built on top of MPI and can be used with any MPI library. Using the extension library, performance improvements can be achieved by an orthogonal organization of the processors in 2D or 3D meshes and by decomposing the collective communication operations into several consecutive phases of MPI communication. Additional point-to-point-based algorithms are also provided. The extension library works in two steps: an a priori configuration phase that detects possible improvements for implementing collective communication with the MPI library used, and an execution phase that selects a better implementation during execution time. This allows an adaptation of the performance of MPI programs to a specific execution platform and communication situation. The experimental evaluation shows that significant performance improvements can be obtained for different MPI libraries by using the library extension for collective MPI communication operations in isolation as well as in the context of application programs. Copyright © 2007 John Wiley & Sons, Ltd. [source]
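
    The orthogonal processor organization can be illustrated with a broadcast decomposed into a row phase and a column phase over communicators derived with MPI_Comm_split. This is a generic two-phase sketch assuming a square q x q process mesh, not the extension library's actual implementation:

        #include <math.h>
        #include <mpi.h>

        int main(int argc, char **argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);
            int q = (int)(sqrt((double)size) + 0.5);
            if (q * q != size) MPI_Abort(MPI_COMM_WORLD, 1); /* need q*q ranks */

            MPI_Comm row, col;
            MPI_Comm_split(MPI_COMM_WORLD, rank / q, rank % q, &row);
            MPI_Comm_split(MPI_COMM_WORLD, rank % q, rank / q, &col);

            double buf[1024] = {0};
            /* phase 1: the global root broadcasts across mesh row 0 */
            if (rank / q == 0) MPI_Bcast(buf, 1024, MPI_DOUBLE, 0, row);
            /* phase 2: each row-0 process broadcasts down its column */
            MPI_Bcast(buf, 1024, MPI_DOUBLE, 0, col);

            MPI_Comm_free(&row); MPI_Comm_free(&col);
            MPI_Finalize();
            return 0;
        }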


    Parallel space-filling curve generation through sorting

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 10 2007
    J. Luitjens
    Abstract In this paper we consider the scalability of parallel space-filling curve generation as implemented through parallel sorting algorithms. Multiple sorting algorithms are studied, and the results show that space-filling curves can be generated quickly in parallel on thousands of processors. In addition, performance models are presented that are consistent with measured performance and offer insight into performance on still larger numbers of processors. At large numbers of processors, the scalability of adaptive mesh refinement (AMR) codes depends on the individual components of the adaptive solver. One such component is the dynamic load balancer. In AMR codes, the mesh is constantly changing, resulting in load imbalance among the processors and requiring a load-balancing phase. The load balancing may occur often, requiring the load balancer to perform quickly. One common method for dynamic load balancing is to use space-filling curves. Space-filling curves, in particular the Hilbert curve, generate good partitions quickly in serial. However, at tens and hundreds of thousands of processors, serial generation of space-filling curves will hinder scalability. In order to avoid this issue we have developed a method that generates space-filling curves quickly in parallel by reducing the generation to integer sorting. Copyright © 2007 John Wiley & Sons, Ltd. [source]
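
    The reduction works by assigning every point a key that encodes its position along the curve and then sorting on that key. A serial C sketch using a Morton (Z-order) key, with qsort standing in for the parallel integer sort; the Hilbert variant the authors use differs only in the key function:

        #include <stdint.h>
        #include <stdio.h>
        #include <stdlib.h>

        typedef struct { uint32_t x, y; uint64_t key; } Point;

        static uint64_t spread(uint32_t v) {  /* 0-bit between each input bit */
            uint64_t x = v;
            x = (x | (x << 16)) & 0x0000FFFF0000FFFFULL;
            x = (x | (x << 8))  & 0x00FF00FF00FF00FFULL;
            x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0FULL;
            x = (x | (x << 2))  & 0x3333333333333333ULL;
            x = (x | (x << 1))  & 0x5555555555555555ULL;
            return x;
        }

        static int cmp(const void *a, const void *b) {
            uint64_t ka = ((const Point *)a)->key, kb = ((const Point *)b)->key;
            return (ka > kb) - (ka < kb);
        }

        int main(void) {
            Point p[4] = {{3,1,0},{0,0,0},{2,2,0},{1,3,0}};
            for (int i = 0; i < 4; i++)
                p[i].key = spread(p[i].x) | (spread(p[i].y) << 1);
            qsort(p, 4, sizeof(Point), cmp);  /* curve order = key order */
            for (int i = 0; i < 4; i++) printf("(%u,%u)\n", p[i].x, p[i].y);
            return 0;
        }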


    Parallelization and scalability of a spectral element channel flow solver for incompressible Navier–Stokes equations

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 10 2007
    C. W. Hamman
    Abstract Direct numerical simulation (DNS) of turbulent flows is widely recognized to demand fine spatial meshes, small timesteps, and very long runtimes to properly resolve the flow field. To overcome these limitations, most DNS is performed on supercomputing machines. With the rapid development of terascale (and, eventually, petascale) computing on thousands of processors, it has become imperative to consider the development of DNS algorithms and parallelization methods that are capable of fully exploiting these massively parallel machines. A highly parallelizable algorithm for the simulation of turbulent channel flow that allows for efficient scaling on several thousand processors is presented. A model that accurately predicts the performance of the algorithm is developed and compared with experimental data. The results demonstrate that the proposed numerical algorithm is capable of scaling well on petascale computing machines and thus will allow for the development and analysis of high Reynolds number channel flows. Copyright © 2007 John Wiley & Sons, Ltd. [source]


    MRMOGA: a new parallel multi-objective evolutionary algorithm based on the use of multiple resolutions

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 4 2007
    Antonio López Jaimes
    Abstract In this paper, we introduce MRMOGA (Multiple Resolution Multi-Objective Genetic Algorithm), a new parallel multi-objective evolutionary algorithm which is based on an injection island approach. This approach is characterized by adopting an encoding of solutions which uses a different resolution for each island. This approach allows us to divide the decision variable space into well-defined overlapped regions to achieve an efficient use of multiple processors. Also, this approach guarantees that the processors only generate solutions within their assigned region. In order to assess the performance of our proposed approach, we compare it to a parallel version of an algorithm that is representative of the state-of-the-art in the area, using standard test functions and performance measures reported in the specialized literature. Our results indicate that our proposed approach is a viable alternative to solve multi-objective optimization problems in parallel, particularly when dealing with large search spaces. Copyright © 2006 John Wiley & Sons, Ltd. [source]


    Performance of computationally intensive parameter sweep applications on Internet-based Grids of computers: the mapping of molecular potential energy hypersurfaces

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 4 2007
    S. Reyes
    Abstract This work focuses on the use of computational Grids for processing the large set of jobs arising in parameter sweep applications. In particular, we tackle the mapping of molecular potential energy hypersurfaces. For computationally intensive parameter sweep problems, performance models are developed to compare the parallel computation in a multiprocessor system with the computation on an Internet-based Grid of computers. We find that the relative performance of the Grid approach increases with the number of processors, being independent of the number of jobs. The experimental data, obtained using electronic structure calculations, fit the proposed performance expressions accurately. To automate the mapping of potential energy hypersurfaces, an application based on GRID superscalar is developed. It is tested on the prototypical case of the internal dynamics of acetone. Copyright © 2006 John Wiley & Sons, Ltd. [source]


    Experimental analysis of a mass storage system

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 15 2006
    Shahid Bokhari
    Abstract Mass storage systems (MSSs) play a key role in data-intensive parallel computing. Most contemporary MSSs are implemented as redundant arrays of independent/inexpensive disks (RAID) in which commodity disks are tied together with proprietary controller hardware. The performance of such systems can be difficult to predict because most internal details of the controller behavior are not public. We present a systematic method for empirically evaluating MSS performance by obtaining measurements on a series of RAID configurations of increasing size and complexity. We apply this methodology to a large MSS at Ohio Supercomputer Center that has 16 input/output processors, each connected to four 8 + 1 RAID5 units and provides 128 TB of storage (of which 116.8 TB are usable when formatted). Our methodology permits storage-system designers to evaluate empirically the performance of their systems with considerable confidence. Although we have carried out our experiments in the context of a specific system, our methodology is applicable to all large MSSs. The measurements obtained using our methods permit application programmers to be aware of the limits to the performance of their codes. Copyright © 2006 John Wiley & Sons, Ltd. [source]


    An efficient concurrent implementation of a neural network algorithm

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 12 2006
    R. Andonie
    Abstract The focus of this study is how we can efficiently implement the neural network backpropagation algorithm on a network of computers (NOC) for concurrent execution. We assume a distributed system with heterogeneous computers and that the neural network is replicated on each computer. We propose an architecture model with efficient pattern allocation that takes into account the speed of processors and overlaps the communication with computation. The training pattern set is distributed among the heterogeneous processors with the mapping being fixed during the learning process. We provide a heuristic pattern allocation algorithm minimizing the execution time of backpropagation learning. The computations are overlapped with communications. Under the condition that each processor has to perform a task directly proportional to its speed, this allocation algorithm has polynomial-time complexity. We have implemented our model on a dedicated network of heterogeneous computers using Sejnowski's NetTalk benchmark for testing. Copyright © 2005 John Wiley & Sons, Ltd. [source]
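
    The speed-proportional pattern allocation at the heart of the heuristic is easy to state; a C sketch with hypothetical names, which ignores the paper's additional accounting for communication overlap:

        /* Split npat training patterns over nproc processors in proportion
           to their measured speeds; rounding leftovers go to processor 0. */
        void allocate_patterns(int nproc, const double *speed,
                               int npat, int *count) {
            double total = 0.0;
            for (int p = 0; p < nproc; p++) total += speed[p];
            int assigned = 0;
            for (int p = 0; p < nproc; p++) {
                count[p] = (int)(npat * speed[p] / total);
                assigned += count[p];
            }
            count[0] += npat - assigned;
        }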


    Parallel divide-and-conquer scheme for 2D Delaunay triangulation

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 12 2006
    Min-Bin Chen
    Abstract This work describes a parallel divide-and-conquer Delaunay triangulation scheme. This algorithm finds the affected zone, which covers the triangulation and may be modified when two sub-block triangulations are merged. Finding the affected zone can reduce the amount of data required to be transmitted between processors. The time complexity of the divide-and-conquer scheme remains O(n log n), and the affected region can be located in O(n) time steps, where n denotes the number of points. The code was implemented with C, FORTRAN and MPI, making it portable to many computer systems. Experimental results on an IBM SP2 show that a parallel efficiency of 44–95% for general distributions can be attained on a 16-node distributed memory system. Copyright © 2006 John Wiley & Sons, Ltd. [source]


    Optimal integrated code generation for VLIW architectures

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 11 2006
    Christoph Kessler
    Abstract We present a dynamic programming method for optimal integrated code generation for basic blocks that minimizes execution time. It can be applied to single-issue pipelined processors, in-order-issue superscalar processors, VLIW architectures with a single homogeneous register set, and clustered VLIW architectures with multiple register sets. For the case of a single register set, our method simultaneously copes with instruction selection, instruction scheduling, and register allocation. For clustered VLIW architectures, we also integrate the optimal partitioning of instructions, allocation of registers for temporary variables, and scheduling of data transfer operations between clusters. Our method is implemented in the prototype of a retargetable code generation framework for digital signal processors (DSPs), called OPTIMIST. We present results for the processors ARM9E, TI C62x, and a single-cluster variant of C62x. Our results show that the method can produce optimal solutions for small and (in the case of a single register set) medium-sized problem instances with a reasonable amount of time and space. For larger problem instances, our method can be seamlessly changed into a heuristic. Copyright © 2006 John Wiley & Sons, Ltd. [source]