Processor

Distribution by Scientific Domains

Kinds of Processor

  • digital signal processor
  • signal processor
  • single processor


  • Selected Abstracts


    Near real-time, autonomous detection of marine bacterioplankton on a coastal mooring in Monterey Bay, California, using rRNA-targeted DNA probes

    ENVIRONMENTAL MICROBIOLOGY, Issue 5 2009
    Christina M. Preston
    Summary A sandwich hybridization assay (SHA) was developed to detect 16S rRNAs indicative of phylogenetically distinct groups of marine bacterioplankton in a 96-well plate format as well as low-density arrays printed on a membrane support. The arrays were used in a field-deployable instrument, the Environmental Sample Processor (ESP). The SHA employs a chaotropic buffer for both cell homogenization and hybridization; thus, target sequences are captured directly from crude homogenates. Capture probes for seven of nine different bacterioplankton clades examined reacted specifically when challenged with target and non-target 16S rRNAs derived from in vitro transcribed 16S rRNA genes cloned from natural samples. Detection limits were between 0.10–1.98 and 4.43–12.54 fmol ml⁻¹ homogenate for the 96-well plate and array SHA respectively. Arrays printed with five of the bacterioplankton-specific capture probes were deployed on the ESP in Monterey Bay, CA, twice in 2006 for a total of 25 days and also utilized in a laboratory time series study. Groups detected included marine alphaproteobacteria, SAR11, marine cyanobacteria, marine group I crenarchaea, and marine group II euryarchaea. To our knowledge this represents the first report of remote in situ DNA probe-based detection of marine bacterioplankton. [source]


    The U.S. Navy's Shipboard Solid Waste Program: Managing a Highly Accelerated Fleet-wide Ship System Integration Program

    NAVAL ENGINEERS JOURNAL, Issue 3 2006
    Ye-ling Wang
    The FY-94 and FY-97 National Defense Authorization Acts amended the Act to Prevent Pollution from Ships (APPS) to require fleet-wide installation of Plastics Waste Processors by December 1998, and installation of Pulpers and Shredders by December 2000. These requirements translate to an acquisition program that could rapidly develop, procure and deploy processing equipment on about 200 surface ships (over 24 ship classes) within three years. To date, this program has successfully completed the ship integration of the Plastic Waste Processor and is in the process of implementing the Pulpers and Shredders. A previous paper on this subject broadly described the development of the acquisition strategy, the development of the equipment and the fleet integration plan. This paper focuses on the ownership issues of the program and how they affect program decisions. It reviews how the Navy selected its current compliance strategy, including consideration of its effects on the marine environment through the Environmental Impact Statement (EIS) studies. It details the various processes NAVSEA implemented to ensure the delivery of a quality, affordable "turn-key" system to the fleet. These processes included a comprehensive installation design and review; rapid incorporation of lessons learned; timely deployment of Integrated Logistics Support to meet fleet introduction, which included establishing interim spares; and instituting a comprehensive In-Service Engineering Agent assist and inspection program. [source]


    Modular hardware design for distant-internet embedded systems engineering laboratory

    COMPUTER APPLICATIONS IN ENGINEERING EDUCATION, Issue 4 2009
    Xicai Yue
    Abstract A novel hardware system providing remotely accessible embedded system experiments on microcontroller, digital signal processor (DSP), and field-programmable gate array (FPGA) platforms via the Internet is presented. Three newly developed hardware modules for integrating experimental boards, experimental instrumentation, and a PC workstation in the distant embedded laboratory are fully described. © 2009 Wiley Periodicals, Inc. Comput Appl Eng Educ 17: 389–397, 2009; Published online in Wiley InterScience (www.interscience.wiley.com); DOI 10.1002/cae.20259 [source]


    A software player for providing hints in problem-based learning according to a new specification

    COMPUTER APPLICATIONS IN ENGINEERING EDUCATION, Issue 3 2009
    Pedro J. Muñoz-Merino
    Abstract The provision of hints during problem solving has been a successful strategy in the learning process. There exist several computer systems that provide hints to students during problem solving, covering some specific issues of hinting. This article presents a novel software player module for providing hints in problem-based learning. We have implemented it into the XTutor Intelligent Tutoring System using its XDOC extension mechanism and the Python programming language. This player includes some of the functionalities that are present in different state-of-the-art systems, and also other new relevant functionalities based on our own ideas and teaching experience. The article explains each hint-provision feature and gives a pedagogical justification for it. We have created an XML binding, so any combination of the model's hint functionalities can be expressed as an XML instance, enabling interoperability and reusability. The implemented player tool together with the XTutor server-side XDOC processor can interpret and run XML files according to this newly defined hints specification. Finally, the article presents several running examples of use of the tool, the subjects in which it is in use, and quantitative and qualitative results that point to a positive impact of the hints tool on the learning process. © 2009 Wiley Periodicals, Inc. Comput Appl Eng Educ 17: 272–284, 2009; Published online in Wiley InterScience (www.interscience.wiley.com); DOI 10.1002/cae.20240 [source]


    SIMDE: An educational simulator of ILP architectures with dynamic and static scheduling

    COMPUTER APPLICATIONS IN ENGINEERING EDUCATION, Issue 3 2007
    I. Castilla
    Abstract This article presents SIMDE, a cycle-by-cycle simulator to support teaching of Instruction-Level Parallelism (ILP) architectures. The simulator covers dynamic and static instruction scheduling by using a shared structure for both approaches. Dynamic scheduling is illustrated by means of a simple superscalar processor based on Tomasulo's algorithm. A basic Very Long Instruction Word (VLIW) processor has been designed for static scheduling. The simulator is intended as an aid-tool for teaching theoretical contents in Computer Architecture and Organization courses. The students are provided with an easy-to-use common environment to perform different simulations and comparisons between superscalar and VLIW processors. Furthermore, the simulator has been tested by students in a Computer Architecture course in order to assess its real usefulness. © 2007 Wiley Periodicals, Inc. Comput Appl Eng Educ 14: 226–239, 2007; Published online in Wiley InterScience (www.interscience.wiley.com); DOI 10.1002/cae.20154 [source]
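    The dynamic-scheduling half of a simulator like SIMDE can be condensed to its essentials. The sketch below is a deliberately simplified, hypothetical model (one operation type, unit latency, unlimited reservation stations), not SIMDE's actual code: each instruction captures its operands either as values or as tags of the producing stations, and a common-data-bus broadcast wakes up waiting instructions each cycle.

```python
def simulate(program, registers):
    """program: list of (dest, src1, src2) add instructions;
    registers: dict reg -> value. Returns (final registers, cycles)."""
    stations = []      # per instruction: [dest, [op1, op2]], op = ('val', v) or ('tag', i)
    reg_status = {}    # register -> index of the station that will produce it
    for i, (dest, s1, s2) in enumerate(program):
        ops = [('tag', reg_status[s]) if s in reg_status else ('val', registers[s])
               for s in (s1, s2)]
        stations.append([dest, ops])
        reg_status[dest] = i                 # later readers wait on this tag
    cycles = 0
    done = [False] * len(stations)
    while not all(done):
        cycles += 1
        ready = [i for i, (_, ops) in enumerate(stations)
                 if not done[i] and all(kind == 'val' for kind, _ in ops)]
        for i in ready:                      # all ready stations fire this cycle
            dest, ops = stations[i]
            result = ops[0][1] + ops[1][1]   # unit-latency add
            done[i] = True
            if reg_status.get(dest) == i:    # only the latest writer updates the register
                registers[dest] = result
            for st in stations:              # broadcast the result on the common data bus
                st[1] = [('val', result) if op == ('tag', i) else op for op in st[1]]
    return registers, cycles
```

    With registers r1 = 1, r2 = 2 and the program r3 = r1 + r2; r4 = r3 + r1; r5 = r1 + r2, the independent first and third instructions fire together in cycle 1, while the second waits on the first instruction's tag and fires in cycle 2.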


    Fragment-Parallel Composite and Filter

    COMPUTER GRAPHICS FORUM, Issue 4 2010
    Anjul Patney
    We present a strategy for parallelizing the composite and filter operations suitable for an order-independent rendering pipeline implemented on a modern graphics processor. Conventionally, this task is parallelized across pixels/subpixels, but serialized along individual depth layers. However, our technique extends the domain of parallelization to individual fragments (samples), avoiding a serial dependence on the number of depth layers, which can be a constraint for scenes with high depth complexity. As a result, our technique scales with the number of fragments and can sustain a consistent and predictable throughput in scenes with both low and high depth complexity, including those with a high variability of depth complexity within a single frame. We demonstrate composite/filter performance in excess of 50M fragments/sec for scenes with more than 1500 semi-transparent layers. [source]


    Practical CFD Simulations on Programmable Graphics Hardware using SMAC

    COMPUTER GRAPHICS FORUM, Issue 4 2005
    Carlos E. Scheidegger
    Abstract The explosive growth in integration technology and the parallel nature of rasterization-based graphics APIs (Application Programming Interface) changed the panorama of consumer-level graphics: today, GPUs (Graphics Processing Units) are cheap, fast and ubiquitous. We show how to harness the computational power of GPUs and solve the incompressible Navier-Stokes fluid equations significantly faster (more than one order of magnitude on average) than on CPU solvers of comparable cost. While past approaches typically used Stam's implicit solver, we use a variation of SMAC (Simplified Marker and Cell). SMAC is widely used in engineering applications, where experimental reproducibility is essential. Thus, we show that the GPU is a viable and affordable processor for scientific applications. Our solver works with general rectangular domains (possibly with obstacles), implements a variety of boundary conditions and incorporates energy transport through the traditional Boussinesq approximation. Finally, we discuss the implications of our solver in light of future GPU features, and possible extensions such as three-dimensional domains and free-boundary problems. [source]


    Implementation, performance, and science results from a 30.7 TFLOPS IBM BladeCenter cluster

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 2 2010
    Craig A. Stewart
    Abstract This paper describes Indiana University's implementation, performance testing, and use of a large high performance computing system. IU's Big Red, a 20.48 TFLOPS IBM e1350 BladeCenter cluster, appeared in the 27th Top500 list as the 23rd fastest supercomputer in the world in June 2006. In spring 2007, this computer was upgraded to 30.72 TFLOPS. The e1350 BladeCenter architecture, including two internal networks accessible to users and user applications and two networks used exclusively for system management, has enabled the system to provide good scalability on many important applications while remaining easy to manage. Implementing a system based on the JS21 Blade and PowerPC 970MP processor within the US TeraGrid presented certain challenges, given that Intel-compatible processors dominate the TeraGrid. However, the particular characteristics of the PowerPC have enabled it to be highly popular among certain application communities, particularly users of molecular dynamics and weather forecasting codes. A critical aspect of Big Red's implementation has been a focus on Science Gateways, which provide graphical interfaces to systems supporting end-to-end scientific workflows. Several Science Gateways have been implemented that access Big Red as a computational resource, some via the TeraGrid, some not affiliated with the TeraGrid. In summary, Big Red has been successfully integrated with the TeraGrid, and is used by many researchers locally at IU via grids and Science Gateways. It has been a success in terms of enabling scientific discoveries at IU and, via the TeraGrid, across the US. Copyright © 2009 John Wiley & Sons, Ltd. [source]


    Clock synchronization in Cell/B.E. traces

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 14 2009
    M. Biberstein
    Abstract Cell/B.E. is a heterogeneous multicore processor that was designed for the efficient execution of parallel and vectorizable applications with high computation and memory requirements. The transition to multicores introduces the challenge of providing tools that help programmers tune the code running on these architectures. Tracing tools, in particular, often help locate performance problems related to thread and process communication. A major impediment to implementing tracing on Cell is the absence of a common clock that can be accessed at low cost from all cores. The OS clock is costly to access from the auxiliary cores and the hardware timers cannot be simultaneously set on all the cores. In this paper, we describe an offline trace analysis algorithm that assigns wall-clock time to trace records based on their thread-local time stamps and event order. Our experiments on several Cell SDK workloads show that the indeterminism in assigning wall-clock time to events is low, on average 20–40 clock ticks (translating into 1.4–2.8 μs on the system used in our experiments). We also show how various practical problems, such as the imprecision of time measurement, can be overcome. Copyright © 2009 John Wiley & Sons, Ltd. [source]
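    The core idea behind offline wall-clock assignment, inferring clock relationships from event order, can be illustrated with message events between two cores. The helper below is a hypothetical simplification of the paper's algorithm: each cross-core message must be sent before it is received, which brackets the relative clock offset, and the width of the resulting feasible interval is exactly the kind of indeterminism the abstract quantifies.

```python
def offset_interval(a_to_b, b_to_a):
    """a_to_b: (send time on A's clock, receive time on B's clock) pairs;
    b_to_a: (send time on B's clock, receive time on A's clock) pairs.
    Returns (lo, hi), the feasible range for the offset of B's clock
    relative to A's, derived from send-before-receive ordering."""
    lo = max(s - r for s, r in a_to_b)   # s < r + offset  =>  offset > s - r
    hi = min(r - s for s, r in b_to_a)   # s + offset < r  =>  offset < r - s
    return lo, hi
```

    For example, if B's clock runs 100 ticks behind A's and message latencies are a few ticks, the interval brackets the true offset of −100 and its width (here, a few ticks) bounds the indeterminism of any timestamp assignment consistent with the event order.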


    An efficient concurrent implementation of a neural network algorithm

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 12 2006
    R. Andonie
    Abstract The focus of this study is how we can efficiently implement the neural network backpropagation algorithm on a network of computers (NOC) for concurrent execution. We assume a distributed system with heterogeneous computers and that the neural network is replicated on each computer. We propose an architecture model with efficient pattern allocation that takes into account the speed of processors and overlaps the communication with computation. The training pattern set is distributed among the heterogeneous processors with the mapping being fixed during the learning process. We provide a heuristic pattern allocation algorithm minimizing the execution time of backpropagation learning. Under the condition that each processor has to perform a task directly proportional to its speed, this allocation algorithm has polynomial-time complexity. We have implemented our model on a dedicated network of heterogeneous computers using Sejnowski's NetTalk benchmark for testing. Copyright © 2005 John Wiley & Sons, Ltd. [source]
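    The condition that each processor receives work proportional to its speed can be sketched directly. The function below is an illustrative largest-remainder rounding, not the authors' heuristic allocation algorithm:

```python
def allocate(n_patterns, speeds):
    """Split n_patterns among processors in proportion to their speeds,
    using largest-remainder rounding so the counts sum exactly."""
    total = sum(speeds)
    exact = [n_patterns * s / total for s in speeds]
    counts = [int(e) for e in exact]            # floor of the exact shares
    leftover = n_patterns - sum(counts)
    by_remainder = sorted(range(len(speeds)),   # hand leftovers to the
                          key=lambda i: exact[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:           # largest fractional parts
        counts[i] += 1
    return counts
```

    With three processors of relative speeds 1, 2 and 2 and 100 training patterns, the allocation is 20/40/40; uneven cases round so that no processor is off its exact share by more than one pattern.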


    Performance evaluation of the SX-6 vector architecture for scientific computations

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 1 2005
    Leonid Oliker
    Abstract The growing gap between sustained and peak performance for scientific applications is a well-known problem in high-performance computing. The recent development of parallel vector systems offers the potential to reduce this gap for many computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX-6 vector processor, and compares it against the cache-based IBM Power3 and Power4 superscalar architectures, across a number of key scientific computing areas. First, we present the performance of a microbenchmark suite that examines many low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks. Finally, we evaluate the performance of several scientific computing codes. Overall results demonstrate that the SX-6 achieves high performance on a large fraction of our application suite and often significantly outperforms the cache-based architectures. However, certain classes of applications are not easily amenable to vectorization and would require extensive algorithm and implementation reengineering to utilize the SX-6 effectively. Copyright © 2005 John Wiley & Sons, Ltd. [source]


    Supporting Bulk Synchronous Parallelism with a high-bandwidth optical interconnect

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 13 2004
    I. Gourlay
    Abstract The list of applications requiring high-performance computing resources is constantly growing. The cost of inter-processor communication is critical in determining the performance of massively parallel computing systems for many of these applications. This paper considers the feasibility of a commodity processor-based system which uses a free-space optical interconnect. A novel architecture, based on this technology, is presented. Analytical and simulation results based on an implementation of BSP (Bulk Synchronous Parallelism) are presented, indicating that a significant performance enhancement, over architectures using conventional interconnect technology, is possible. Copyright © 2004 John Wiley & Sons, Ltd. [source]


    A comparison of concurrent programming and cooperative multithreading under load balancing applications

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 4 2004
    Justin T. Maris
    Abstract Two models of thread execution are the general concurrent programming execution model (CP) and the cooperative multithreading execution model (CM). CP provides nondeterministic thread execution where context switches occur arbitrarily. CM provides threads that execute one at a time until they explicitly choose to yield the processor. This paper focuses on a classic application to reveal the advantages and disadvantages of load balancing during thread execution under CP and CM styles; results from a second classic application were similar. These applications are programmed in two different languages (SR and Dynamic C) on different hardware (standard PCs and embedded system controllers). An SR-like run-time system, DesCaRTeS, was developed to provide interprocess communication for the Dynamic C implementations. This paper compares load balancing and non-load balancing implementations; it also compares CP and CM style implementations. The results show that in cases of very high or very low workloads, load balancing slightly hindered performance; and in cases of moderate workload, both SR and Dynamic C implementations of load balancing generally performed well. Further, for these applications, CM style programs outperform CP style programs in some cases, but the opposite occurs in some other cases. This paper also discusses qualitative tradeoffs between CM style programming and CP style programming for these applications. Copyright © 2004 John Wiley & Sons, Ltd. [source]
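    The defining property of the CM model, context switches only where a thread explicitly yields the processor, can be mimicked in a few lines with Python generators. This is a toy illustration of the execution model, not SR or Dynamic C code:

```python
def worker(name, steps):
    """A cooperative 'thread': it runs until it explicitly yields."""
    for i in range(steps):
        yield (name, i)          # explicit yield point (the CM model)

def run(threads):
    """Round-robin scheduler: context switches occur only at yields,
    so the interleaving is fully deterministic (unlike the CP model)."""
    trace = []
    while threads:
        t = threads.pop(0)
        try:
            trace.append(next(t))
            threads.append(t)    # re-queue behind the other threads
        except StopIteration:
            pass                 # thread finished; drop it
    return trace

trace = run([worker('A', 2), worker('B', 2)])
```

    Because switches happen only at the yield points, the trace alternates deterministically between A and B; under a preemptive CP model the interleaving would be arbitrary.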


    Lesser Bear: A lightweight process library for SMP computers – scheduling mechanism without a lock operation

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 10 2002
    Hisashi Oguma
    Abstract We have designed and implemented a lightweight process (thread) library called 'Lesser Bear' for SMP computers. Lesser Bear has thread-level parallelism and high portability. Lesser Bear executes threads in parallel by creating UNIX processes as virtual processors and a memory-mapped file as a huge shared-memory space. To schedule threads in parallel, the shared-memory space has been divided into working spaces for each virtual processor, and a ready queue has been distributed. However, the previous version of Lesser Bear sometimes requires a lock operation for dequeueing. We therefore proposed a scheduling mechanism that does not require a lock operation. To achieve this, each divided space forms a link topology through the queues, and we use a lock-free algorithm for the queue operation. This mechanism is applied to Lesser Bear and evaluated by experimental results. Copyright © 2002 John Wiley & Sons, Ltd. [source]
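    One standard way to obtain a queue operation that needs no lock, in the spirit of Lesser Bear's scheduler, is to arrange that each shared index has a single writer. The sketch below is a generic single-producer/single-consumer ring buffer, an illustration of the principle rather than Lesser Bear's actual link-topology algorithm:

```python
class SPSCQueue:
    """Single-producer/single-consumer ring buffer: `tail` is written only
    by the producer and `head` only by the consumer, so neither a lock nor
    compare-and-swap is needed for correctness between the two sides."""
    def __init__(self, capacity):
        self.buf = [None] * (capacity + 1)   # one slot stays empty
        self.head = 0                        # consumer-owned index
        self.tail = 0                        # producer-owned index

    def push(self, item):
        nxt = (self.tail + 1) % len(self.buf)
        if nxt == self.head:
            return False                     # queue full
        self.buf[self.tail] = item
        self.tail = nxt                      # publish only after the write
        return True

    def pop(self):
        if self.head == self.tail:
            return None                      # queue empty
        item = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        return item
```

    In C on a real SMP, the index publications would additionally need release/acquire memory ordering; Python's model hides that detail, so treat this purely as a structural sketch.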


    Unified Medical Language System Coverage of Emergency-medicine Chief Complaints

    ACADEMIC EMERGENCY MEDICINE, Issue 12 2006
    Debbie A. Travers PhD
    Abstract Background Emergency department (ED) chief-complaint (CC) data increasingly are important for clinical-care and secondary uses such as syndromic surveillance. There is no widely used ED CC vocabulary, but experts have suggested evaluation of existing health-care vocabularies for ED CC. Objectives To evaluate the ED CC coverage in existing biomedical vocabularies from the Unified Medical Language System (UMLS). Methods The study sample included all CC entries for all visits to three EDs over one year. The authors used a special-purpose text processor to clean CC entries, which then were mapped to UMLS concepts. The UMLS match rates then were calculated and analyzed for matching concepts and nonmatching entries. Results A total of 203,509 ED visits was included. After cleaning with the text processor, 82% of the CCs matched a UMLS concept. The authors identified 5,617 unique UMLS concepts in the ED CC data, but many were used for only one or two visits. One thousand one hundred thirty-six CC concepts were used more than ten times and covered 99% of all the ED visits. The largest biomedical vocabulary in the UMLS is the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), which included concepts for 79% of all ED CC entries. However, some common CCs were not found in SNOMED CT. Conclusions The authors found that ED CC concepts are well covered by the UMLS and that the best source of vocabulary coverage is from SNOMED CT. There are some gaps in UMLS and SNOMED CT coverage of ED CCs. Future work on vocabulary control for ED CCs should build upon existing vocabularies. [source]
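    The clean-then-map pipeline the study applies to chief-complaint entries can be illustrated with a toy dictionary standing in for the UMLS lookup. The abbreviations, entries, and concept codes below are illustrative stand-ins, not the study's actual text processor or vocabulary:

```python
import re

# Hypothetical stand-in for a UMLS/SNOMED CT concept lookup
CONCEPTS = {
    "chest pain": "C0008031",
    "shortness of breath": "C0013404",
    "abdominal pain": "C0000737",
}

# A few common ED shorthand expansions (illustrative only)
ABBREVIATIONS = {"sob": "shortness of breath", "abd": "abdominal"}

def clean(entry):
    """Normalize a raw chief-complaint entry: lower-case, strip
    punctuation and digits, expand known abbreviations."""
    words = re.sub(r"[^a-z ]", " ", entry.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

def map_concept(entry):
    """Return the matching concept identifier, or None for a non-match."""
    return CONCEPTS.get(clean(entry))
```

    Entries such as "SOB!!" or "ABD pain" normalize to dictionary phrases and map to a concept, while an uncovered complaint returns None, mirroring the matched/non-matched split the study measures.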


    Economic Production Lot Sizing with Periodic Costs and Overtime,

    DECISION SCIENCES, Issue 3 2001
    E. Powell Robinson Jr
    Abstract Traditional approaches for modeling economic production lot-sizing problems assume that a single, fixed equipment setup cost is incurred each time a product is run, regardless of the quantity manufactured. This permits multiple days of production from one production setup. In this paper, we extend the model to consider additional fixed charges, such as cleanup or inspection costs, that are associated with each time period's production. This manufacturing cost structure is common in the food, chemical, and pharmaceutical industries, where process equipment must be sanitized between item changeovers and at the end of each day's production. We propose two mathematical problem formulations and optimization algorithms. The models' unique features include regular time production constraints, a fixed charge for each time period's production, and the availability of overtime production capacity. Experimental results indicate the conditions under which our algorithms' performance is superior to traditional approaches. We also test the procedures on a set of lot-sizing problems facing a national food processor and document their potential economic benefit. [source]
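    The cost structure described, a setup cost per production run plus a fixed charge for every period that actually produces, can be captured in a small dynamic program. This is an illustrative capacitated DP under assumed parameters, not the authors' formulations or algorithms:

```python
from functools import lru_cache

def plan_cost(demand, cap, K, F, h):
    """Minimum cost to meet `demand` with per-period capacity `cap`,
    run setup cost K (charged when production starts after an idle
    period), fixed charge F for every producing period (e.g. daily
    cleanup/inspection), and unit holding cost h."""
    n = len(demand)
    max_inv = sum(demand)

    @lru_cache(maxsize=None)
    def best(t, inv, running):
        if t == n:
            return 0 if inv == 0 else float('inf')
        res = float('inf')
        for q in range(cap + 1):                 # quantity produced in period t
            new_inv = inv + q - demand[t]
            if not 0 <= new_inv <= max_inv:
                continue
            cost = h * new_inv                   # end-of-period holding
            if q > 0:
                cost += F + (0 if running else K)
            res = min(res, cost + best(t + 1, new_inv, q > 0))
        return res

    return best(0, 0, False)
```

    The per-period charge F shifts the optimum: with a small F it can pay to spread a run over consecutive periods, while a large F favors producing everything in one period and holding inventory, the trade-off the paper formalizes.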


    The Phenomenology of Space in Writing Online

    EDUCATIONAL PHILOSOPHY AND THEORY, Issue 1 2009
    Max Van Manen
    Abstract In this paper we explore the phenomenon of writing online. We ask, 'Is writing by means of online technologies affected in a manner that differs significantly from the older technologies of pen on paper, typewriter, or even the word processor in an off-line environment?' In writing online, the author is engaged in a spatial complexity of physical, temporal, imaginal, and virtual experience: the writing space, the space of the text, cyber space, etc. At times, these may provide a conduit to a writerly understanding of human phenomena. We propose that an examination of the phenomenological features of online writing may contribute to a more pedagogically sensitive understanding of the experiences of online seminars, teaching and learning. [source]


    Design and implementation of a new neural network-based high speed distance relay

    EUROPEAN TRANSACTIONS ON ELECTRICAL POWER, Issue 4 2008
    M. Sanaye-Pasand
    Abstract This paper presents a new neural network-based transmission line distance protection module. The proposed module uses samples of voltage and current signals to learn the hidden relationship existing in the input patterns. Using a power system model, simulation studies are performed and the influence of changing system parameters such as fault resistance and power flow direction is studied. The proposed neural network has also been implemented on a digital signal processor (DSP) board and its behavior is investigated using suitably developed hardware. Details of the implementation and experimental studies are given in the paper. Performance study results show that the proposed algorithm is able to distinguish various transmission line faults rapidly and correctly. It shows that the proposed network is fast, reliable, and accurate. Copyright © 2007 John Wiley & Sons, Ltd. [source]


    Protocols and techniques for a scalable atom–photon quantum network

    FORTSCHRITTE DER PHYSIK/PROGRESS OF PHYSICS, Issue 11-12 2009
    L. Luo
    Abstract Quantum networks based on atomic qubits and scattered photons provide a promising way to build a large-scale quantum information processor. We review quantum protocols for generating entanglement and operating gates between two distant atomic qubits, which can be used for constructing scalable atom–photon quantum networks. We emphasize the crucial role of collecting light from atomic qubits for large-scale networking and describe two techniques to enhance light collection using reflective optics or optical cavities. A brief survey of some applications for scalable and efficient atom–photon networks is also provided. [source]


    Probabilistic programmable quantum processors

    FORTSCHRITTE DER PHYSIK/PROGRESS OF PHYSICS, Issue 11-12 2004
    V. Bu
    We analyze how to improve the performance of probabilistic programmable quantum processors. We show how the probability of success of the probabilistic processor can be enhanced by using the processor in loops. In addition, we show that arbitrary SU(2) transformations of qubits can be encoded in the program state of a universal programmable probabilistic quantum processor. The probability of success of this processor can be enhanced by a systematic correction of errors via conditional loops. Finally, we show that all our results can be generalized to qudits. [source]
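    The benefit of conditional loops is easy to quantify: if each pass of the probabilistic processor succeeds with probability p, and a detected failure can be corrected so the state is recycled into the next pass, then k loop iterations succeed with probability 1 − (1 − p)^k. The values of p and k below are arbitrary examples, not figures from the paper:

```python
def success_after_loops(p, k):
    """Probability that at least one of k repeat-until-success passes
    succeeds, assuming each pass succeeds independently with
    probability p and failures are detectable and correctable."""
    return 1 - (1 - p) ** k
```

    For instance, with p = 0.25 per pass, four conditional loops already raise the overall success probability above 68%, and it approaches 1 exponentially in k.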


    Fuel Cell Vehicle Simulation , Part 1: Benchmarking Available Fuel Cell Vehicle Simulation Tools

    FUEL CELLS, Issue 3 2003
    K.H. Hauer
    Abstract Fuel cell vehicle simulation is one method for systematic and fast investigation of the different vehicle options (fuel choice, hybridization, reformer technologies). However, a sufficient modeling program, capable of modeling the different design options, is not available today. Modern simulation programs should be capable of serving as tools for analysis as well as development. Shortfalls of the existing programs, initially developed for internal combustion engine hybrid vehicles, are: (i) insufficient modeling of transient characteristics; (ii) insufficient modeling of the fuel cell system; (iii) insufficient modeling of advanced hybrid systems; (iv) employment of a non-causal (backwards looking) structure; (v) significant shortcomings in the area of controls. In the area of analysis, a modeling tool for fuel cell vehicles needs to address the transient dynamic interaction between the electric drive train and the fuel cell system. Especially for vehicles with a slow-responding on-board fuel processor, this interaction is very different from the interaction between a battery (as power source) and an electric drive train in an electric vehicle design. Non-transient modeling leads to inaccurate predictions of vehicle performance and fuel consumption. When applied in the area of development, the existing programs do not support the employment of newer techniques, such as rapid prototyping. This is because the program structure merges control algorithms and component models, or different control algorithms (from different components) are lumped together in one single control block and not assigned to individual components as they are in real vehicles. In both cases, the transfer of control algorithms from the model into existing hardware is not possible. This paper is the first part of a three-part series and benchmarks the "state of the art" of existing programs. The second paper introduces a new simulation program that tries to overcome existing barriers. Specifically, it explicitly recognizes the dynamic interaction between the fuel cell system, the drive train and optional additional energy storage. [source]


    Micro-mechanical simulation of geotechnical problems using massively parallel computers

    INTERNATIONAL JOURNAL FOR NUMERICAL AND ANALYTICAL METHODS IN GEOMECHANICS, Issue 14 2003
    David W. Washington
    Abstract This paper demonstrates that the architecture of a massively parallel computer can be adapted for micro-mechanical simulations of a Geotechnical problem. The Discrete Element Method was used on a massively parallel supercomputer to simulate Geotechnical boundary value problems. For the demonstration, a triaxial test was simulated using an algorithm titled 'TRUBAL for Parallel Machines (TPM)' based on the discrete element method (DEM). In this trial demonstration, the inherent parallelism within the DEM algorithm is shown. Then a comparison is made between the parallel algorithm (TPM) and the serial algorithm (TRUBAL) to show the benefits of this research. TPM showed substantial improvement in performance with increasing number of processors when compared with TRUBAL using a single processor. Copyright © 2003 John Wiley & Sons, Ltd. [source]


    Evaluating recursive filters on distributed memory parallel computers

    INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING, Issue 11 2006
    Przemysław Stpiczyński. Article first published online: 6 APR 200
    Abstract The aim of this paper is to show that the recently developed high-performance divide-and-conquer algorithm for solving linear recurrence systems with constant coefficients, together with the new BLAS-based algorithm for narrow-banded triangular Toeplitz matrix–vector multiplication, allows linear recursive filters to be evaluated efficiently on distributed-memory parallel computers. We apply the BSP model of parallel computing to predict the behaviour of the algorithm and to find the optimal values of the method's parameters. The results of experiments performed on a cluster of twelve dual-processor Itanium 2 computers and a Cray X1 are also presented and discussed. The algorithm utilizes up to 30% of the peak performance of 24 Itanium processors, whereas a simple scalar algorithm can only utilize about 4% of the peak performance of a single processor. Copyright © 2006 John Wiley & Sons, Ltd. [source]
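    The structure that makes recursive filters amenable to divide and conquer can be shown on the first-order recurrence y[n] = x[n] + a·y[n−1]: each block of the input is filtered independently from a zero initial state, and the carried-in state enters only through a correction by powers of a. This is an illustrative scalar sketch of the blocking idea, not the paper's BLAS-based algorithm:

```python
def filter_serial(x, a):
    """Reference: y[n] = x[n] + a*y[n-1], evaluated sequentially."""
    y, state = [], 0.0
    for v in x:
        state = v + a * state
        y.append(state)
    return y

def filter_blocked(x, a, block):
    """Same recurrence, block by block: the per-block recurrences are
    mutually independent (the parallelizable part); the carried state
    is injected afterwards via powers of a."""
    y, state = [], 0.0
    for start in range(0, len(x), block):
        local, prev = [], 0.0
        for v in x[start:start + block]:     # independent of other blocks
            prev = v + a * prev
            local.append(prev)
        for i, v in enumerate(local):        # correction by the carried state
            y.append(v + a ** (i + 1) * state)
        state = y[-1]
    return y
```

    In a distributed implementation, the independent per-block passes run on separate processors and only the scalar carried states travel between them, which is where the parallel speedup comes from.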


    Parallel load-balanced simulation for short-range interaction particle methods with hierarchical particle grouping based on orthogonal recursive bisection

    INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 4 2008
    Florian Fleissner
    Abstract We describe an efficient load-balancing algorithm for parallel simulations of particle-based discretization methods such as the discrete element method or smoothed particle hydrodynamics. Our approach is based on an orthogonal recursive bisection of the simulation domain that is the basis for recursive particle grouping and assignment of particle groups to the parallel processors. Particle grouping is carried out based on sampled discrete particle distribution functions. For interaction detection and computation, which is the core part of particle simulations, we employ a hierarchical pruning algorithm for an efficient exclusion of non-interacting particles via the detection of non-overlapping bounding boxes. Load balancing is based on a hierarchical PI-controller approach, in which differences between the processors' per-time-step waiting times serve as the controller input. Copyright © 2007 John Wiley & Sons, Ltd. [source]
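    Orthogonal recursive bisection itself is compact: split the particle set at the median along the axis of largest extent, then recurse on each half. The sketch below is a minimal 2-D illustration that splits directly on point positions, ignoring the sampled distribution functions and the PI controller the paper uses:

```python
def orb(points, depth):
    """Orthogonal recursive bisection of 2-D points: split at the median
    along the axis of largest extent; `depth` levels yield 2**depth
    groups balanced to within one point."""
    if depth == 0:
        return [points]
    spans = [max(c) - min(c) for c in zip(*points)]  # extent per axis
    axis = spans.index(max(spans))                   # longest axis first
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                              # median split
    return orb(pts[:mid], depth - 1) + orb(pts[mid:], depth - 1)
```

    Each resulting group maps to one processor; because every cut is a median, the particle counts per processor differ by at most one, which is the static starting point that the paper's dynamic controller then maintains over time.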


    Parallel eigenanalysis of multiaquifer systems

    INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 15 2005
    L. Bergamaschi
    Abstract Finite element discretizations of flow problems involving multiaquifer systems deliver large, sparse, unstructured matrices, whose partial eigenanalysis is important for both solving the flow problem and analysing its main characteristics. We studied and implemented an effective preconditioning of the Jacobi–Davidson algorithm by FSAI-type preconditioners. We developed efficient parallelization strategies in order to solve very large problems, which could not fit into the storage available to a single processor. We report our results about the solution of multiaquifer flow problems on an SP4 machine and a Linux Cluster. We analyse the sequential and parallel efficiency of our algorithm, also compared with standard packages. Questions regarding the parallel solution of finite element eigenproblems are addressed and discussed. Copyright © 2005 John Wiley & Sons, Ltd. [source]


    A class of parallel multiple-front algorithms on subdomains

    INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 11 2003
    A. Bose
    Abstract A class of parallel multiple-front solution algorithms is developed for solving linear systems arising from discretization of boundary value problems and evolution problems. The basic substructuring approach and frontal algorithm on each subdomain are first modified to ensure stable factorization in situations where ill-conditioning may occur due to differing material properties or the use of high-degree finite elements (p methods). Next, the method is implemented on distributed-memory multiprocessor systems, with the final reduced (small) Schur complement problem solved on a single processor. A novel algorithm that implements a recursive partitioning approach on the subdomain interfaces is then developed. Both algorithms are implemented and compared in a least-squares finite-element scheme for viscous incompressible flow computation using h- and p-finite element schemes. Copyright 2003 John Wiley & Sons, Ltd. [source]
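The substructuring idea, eliminating sub-domain interiors and then solving a reduced Schur complement problem on the interface, can be sketched in a few lines. The example below is a serial toy version (the paper's solvers are frontal and distributed-memory): two sub-domain interiors of a small 1D Laplacian are condensed onto a single interface node.

```python
import numpy as np

def substructured_solve(A, b, interiors, interface):
    """Solve A x = b by eliminating each sub-domain interior and then
    solving the reduced Schur-complement system on the interface."""
    S = A[np.ix_(interface, interface)].astype(float)  # Schur accumulator
    g = b[interface].astype(float)
    local = []
    for I in interiors:
        Aii = A[np.ix_(I, I)]
        Aib = A[np.ix_(I, interface)]
        Abi = A[np.ix_(interface, I)]
        Y = np.linalg.solve(Aii, Aib)       # Aii^{-1} Aib
        z = np.linalg.solve(Aii, b[I])      # Aii^{-1} b_I
        S -= Abi @ Y                        # condense interior onto interface
        g -= Abi @ z
        local.append((I, Y, z))
    x = np.zeros(len(b))
    x[interface] = np.linalg.solve(S, g)    # small reduced interface problem
    for I, Y, z in local:                   # back-substitution per sub-domain
        x[I] = z - Y @ x[interface]
    return x

# 1D Laplacian with 7 unknowns: node 3 is the interface separating the
# two sub-domain interiors {0, 1, 2} and {4, 5, 6}.
A = 2.0 * np.eye(7) - np.eye(7, k=1) - np.eye(7, k=-1)
b = np.ones(7)
x = substructured_solve(A, b, interiors=[[0, 1, 2], [4, 5, 6]], interface=[3])
print(np.allclose(x, np.linalg.solve(A, b)))
```

In the parallel version each interior elimination runs on its own processor, and only the (small) interface system is gathered onto one processor, as the abstract describes.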


    Coupled solution of the species conservation equations using unstructured finite-volume method

    INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 4 2010
    Ankan Kumar
    Abstract A coupled solver was developed to solve the species conservation equations on an unstructured mesh with implicit spatial as well as species-to-species coupling. First, the computational domain was decomposed into sub-domains comprised of geometrically contiguous cells, a process similar to additive Schwarz decomposition. This was done using the binary spatial partitioning algorithm. Following this step, for each sub-domain, the discretized equations were developed using the finite-volume method, and solved using an iterative solver based on Krylov sub-space iterations, that is, the pre-conditioned generalized minimum residual solver. Overall (outer) iterations were then performed to treat explicitness at sub-domain interfaces and nonlinearities in the governing equations. The solver is demonstrated for both two-dimensional and three-dimensional geometries for laminar methane–air flame calculations with 6 species and 2 reaction steps, and for catalytic methane–air combustion with 19 species and 24 reaction steps. It was found that the best performance is manifested for sub-domain size of 2000 cells or more, the exact number depending on the problem at hand. The overall gain in computational efficiency was found to be a factor of 2–5 over the block (coupled) Gauss–Seidel procedure. All calculations were performed on a single-processor machine. The largest calculations were performed for about 355 000 cells (4.6 million unknowns) and required 900 MB of peak runtime memory and 19 h of CPU on a single processor. Copyright 2009 John Wiley & Sons, Ltd. [source]
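A minimal serial sketch of the inner Krylov solve is given below: preconditioned GMRES on a small non-symmetric tridiagonal system standing in for one sub-domain's coupled species block. An incomplete-LU factorization serves as the preconditioner here; this illustrates preconditioned GMRES in general, not the paper's particular solver or decomposition.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import gmres, spilu, LinearOperator

# Non-symmetric sparse system (a 1D convection-diffusion-like stencil)
# standing in for one sub-domain's coupled species equations.
n = 200
A = sp.diags([-1.2, 2.5, -0.8], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

ilu = spilu(A)                                 # incomplete-LU factorization
M = LinearOperator((n, n), matvec=ilu.solve)   # preconditioner operator

x, info = gmres(A, b, M=M, restart=30, maxiter=200)
print(info, np.linalg.norm(A @ x - b))         # info == 0 means converged
```

In the paper's scheme one such solve is performed per sub-domain, and outer iterations then reconcile the sub-domain interfaces and the nonlinear chemistry.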


    Finite volume multigrid method of the planar contraction flow of a viscoelastic fluid

    INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 8 2001
    H. Al Moatssime
    Abstract This paper reports on a numerical algorithm for the steady flow of a viscoelastic fluid. The conservation and constitutive equations are solved using the finite volume method (FVM) with a hybrid scheme for the velocities and a first-order upwind approximation for the viscoelastic stress. A non-uniform staggered grid system is used. The iterative SIMPLE algorithm is employed to relax the coupled momentum and continuity equations. The non-linear algebraic equations over the flow domain are solved iteratively by the symmetrical coupled Gauss–Seidel (SCGS) method. In both cases, the full approximation storage (FAS) multigrid algorithm is used. An Oldroyd-B fluid model was selected for the calculations. Results are reported for the planar 4:1 abrupt contraction at various Weissenberg numbers. The solutions are found to be stable and smooth, and they show that at high Weissenberg numbers the computational domain must be sufficiently long. The convergence of the method has been verified with grid refinement. All the calculations have been performed on a PC equipped with a Pentium III processor at 550 MHz. Copyright 2001 John Wiley & Sons, Ltd. [source]
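To make the multigrid ingredient concrete, the sketch below runs a plain correction-scheme V-cycle with Gauss–Seidel smoothing on the linear 1D Poisson model problem. This is a deliberate simplification: the paper uses the nonlinear FAS variant inside an FVM/SIMPLE viscoelastic solver, none of which is reproduced here.

```python
import numpy as np

def smooth(u, f, h, sweeps):
    # Gauss-Seidel sweeps for -u'' = f with homogeneous Dirichlet ends.
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2.0 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    return r

def v_cycle(u, f, h, sweeps=2):
    smooth(u, f, h, sweeps)                        # pre-smoothing
    if len(u) > 3:
        r = residual(u, f, h)
        rc = np.zeros((len(u) - 1) // 2 + 1)       # full-weighting restriction
        rc[1:-1] = 0.25 * (r[1:-2:2] + 2.0 * r[2:-2:2] + r[3::2])
        ec = np.zeros_like(rc)
        v_cycle(ec, rc, 2.0 * h, sweeps)           # coarse-grid correction
        e = np.zeros_like(u)
        e[::2] = ec                                # prolongation: inject even,
        e[1::2] = 0.5 * (ec[:-1] + ec[1:])         # interpolate odd points
        u += e
    smooth(u, f, h, sweeps)                        # post-smoothing

n = 64
h = 1.0 / n
xg = np.linspace(0.0, 1.0, n + 1)
f = np.pi ** 2 * np.sin(np.pi * xg)                # exact solution: sin(pi*x)
u = np.zeros(n + 1)
for _ in range(15):
    v_cycle(u, f, h)
err = np.max(np.abs(u - np.sin(np.pi * xg)))
print(err)                                         # down at discretization level
```

The FAS variant differs in that the full solution (not just a correction) is transferred to the coarse grid, which is what allows the nonlinear equations to be handled directly.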


    Real-time signal processing for high-density microelectrode array systems

    INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, Issue 11 2009
    K. Imfeld
    Abstract The microelectrode array (MEA) technology is continuously progressing towards higher integration of an increasing number of electrodes. The ensuing data streams, which can reach several hundreds or thousands of megabits per second, require the implementation of new signal processing and data handling methodologies to replace the off-line analysis methods currently in use. Here, we present one approach based on the hardware implementation of a wavelet-based solution for real-time processing of extracellular neuronal signals acquired on high-density MEAs. We demonstrate that simple mathematical operations on the discrete wavelet transform (DWT) coefficients can be used for efficient neuronal spike detection and sorting. As the DWT is particularly well suited for implementation on dedicated hardware, we elaborated a wavelet processor on a field-programmable gate array (FPGA) in order to compute the wavelet coefficients on 256 channels in real time. By providing sufficient hardware resources, this solution can easily be scaled up to process more electrode channels. Copyright 2008 John Wiley & Sons, Ltd. [source]
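The principle of wavelet-domain spike detection can be sketched offline in a few lines: compute detail coefficients with a hand-rolled level-1 Haar transform and threshold them against a robust (MAD-based) noise estimate. The wavelet family, the threshold of five sigma, and the synthetic biphasic spike shape are all illustrative choices; the paper's FPGA processor and its sorting stage are not reproduced here.

```python
import numpy as np

def haar_detail(sig):
    """Level-1 Haar DWT detail coefficients of an even-length signal."""
    return (sig[0::2] - sig[1::2]) / np.sqrt(2.0)

rng = np.random.default_rng(42)
n = 2048
x = 0.1 * rng.standard_normal(n)          # background noise on one channel
for t in (300, 901, 1500):                # three synthetic biphasic "spikes"
    x[t] += 2.0
    x[t + 1] -= 1.5

d = haar_detail(x)
sigma = np.median(np.abs(d)) / 0.6745     # robust MAD noise estimate
hits = np.flatnonzero(np.abs(d) > 5.0 * sigma) * 2   # back to sample indices
print(hits)
```

Because the transform and the threshold test are just subtractions, shifts, and comparisons, exactly this kind of operation maps well onto FPGA logic, which is the point the abstract makes.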


    Design of a near-optimal adaptive filter in digital signal processor for active noise control

    INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, Issue 1 2008
    S. M. Yang
    Abstract Adaptive filter has been applied in adaptive feedback and feedforward control systems, where the filter dimension is often determined by trial-and-error. The controller design based on a near-optimal adaptive filter in digital signal processor (DSP) is developed in this paper for real-time applications. The design integrates the adaptive filter and the experimental design such that their advantages in stability and robustness can be combined. The near-optimal set of controller parameters, including the sampling rate, the dimension of system identification model, the dimension (order) of adaptive controller in the form of an FIR filter, and the convergence rate of adaptation is shown to achieve the best possible system performance. In addition, the sensitivity of each design parameter can be determined by analysis of means and analysis of variance. Effectiveness of the adaptive controller on a DSP is validated by an active noise control experiment. Copyright 2007 John Wiley & Sons, Ltd. [source]