Load Balancing


Kinds of Load Balancing

  • dynamic load balancing


Selected Abstracts


    Out-of-Core and Dynamic Programming for Data Distribution on a Volume Visualization Cluster

    COMPUTER GRAPHICS FORUM, Issue 1 2009
    S. Frank
    I.3.2 [Computer Graphics]: Distributed/network graphics; C.2.4 [Distributed Systems]: Distributed applications Abstract Ray-directed volume-rendering algorithms are well suited to parallel implementation in a distributed cluster environment. For distributed ray casting, the scene must be partitioned among nodes for good load balancing, and a strict view-dependent priority order is required for image composition. In this paper, we define the load-balanced network distribution (LBND) problem and map it to the NP-complete precedence-constrained job-shop scheduling problem. We introduce a kd-tree solution and a dynamic programming solution. To process a massive data set, either a parallel or an out-of-core approach is required. Parallel preprocessing is performed by render nodes on data that are allocated using a static data structure. Volumetric data sets often contain a large portion of voxels that will never be rendered, i.e. empty space, but parallel preprocessing fails to take advantage of this. Our slab-projection slice, introduced in this paper, tracks empty space across consecutive slices of data to reduce the amount of data distributed and rendered, and is used to facilitate out-of-core bricking and kd-tree partitioning. Load balancing using each of our approaches is compared with traditional methods using several segmented regions of the Visible Korean data set. [source]
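The kd-tree partitioning idea can be sketched in miniature: recursively split a set of weighted data blocks along alternating axes so that each cut balances the accumulated workload between the two sides. This is a generic illustration, not the paper's LBND algorithm; the block coordinates and weights below are made up.

```python
# Hypothetical sketch: median-weight kd-tree partitioning of weighted blocks
# ((x, y, z), weight) into n_parts groups of roughly equal total weight.

def kdtree_partition(blocks, n_parts, axis=0, ndim=3):
    """Recursively split weighted blocks so each group gets a balanced share."""
    if n_parts == 1:
        return [blocks]
    blocks = sorted(blocks, key=lambda b: b[0][axis])
    half = n_parts // 2
    target = sum(w for _, w in blocks) * half / n_parts
    acc, split = 0.0, 0
    for i, (_, w) in enumerate(blocks):
        acc += w
        if acc >= target:
            split = i + 1
            break
    split = min(max(split, 1), len(blocks) - 1)  # keep both sides non-empty
    nxt = (axis + 1) % ndim                      # cycle through the axes
    return (kdtree_partition(blocks[:split], half, nxt, ndim)
            + kdtree_partition(blocks[split:], n_parts - half, nxt, ndim))

# 32 blocks whose weight grows with x, split across 4 render nodes:
blocks = [((x, y, 0), 1.0 + x) for x in range(8) for y in range(4)]
parts = kdtree_partition(blocks, 4)
```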


    Data partitioning-based parallel irregular reductions

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 2-3 2004
    Eladio Gutiérrez
    Abstract Different parallelization methods for irregular reductions on shared memory multiprocessors have been proposed in the literature in recent years. We have classified all these methods and analyzed them in terms of a set of properties: data locality, memory overhead, exploited parallelism, and workload balancing. In this paper we propose several techniques to increase the amount of exploited parallelism and to introduce load balancing into an important class of these methods. Regarding parallelism, the proposed solution is based on the partial expansion of the reduction array. Load balancing is discussed in terms of two techniques. The first technique is a generic one, as it deals with any kind of load imbalance present in the problem domain. The second technique handles a special case of load imbalance which occurs whenever a large number of write operations are concentrated on small regions of the reduction arrays. Efficient implementations of the proposed optimizing solutions for a particular method are presented, experimentally tested on static and dynamic kernel codes, and compared with other parallel reduction methods. Copyright © 2004 John Wiley & Sons, Ltd. [source]
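The expansion-based family of methods the abstract describes can be illustrated with a fully expanded toy version: each worker accumulates into a private copy of the reduction array, so concurrent writes never conflict, and the private copies are combined afterwards. A minimal sketch of the idea, not the authors' optimized partial-expansion scheme.

```python
# Irregular reduction via (full) array expansion: private per-worker copies
# of the reduction array, merged in a final combination phase.

def irregular_reduction(indices, values, n_bins, n_workers=4):
    # Cyclic distribution of the iteration space among workers.
    chunks = [range(w, len(indices), n_workers) for w in range(n_workers)]
    private = [[0.0] * n_bins for _ in range(n_workers)]
    for w, chunk in enumerate(chunks):
        for i in chunk:                      # conflict-free: private copy
            private[w][indices[i]] += values[i]
    # Combination phase: merge the private copies into the global array.
    result = [0.0] * n_bins
    for copy in private:
        for b in range(n_bins):
            result[b] += copy[b]
    return result

hist = irregular_reduction([0, 1, 0, 2, 1, 0], [1, 2, 3, 4, 5, 6], 3)
```

The memory overhead of one array copy per worker is exactly what the paper's *partial* expansion seeks to reduce.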


    Parallel load-balanced simulation for short-range interaction particle methods with hierarchical particle grouping based on orthogonal recursive bisection

    INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 4 2008
    Florian Fleissner
    Abstract We describe an efficient load-balancing algorithm for parallel simulations of particle-based discretization methods such as the discrete element method or smoothed particle hydrodynamics. Our approach is based on an orthogonal recursive bisection of the simulation domain that is the basis for recursive particle grouping and assignment of particle groups to the parallel processors. Particle grouping is carried out based on sampled discrete particle distribution functions. For interaction detection and computation, which is the core part of particle simulations, we employ a hierarchical pruning algorithm for an efficient exclusion of non-interacting particles via the detection of non-overlapping bounding boxes. Load balancing is based on a hierarchical PI-controller approach, where the differences of processor per time step waiting times serve as controller input. Copyright © 2007 John Wiley & Sons, Ltd. [source]
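The orthogonal recursive bisection at the heart of the grouping can be sketched as follows: alternate the split axis and cut at the median so that each processor receives an equal particle count. This simplification omits the paper's sampled distribution functions and PI-controller rebalancing.

```python
# Minimal orthogonal recursive bisection (ORB) over 2-D particle positions.

def orb(points, n_procs, axis=0):
    """Recursively bisect points into n_procs groups of equal size."""
    if n_procs == 1:
        return [points]
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) * (n_procs // 2) // n_procs   # proportional median cut
    nxt = 1 - axis                               # alternate x / y
    return (orb(pts[:mid], n_procs // 2, nxt)
            + orb(pts[mid:], n_procs - n_procs // 2, nxt))

# 64 pseudo-random particles assigned to 8 processors:
particles = [(i * 0.37 % 1.0, i * 0.61 % 1.0) for i in range(64)]
groups = orb(particles, 8)
```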


    A formalized approach for designing a P2P-based dynamic load balancing scheme

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 10 2010
    Hengheng Xie
    Abstract Quality of service (QoS) is attracting more and more attention in many areas, including entertainment, emergency services, and transaction services. The study of QoS-aware systems has therefore become an important research topic in distributed systems. In terms of load balancing, most existing QoS-related load-balancing algorithms focus on routing mechanisms and traffic engineering, whereas research on QoS-aware task scheduling and service migration is very limited. In this paper, we propose a task-scheduling algorithm using dynamic QoS properties, and we develop a genetic-algorithm-based service-migration scheme that aims to optimize the performance of our proposed QoS-aware distributed service-based system. To verify the efficiency of our scheme, we implement a prototype of our algorithm using the P2P-based JXTA technique and run both emulation and simulation tests to analyze the proposed solution. We compare our service-migration-based algorithm with non-migration and non-load-balancing approaches and find that our solution is much better than the other two in terms of QoS success rate. Furthermore, to provide firmer evidence for our design, we use DEVS to validate the system. Copyright © 2010 John Wiley & Sons, Ltd. [source]
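A toy genetic-algorithm sketch of the service-migration idea: evolve an assignment of services to nodes that minimizes the load of the busiest node. The population size, mutation rate, and load-based fitness are illustrative stand-ins for the paper's QoS-driven formulation.

```python
import random

def ga_assign(service_loads, n_nodes, generations=200, pop_size=20, seed=1):
    """Evolve a services-to-nodes assignment minimizing the busiest node."""
    rng = random.Random(seed)

    def cost(assign):  # load of the busiest node (lower is better)
        loads = [0.0] * n_nodes
        for svc, node in enumerate(assign):
            loads[node] += service_loads[svc]
        return max(loads)

    pop = [[rng.randrange(n_nodes) for _ in service_loads]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(a))
            child = a[:cut] + b[cut:]             # one-point crossover
            if rng.random() < 0.2:                # mutation: migrate one service
                child[rng.randrange(len(child))] = rng.randrange(n_nodes)
            children.append(child)
        pop = survivors + children
    best = min(pop, key=cost)
    return best, cost(best)

# Four equal services over two nodes; a balanced split gives cost 10.
best, best_cost = ga_assign([5.0, 5.0, 5.0, 5.0], n_nodes=2)
```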


    Factors affecting the performance of parallel mining of minimal unique itemsets on diverse architectures

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 9 2009
    D. J. Haglin
    Abstract Three parallel implementations of a divide-and-conquer search algorithm (called SUDA2) for finding minimal unique itemsets (MUIs) are compared in this paper. The identification of MUIs is used by national statistics agencies for statistical disclosure assessment. The first parallel implementation adapts SUDA2 to a symmetric multi-processor cluster using the message passing interface (MPI), which we call an MPI cluster; the second optimizes the code for the Cray MTA2 (a shared-memory, multi-threaded architecture) and the third uses a heterogeneous 'group' of workstations connected by LAN. Each implementation considers the parallel structure of SUDA2, and how the subsearch computation times and sequence of subsearches affect load balancing. All three approaches scale with the number of processors, enabling SUDA2 to handle larger problems than before. For example, the MPI implementation is able to achieve nearly two orders of magnitude improvement with 132 processors. Performance results are given for a number of data sets. Copyright © 2009 John Wiley & Sons, Ltd. [source]


    Parallel space-filling curve generation through sorting

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 10 2007
    J. Luitjens
    Abstract In this paper we consider the scalability of parallel space-filling curve generation as implemented through parallel sorting algorithms. Multiple sorting algorithms are studied, and results show that space-filling curves can be generated quickly in parallel on thousands of processors. In addition, performance models are presented that are consistent with measured performance and offer insight into performance on still larger numbers of processors. At large numbers of processors, the scalability of adaptive mesh refinement codes depends on the individual components of the adaptive solver. One such component is the dynamic load balancer. In adaptive mesh refinement codes, the mesh is constantly changing, resulting in load imbalance among the processors and requiring a load-balancing phase. The load balancing may occur often, requiring the load balancer to run quickly. One common method for dynamic load balancing is to use space-filling curves. Space-filling curves, in particular the Hilbert curve, generate good partitions quickly in serial. However, at tens and hundreds of thousands of processors, serial generation of space-filling curves will hinder scalability. To avoid this issue we have developed a method that generates space-filling curves quickly in parallel by reducing the generation to integer sorting. Copyright © 2007 John Wiley & Sons, Ltd. [source]
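The reduction-to-sorting idea in miniature: compute an integer space-filling-curve key per cell, then order cells by key; in a parallel setting, each rank would feed its keys to a distributed integer sort. A Morton (Z-order) key is used here for brevity, whereas the paper favours the Hilbert curve for its better locality.

```python
# Morton (Z-order) key: interleave the bits of x and y into one integer,
# so that sorting by key traces a space-filling curve over the grid.

def morton_key(x, y, bits=16):
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)        # x bits -> even positions
        key |= ((y >> b) & 1) << (2 * b + 1)    # y bits -> odd positions
    return key

# "Generation through sorting": order mesh cells along the curve.
cells = [(3, 1), (0, 0), (1, 0), (0, 1), (2, 2)]
curve = sorted(cells, key=lambda c: morton_key(*c))
```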


    Performance and effectiveness trade-off for checkpointing in fault-tolerant distributed systems

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 1 2007
    Panagiotis Katsaros
    Abstract Checkpointing has a crucial impact on systems' performance and fault-tolerance effectiveness: excessive checkpointing results in performance degradation, while deficient checkpointing incurs expensive recovery. In distributed systems with independent checkpoint activities there is no easy way to determine checkpoint frequencies optimizing response-time and fault-tolerance costs at the same time. The purpose of this paper is to investigate the potentialities of a statistical decision-making procedure. We adopt a simulation-based approach for obtaining performance metrics that are afterwards used for determining a trade-off between checkpoint interval reductions and efficiency in performance. Statistical methodology including experimental design, regression analysis and optimization provides us with the framework for comparing configurations, which use possibly different fault-tolerance mechanisms (replication-based or message-logging-based). Systematic research also allows us to take into account additional design factors, such as load balancing. The method is described in terms of a standardized object replication model (OMG FT-CORBA), but it could also be applied in other (e.g. process-based) computational models. Copyright © 2006 John Wiley & Sons, Ltd. [source]
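The trade-off the paper investigates has a classic first-order closed form as background: Young's approximation, which balances checkpoint overhead against expected recomputation after a failure. This formula is standard prior art, not the paper's statistical decision-making procedure.

```python
import math

# Young's approximation: the checkpoint interval minimizing checkpoint
# overhead plus expected rework is roughly sqrt(2 * C * MTBF), where C is
# the cost of writing one checkpoint and MTBF the mean time between failures.

def young_interval(checkpoint_cost_s, mtbf_s):
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

# A 30 s checkpoint on a system failing once a day suggests checkpointing
# roughly every 38 minutes:
t_opt = young_interval(30.0, 24 * 3600.0)
```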


    Distributed loop-scheduling schemes for heterogeneous computer systems

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 7 2006
    Anthony T. Chronopoulos
    Abstract Distributed computing systems are a viable and less expensive alternative to parallel computers. However, a serious difficulty in concurrent programming of a distributed system is how to deal with the scheduling and load balancing of such a system, which may consist of heterogeneous computers. Some distributed scheduling schemes suitable for parallel loops with independent iterations on heterogeneous computer clusters have been designed in the past. In this work we study self-scheduling schemes for parallel loops with independent iterations, which have previously been applied to multiprocessor systems. We extend one important scheme of this type to a distributed version suitable for heterogeneous distributed systems. We implement our new scheme on a network of computers and make performance comparisons with other existing schemes. Copyright © 2005 John Wiley & Sons, Ltd. [source]
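The self-scheduling idea being extended can be sketched roughly as follows: workers repeatedly request chunks of the loop, and each chunk shrinks with the remaining work while being scaled by the requesting worker's relative speed. The chunk rule and round-robin "next idle worker" stand-in are illustrative, not the specific scheme the authors distribute.

```python
# Heterogeneous self-scheduling sketch: chunk = (remaining / P) scaled by
# the worker's speed relative to the average speed.

def schedule(total_iters, speeds):
    """Return (worker, chunk_size) assignments covering total_iters."""
    remaining, assignments = total_iters, []
    n = len(speeds)
    avg = sum(speeds) / n
    turn = 0
    while remaining > 0:
        w = turn % n                  # round-robin stand-in for "next idle"
        chunk = max(1, int(remaining / n * speeds[w] / avg))
        chunk = min(chunk, remaining)
        assignments.append((w, chunk))
        remaining -= chunk
        turn += 1
    return assignments

# 100 iterations over three workers, the middle one twice as fast:
plan = schedule(100, [1.0, 2.0, 1.0])
```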


    A comparison of concurrent programming and cooperative multithreading under load balancing applications

    CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 4 2004
    Justin T. Maris
    Abstract Two models of thread execution are the general concurrent programming execution model (CP) and the cooperative multithreading execution model (CM). CP provides nondeterministic thread execution where context switches occur arbitrarily. CM provides threads that execute one at a time until they explicitly choose to yield the processor. This paper focuses on a classic application to reveal the advantages and disadvantages of load balancing during thread execution under CP and CM styles; results from a second classic application were similar. These applications are programmed in two different languages (SR and Dynamic C) on different hardware (standard PCs and embedded system controllers). An SR-like run-time system, DesCaRTeS, was developed to provide interprocess communication for the Dynamic C implementations. This paper compares load balancing and non-load balancing implementations; it also compares CP and CM style implementations. The results show that in cases of very high or very low workloads, load balancing slightly hindered performance; and in cases of moderate workload, both SR and Dynamic C implementations of load balancing generally performed well. Further, for these applications, CM style programs outperform CP style programs in some cases, but the opposite occurs in some other cases. This paper also discusses qualitative tradeoffs between CM style programming and CP style programming for these applications. Copyright © 2004 John Wiley & Sons, Ltd. [source]


    Considering safety issues in minimum losses reconfiguration for MV distribution networks

    EUROPEAN TRANSACTIONS ON ELECTRICAL POWER, Issue 5 2009
    Angelo Campoccia
    Abstract This paper offers a new perspective on the traditional problem of the multiobjective optimal reconfiguration of electrical distribution systems in the regular working state, formulating it to include safety issues as well. When dimensioning the earth electrodes of their secondary substations, distribution companies take into account the probable future configurations of the network due to the transformation of overhead lines into cable lines or the construction of new lines. They do not, however, consider that during normal working conditions the structure of the network can be modified for long periods as a consequence of reconfiguration manoeuvres, so that the difference between the design current of the earthing systems and the fault current at certain substations can become significant. As a consequence, distribution companies often limit the implementation of optimal reconfiguration layouts because they are unable to suitably evaluate the safety issue. In this paper, the problem is formulated with an additional objective to account for safety. A suitably constrained multiobjective formulation of the reconfiguration problem is used, aiming at minimal-power-loss operation, the verification of safety at distribution substations, and load balancing among the HV/MV transformers while keeping the voltage profile regular. The application uses an NSGA-II algorithm whose performance is compared with that of a fuzzy logic-based multiobjective evolutionary algorithm. In the automated network considered, remote control of the tie-switches is possible and their layout is the optimization variable. After a brief description of the optimal reconfiguration problem for automated distribution networks, the most recent papers on the topic are reviewed. The problem formulation and the solution algorithm are then described in detail.
    Finally, test results on a large MV distribution network are reported and discussed. Copyright © 2008 John Wiley & Sons, Ltd. [source]
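NSGA-II rests on Pareto dominance between objective vectors (here: power losses, a safety violation measure, load imbalance, all minimized). A minimal dominance test and non-dominated filter, as a sketch of the selection machinery rather than the full algorithm.

```python
# Pareto dominance and non-dominated filtering over minimized objectives.

def dominates(a, b):
    """a dominates b if it is no worse in every objective, better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Keep the solutions not dominated by any other solution."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]

# Two objectives; (3, 3) is dominated by (2, 2) and drops out:
front = pareto_front([(1.0, 5.0), (2.0, 2.0), (5.0, 1.0), (3.0, 3.0)])
```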


    Inter-cell coordination in wireless data networks

    EUROPEAN TRANSACTIONS ON TELECOMMUNICATIONS, Issue 3 2006
    Thomas Bonald
    Over the past few years, the design and performance of channel-aware scheduling strategies have attracted huge interest. In the present paper, we examine a somewhat different notion of scheduling, namely coordination of transmissions among base stations, which has received little attention so far. The inter-cell coordination comprises two key elements: (i) interference avoidance and (ii) load balancing. The interference avoidance involves coordinating the activity phases of interfering base stations so as to increase transmission rates. The load balancing aims at diverting traffic from heavily loaded cells to lightly loaded cells. Numerical experiments demonstrate that inter-cell scheduling may provide significant capacity gains. Copyright © 2006 AEIT [source]


    Parallel DSMC method using dynamic domain decomposition

    INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 1 2005
    J.-S. Wu
    Abstract A general parallel direct simulation Monte Carlo (DSMC) method using unstructured mesh is introduced, which incorporates a multi-level graph-partitioning technique to dynamically decompose the computational domain. The current DSMC method is implemented on an unstructured mesh using a particle ray-tracing technique, which takes advantage of the cell connectivity information. In addition, various strategies for applying the stop-at-rise (SAR) scheme (IEEE Trans Comput 1988; 39:1073–1087) are studied to determine how frequently the domain should be re-decomposed. A high-speed, bottom-driven cavity flow, with small, medium and large problem sizes based on the number of particles and cells, is simulated. A corresponding analysis of parallel performance is reported on an IBM SP2 parallel machine with up to 64 processors. The analysis shows that the degree of imbalance among processors with dynamic load balancing is about one half of that without it. Detailed timing analysis shows that, with dynamic load balancing, the degree of imbalance levels off very rapidly at a relatively low value as the number of processors increases, which makes the large problem size fairly scalable beyond 64 processors. In general, the optimal frequency of activating the SAR scheme decreases with problem size. Finally, the method is applied to compute two two-dimensional hypersonic flows, a three-dimensional hypersonic flow, and a three-dimensional near-continuum twin-jet gas flow to demonstrate its computational capability and to compare with experimental data and previous simulation data wherever available. Copyright © 2005 John Wiley & Sons, Ltd. [source]
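One common definition of the "degree of imbalance" tracked above is how far the most loaded processor sits above the average load; this is an assumption for illustration, as the paper may define the metric slightly differently.

```python
# Degree of imbalance: (max load - average load) / average load.
# 0.0 means perfectly balanced; 1.0 means the worst processor carries
# twice the average load.

def degree_of_imbalance(loads):
    avg = sum(loads) / len(loads)
    return (max(loads) - avg) / avg

balanced   = degree_of_imbalance([100, 101, 99, 100])   # near zero
unbalanced = degree_of_imbalance([250, 50, 50, 50])     # one hot spot
```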


    A parallel cell-based DSMC method on unstructured adaptive meshes

    INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 12 2004
    Min Gyu Kim
    Abstract A parallel DSMC method based on a cell-based data structure is developed for the efficient simulation of rarefied gas flows on PC-clusters. Parallel computation is made by decomposing the computational domain into several subdomains. Dynamic load balancing between processors is achieved based on the number of simulation particles and the number of cells allocated in each subdomain. Adjustment of cell size is also made through mesh adaptation for the improvement of solution accuracy and the efficient usage of meshes. Applications were made for a two-dimensional supersonic leading-edge flow, the axi-symmetric Rothe's nozzle, and the open hollow cylinder flare flow for validation. It was found that the present method is an efficient tool for the simulation of rarefied gas flows on PC-based parallel machines. Copyright © 2004 John Wiley & Sons, Ltd. [source]


    Error-aware and energy-efficient routing approach in MANETs

    INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS, Issue 1 2009
    Liansheng Tan
    Abstract The lifetime of the network is the key design factor of mobile ad hoc networks (MANETs). To prolong the lifetime of a MANET, one must trade off minimizing energy consumption against load balancing. In MANETs, the energy wasted on retransmissions due to the high bit error rate (BER) and high frame error rate (FER) of the wireless channel is significant. In this paper, we propose two novel protocols, termed the multi-threshold routing protocol (MTRP) and the enhanced multi-threshold routing protocol (EMTRP). MTRP divides the total energy of a wireless node into multiple ranges, where the lower bound of each range corresponds to a threshold. The protocol iterates from the highest threshold to the lowest and, during each iteration, chooses the routes whose bottleneck energy exceeds the current threshold. This approach avoids overusing certain routes and achieves load balancing. If multiple routes satisfy the threshold constraint, MTRP selects the route with the smallest hop count to further improve energy efficiency. Building on MTRP, EMTRP also takes the channel condition into consideration and selects routes with better channel conditions, consequently reducing the number of retransmissions and saving energy. We analyze the average loss probability (ALP) of the uniform error model and the Gilbert error model and give a distributed algorithm to obtain the maximal ALP along a route. Descriptions of MTRP and EMTRP are given in pseudocode form. Simulation results demonstrate that the proposed EMTRP outperforms the representative protocol CMMBCR in terms of total energy consumption and load balancing. Copyright © 2008 John Wiley & Sons, Ltd. [source]
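The multi-threshold route selection in MTRP can be sketched as follows: iterate from the highest energy threshold down, keep the routes whose bottleneck (minimum per-node) energy clears the threshold, and among the survivors prefer the smallest hop count. The route and threshold values are illustrative.

```python
# MTRP-style selection: highest threshold first, bottleneck-energy filter,
# then fewest hops among the surviving routes.

def select_route(routes, thresholds):
    """routes: per-route lists of residual node energies along the path."""
    for t in sorted(thresholds, reverse=True):
        candidates = [r for r in routes if min(r) > t]
        if candidates:
            return min(candidates, key=len)   # fewest hops among survivors
    return None                               # no route clears any threshold

routes = [[0.9, 0.8, 0.7], [0.6, 0.95], [0.3, 0.9, 0.9, 0.9]]
best = select_route(routes, [0.8, 0.5, 0.2])
```

At threshold 0.8 no route qualifies; at 0.5 two routes do, and the shorter one wins.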


    Energy-efficient target detection in sensor networks using line proxies

    INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS, Issue 3 2008
    Jangwon Lee
    Abstract One of the fundamental and important operations in sensor networks is sink–source matching, i.e. target detection: how a sink finds the location of the source nodes observing the event of interest (i.e. target activity). This operation is very important in many sensor network applications, such as military battlefields and environmental habitats. The mobility of both targets and sinks poses a significant challenge to target detection in sensor networks, and most existing approaches are either energy-inefficient or lack fault tolerance in the presence of mobile targets and mobile sinks. Motivated by this, we propose an energy-efficient line-proxy target detection (LPTD) approach in this paper. The basic idea of LPTD is to use designated line proxies as rendezvous points (or agents) to coordinate mobile sinks and mobile targets. Instead of having rendezvous nodes for each target type, as most existing approaches do, we adopt a temporal hash function to determine the proxy line at a given time; the lines are then alternated over time across the entire sensor network. This simple temporal line-rotation idea allows all sensor nodes in the network to serve as rendezvous points and achieves overall load balancing. Furthermore, instead of network-wide flooding, interests from sinks are flooded only to the designated line proxies within a limited area, and the interest flooding decreases further if the interest has geographical constraints. We have conducted extensive analysis and simulations to evaluate the performance of the proposed approach. Our results show that it can significantly reduce overall energy consumption and target detection delay. Copyright © 2007 John Wiley & Sons, Ltd. [source]
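The temporal hash behind LPTD can be sketched simply: every node derives the currently active proxy line from coarse-grained time, so the rendezvous line rotates across the grid and the proxy load spreads evenly without any coordination messages. The epoch length and number of lines are made-up parameters.

```python
# Temporal hash sketch: which of n_lines proxy lines is active this epoch.
# All nodes computing the same function on synchronized time agree on the
# line without exchanging messages.

def current_proxy_line(now_seconds, n_lines, epoch=600):
    return (now_seconds // epoch) % n_lines

# Over eight consecutive 10-minute epochs, the proxy role visits each of
# the eight lines exactly once before wrapping around:
lines = [current_proxy_line(t, 8) for t in range(0, 8 * 600, 600)]
```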


    Joint packet scheduling and dynamic base station assignment for CDMA data networks

    INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS, Issue 2 2008
    Christian Makaya
    Abstract In current code division multiple access (CDMA) based wireless systems, a base station (BS) schedules packets independently of its neighbours, which may lead to resource wastage and the degradation of the system's performance. In wireless networks, in order to achieve an efficient packet scheduling, there are two conflicting performance metrics that have to be optimized: throughput and fairness. Their maximization is a key goal, particularly in next-generation wireless networks. This paper proposes joint packet scheduling and BS assignment schemes for a cluster of interdependent neighbouring BSs in CDMA-based wireless networks, in order to enhance the system performance through dynamic load balancing. The proposed schemes are based on sector subdivision in terms of average required resource per mobile station and utility function approach. The fairness is achieved by minimizing the variance of the delay for the remaining head-of-queue packets. Inter-cell and intra-cell interferences from scheduled packets are also minimized in order to increase the system capacity and performance. The simulation results show that our proposed schemes perform better than existing schemes available in the open literature. Copyright © 2007 John Wiley & Sons, Ltd. [source]


    Pre-handover signalling for QoS aware mobility management

    INTERNATIONAL JOURNAL OF NETWORK MANAGEMENT, Issue 6 2004
    Hakima Chaouchi
    In this paper we present a new approach to providing fast handover in Mobile IP. A new Pre-Handover Signalling (PHS) protocol is proposed to allow the network to make accurate handover decisions that consider different constraints such as QoS, load balancing among the base stations, the user profile, and the mobile node's service requirements. In addition, we propose to minimize the discovery time of the new base station in order to minimize the handover latency. The PHS starts as soon as the mobile node crosses a predefined critical zone in its current location; this signalling provides the mobile node with a list of candidate cells and corresponding priorities, and the mobile node selects the highest-priority base station as soon as the layer-two handover occurs. We propose to use an extension of COPS (Common Open Policy Service) to support the PHS mechanism, overcome the blind handover decisions of Mobile IP, and improve handover performance. Copyright © 2004 John Wiley & Sons, Ltd. [source]


    An adaptive load balancing scheme for web servers

    INTERNATIONAL JOURNAL OF NETWORK MANAGEMENT, Issue 1 2002
    Dr. James Aweya
    This paper describes an overload control scheme for web servers which integrates admission control and load balancing. The admission control mechanism adaptively determines the client request acceptance rate to meet the web servers' performance requirements while the load balancing or client request distribution mechanism determines the fraction of requests to be assigned to each web server. The scheme requires no prior knowledge of the relative speeds of the web servers, nor the work required to process each incoming request. Copyright © 2002 John Wiley & Sons, Ltd. [source]
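A hedged sketch of the two mechanisms the paper integrates: an admission controller that nudges the client-request acceptance rate toward a utilization target, and a distributor that splits the accepted requests in proportion to each server's observed service rate (so no prior knowledge of server speeds is needed, only measurements). The gain and target values are illustrative.

```python
# Adaptive admission control plus proportional request distribution.

def update_acceptance(rate, utilization, target=0.8, gain=0.1):
    """Lower the acceptance rate when servers run hot, raise it when cool."""
    rate += gain * (target - utilization)
    return min(1.0, max(0.0, rate))         # keep the rate in [0, 1]

def distribute(accepted, service_rates):
    """Assign request fractions proportional to measured server speed."""
    total = sum(service_rates)
    return [accepted * r / total for r in service_rates]

rate = update_acceptance(0.9, utilization=0.95)   # overloaded: back off
shares = distribute(1000, [3.0, 1.0])             # 3:1 split across servers
```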