Performance Bottleneck (performance + bottleneck)

Selected Abstracts


Specification and detection of performance problems with ASL

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 11 2007
Michael Gerndt
Abstract Performance analysis is an important step in tuning performance-critical applications. It is a cyclic process of measuring and analyzing performance data, driven by the programmer's hypotheses on potential performance problems. Currently this process is controlled manually by the programmer. The goal of the work described in this article is to automate the performance analysis process based on a formal specification of performance properties. One result of the APART project is the APART Specification Language (ASL) for the formal specification of performance properties. Performance bottlenecks can then be identified based on the specification, since bottlenecks are viewed as performance properties with a large negative impact. We also present the overall design and an initial evaluation of the Periscope system which utilizes ASL specifications to automatically search for performance bottlenecks in a distributed manner. Copyright 2006 John Wiley & Sons, Ltd. [source]


APEX-Map: a parameterized scalable memory access probe for high-performance computing systems

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 17 2007
Erich Strohmaier
Abstract The memory wall between the peak performance of microprocessors and their memory performance has become the prominent performance bottleneck for many scientific application codes. New benchmarks measuring data access speeds locally and globally in a variety of different ways are needed to explore the ever increasing diversity of architectures for high-performance computing. In this paper, we introduce a novel benchmark, APEX-Map, which focuses on global data movement and measures how fast global data can be fed into computational units. APEX-Map is a parameterized, synthetic performance probe and integrates concepts for temporal and spatial locality into its design. Our first parallel implementation in MPI and various results obtained with it are discussed in detail. By measuring the APEX-Map performance with parameter sweeps for a whole range of temporal and spatial localities performance surfaces can be generated. These surfaces are ideally suited to study the characteristics of the computational platforms and are useful for performance comparison. Results on a global-memory vector platform and distributed-memory superscalar platforms clearly reflect the design differences between these different architectures. Published in 2007 by John Wiley & Sons, Ltd. [source]
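
As a rough illustration of what a probe parameterized by temporal and spatial locality might look like, the sketch below generates a synthetic address stream. It is not the APEX-Map implementation: the function name `access_stream`, the power-law draw for block start addresses (exponent `1/alpha`), and the use of a contiguous block length for spatial locality are assumptions made for this example only.

```python
import random


def access_stream(mem_size, alpha, block_len, n_blocks, seed=0):
    """Yield blocks of indices into a hypothetical array of mem_size elements.

    alpha     -- assumed temporal-locality knob in (0, 1]: 1.0 gives uniform
                 random starts, smaller values concentrate starts near index 0
                 so the same addresses are revisited more often.
    block_len -- assumed spatial-locality knob: number of contiguous elements
                 touched after each random start.
    """
    rng = random.Random(seed)
    max_start = mem_size - block_len
    for _ in range(n_blocks):
        # Power-law transform of a uniform draw (an assumption of this sketch).
        start = int(max_start * rng.random() ** (1.0 / alpha))
        yield list(range(start, start + block_len))


def mean_start(mem_size, alpha, block_len, n_blocks):
    """Average block start address; lower means more address reuse."""
    starts = [blk[0] for blk in access_stream(mem_size, alpha, block_len, n_blocks)]
    return sum(starts) / len(starts)
```

Sweeping `alpha` and `block_len` over a grid and timing the accesses at each point in a compiled language would give one performance value per parameter pair, which is the kind of data a performance surface is built from.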


Potential performance bottleneck in Linux TCP

INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS, Issue 11 2007
Wenji Wu
Abstract Transmission control protocol (TCP) is the most widely used transport protocol on the Internet today. Over the years, especially recently, due to requirements of high bandwidth transmission, various approaches have been proposed to improve TCP performance. The Linux 2.6 kernel is now preemptible. It can be interrupted mid-task, making the system more responsive and interactive. However, we have noticed that Linux kernel preemption can interact badly with the performance of the networking subsystem. In this paper, we investigate the performance bottleneck in Linux TCP. We systematically describe the trip of a TCP packet from its ingress into a Linux network end system to its final delivery to the application; we study the performance bottleneck in Linux TCP through mathematical modelling and practical experiments; finally, we propose and test one possible solution to resolve this performance bottleneck in Linux TCP. Copyright 2007 John Wiley & Sons, Ltd. [source]


Enhancing multimedia streaming over existing wireless LAN technology using the Unified Link Layer API

INTERNATIONAL JOURNAL OF NETWORK MANAGEMENT, Issue 5 2007
Tim Farnham
This paper examines how multimedia streaming scenarios can be enhanced by cross-layer interaction, and in particular by the link performance information and configuration options provided by the recently developed Unified Link Layer API (ULLA). It provides results of an experimental implementation developed for this purpose in a wireless LAN (WLAN) environment. Multimedia streaming is an application that is gaining in popularity for mobile devices, and mobile Internet-based content broadcasting in particular is rapidly emerging as a key feature on mobile devices. In these scenarios, the wireless link (last hop) is normally the performance bottleneck due to the dynamic and limited capacity of the wireless medium. The use of ULLA in this context can provide the ability to tailor the video transmission to the wireless link performance and also to configure the links in response to performance problems or environmental changes. For this purpose the focus of multimedia streaming has been on WLAN link technology and dynamic adaptation (i.e., dynamic channel selection and video transcoding) using a dynamic resource reservation overlay protocol. Copyright 2007 John Wiley & Sons, Ltd. [source]


Scalable and lightweight key distribution for secure group communications

INTERNATIONAL JOURNAL OF NETWORK MANAGEMENT, Issue 3 2004
Fu-Yuan Lee
Securing group communications in dynamic and large-scale groups is more complex than securing one-to-one communications due to the inherent scalability issue of group key management. In particular, the cost of key establishment and key renewal usually depends on the group size and consequently becomes a performance bottleneck in achieving scalability. To address this problem, this paper proposes a new approach that decouples group size from the computation cost of group key management. By using a hierarchical key distribution architecture and load sharing, the load of key management can be shared by a cluster of third parties without revealing group messages to them. The proposed scheme provides better scalability because the cost of key management for each component is independent of the group size. Specifically, our scheme incurs constant computation and communication overheads for key renewal. In this paper, we present the detailed design of the proposed scheme and performance comparisons with other schemes. Briefly, our scheme provides better scalability than existing group key distribution approaches. Copyright 2004 John Wiley & Sons, Ltd. [source]


Compiler and runtime techniques for software transactional memory optimization

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 1 2009
Peng Wu
Abstract Software transactional memory (STM) systems are an attractive environment to evaluate optimistic concurrency. We describe our experience of supporting and optimizing an STM system at both the managed runtime and compiler levels. We describe the design policies of our STM system and the statistics collected by the runtime to identify performance bottlenecks and guide tuning decisions. We present initial work on supporting automatic instrumentation of the STM primitives for C/C++ and Java programs in the IBM XL compiler and J9 Java virtual machine. We evaluate and discuss the performance of several transactional programs running on our system. Copyright 2008 John Wiley & Sons, Ltd. [source]


Dynamic zone topology routing protocol for MANETs

EUROPEAN TRANSACTIONS ON TELECOMMUNICATIONS, Issue 4 2007
Mehran Abolhasan
The limited scalability of proactive and reactive routing protocols has resulted in the introduction of a new generation of routing in mobile ad hoc networks, called hybrid routing. These protocols aim to extend the scalability of such networks beyond several hundred to thousands of nodes by defining a virtual infrastructure in the network. However, many of the hybrid routing protocols proposed to date are designed to function using a common pre-programmed static zone map. Other hybrid protocols reduce flooding by grouping nodes into clusters, governed by a cluster-head, which may create a performance bottleneck or a single point of failure at each cluster-head node. We propose a new routing strategy in which zones are created dynamically, using a dynamic zone creation algorithm. Therefore, nodes are not restricted to a specific region. Additionally, nodes perform routing and data forwarding in a cooperative manner, which means that in the case of failure, route recalculation is minimised. Routing overheads are also further reduced by introducing a number of GPS-based location tracking mechanisms, which reduce the route discovery area and the number of nodes queried to find the required destination. Copyright 2006 AEIT [source]


An optimal spectrum-balancing algorithm for digital subscriber lines based on particle swarm optimization

INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS, Issue 9 2008
Meiqin Tang
Abstract This paper presents a new algorithm for optimal spectrum balancing in modern digital subscriber line (DSL) systems using particle swarm optimization (PSO). In DSL, crosstalk is one of the major performance bottlenecks; therefore, various dynamic spectrum management algorithms have been proposed to reduce excess crosstalk among users by dynamically optimizing transmission power spectra. In fact, the objective function in the spectrum optimization problem is always nonconcave. PSO is a new evolutionary algorithm based on the movement and intelligence of swarms looking for the most fertile feeding location, which can solve discontinuous, nonconvex and nonlinear problems efficiently. The proposed algorithm optimizes the weighted rate sum. These weights allow the system operator to place differing qualities of service or importance levels on each user, which makes it possible for the system to avoid the selfish optimum. We can show that the proposed algorithm converges to the global optimal solutions. Simulation results demonstrate that our algorithm can guarantee fast convergence within a few iterations and solve the nonconvex optimization problems efficiently. Copyright 2008 John Wiley & Sons, Ltd. [source]
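
To make the idea of maximizing a weighted rate sum with PSO more concrete, here is a minimal sketch on a toy two-user, single-tone crosstalk model. It is a textbook-style PSO, not the algorithm from the paper: the constants `NOISE`, `CROSS_GAIN`, `P_MAX`, `WEIGHTS`, the inertia and acceleration coefficients, and the function names are all assumptions chosen for illustration.

```python
import math
import random

NOISE = 0.1          # assumed background noise power
CROSS_GAIN = 0.5     # assumed crosstalk gain between the two lines
P_MAX = 1.0          # assumed per-user transmit power budget
WEIGHTS = (0.7, 0.3) # assumed operator-chosen priority of each user


def weighted_rate_sum(p):
    """Weighted sum of two users' rates; nonconcave in the powers p."""
    r1 = math.log2(1 + p[0] / (NOISE + CROSS_GAIN * p[1]))
    r2 = math.log2(1 + p[1] / (NOISE + CROSS_GAIN * p[0]))
    return WEIGHTS[0] * r1 + WEIGHTS[1] * r2


def pso_maximize(objective, dim, lo, hi, particles=20, iters=100, seed=0):
    """Basic global-best PSO with positions clamped to [lo, hi]."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]              # each particle's best position
    best_val = [objective(p) for p in pos]
    g = max(range(particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (best_pos[i][d] - pos[i][d])
                             + 1.5 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val > g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val
```

Calling `pso_maximize(weighted_rate_sum, dim=2, lo=0.0, hi=P_MAX)` returns a power pair and its objective value. Changing `WEIGHTS` shifts which user the search favors, which is the role the abstract attributes to the operator-set weights.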