Markov Decision Process

Selected Abstracts


CAC and routing for multi-service networks with blocked wide-band calls delayed, Part II: approximative link MDP framework

EUROPEAN TRANSACTIONS ON TELECOMMUNICATIONS, Issue 1 2007
Ernst Nordström
In this paper, we study the call admission control (CAC) and routing problem in multi-service networks. Two categories of calls are considered: a narrow-band category with blocked calls cleared and a wide-band category with blocked calls delayed. The optimisation is subject to several quality of service (QoS) constraints, on either the packet or the call level. The objective function is formulated as reward maximisation with a penalty for delay. A suboptimal solution is achieved by applying Markov decision process (MDP) theory together with a three-level approximation. First, the network is decomposed into a set of links assumed to have independent Markov and reward processes, respectively. Second, the dimensions of the link Markov and reward processes are reduced by aggregating the call classes into call categories. Third, by applying decomposition of the link Markov process, the link MDP tasks are simplified considerably. The CAC and routing policy is computed by the policy iteration algorithm from MDP theory. The numerical results show that the proposed CAC and routing method, based on the approximate link MDP framework, is able to find an efficient trade-off between reward loss and average call set-up delay, outperforming conventional methods such as least loaded routing (LLR). Copyright © 2006 AEIT.
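The policy iteration step mentioned in this abstract can be illustrated with a generic finite-state, discounted MDP. The sketch below is not the paper's link model (which involves continuous-time call dynamics, link reward processes and a delay penalty); the states, transition matrices, rewards and discount factor are placeholder values chosen only to show how policy evaluation and greedy improvement alternate until the policy stabilises.

```python
# A minimal, generic sketch of policy iteration on a small finite MDP.
# The 3-state, 2-action data below are illustrative placeholders, not the
# link MDP from the paper.
import numpy as np

def policy_iteration(P, R, gamma=0.99):
    """P[a][s, s'] = transition matrix per action; R[a][s] = expected reward."""
    n_actions, n_states = len(P), P[0].shape[0]
    policy = np.zeros(n_states, dtype=int)          # start with action 0 everywhere
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi for the current policy
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        r_pi = np.array([R[policy[s]][s] for s in range(n_states)])
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: one-step greedy lookahead on the evaluated values
        Q = np.array([R[a] + gamma * P[a] @ v for a in range(n_actions)])
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):      # stable policy: done
            return policy, v
        policy = new_policy

# Toy example: action 0 ~ "admit", action 1 ~ "reject" (labels are only suggestive)
P = [np.array([[0.7, 0.3, 0.0], [0.0, 0.6, 0.4], [0.2, 0.0, 0.8]]),
     np.array([[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.3, 0.7]])]
R = [np.array([1.0, 0.5, -1.0]), np.array([0.0, 0.2, 0.1])]
policy, values = policy_iteration(P, R)
print(policy, values)
```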


CAC and routing for multi-service networks with blocked wide-band calls delayed, Part I: exact link MDP framework

EUROPEAN TRANSACTIONS ON TELECOMMUNICATIONS, Issue 1 2006
Ernst Nordström
In this paper, we study the call admission control (CAC) and routing problem in multi-service networks. Two categories of calls are considered: a narrow-band (NB) category with blocked calls cleared and a wide-band (WB) category with blocked calls delayed. The objective function is formulated as reward maximisation with a penalty for delay. The optimisation is subject to quality of service (QoS) constraints and, possibly, grade of service (GoS) constraints. A suboptimal solution is achieved by applying Markov decision process (MDP) theory together with a two-level approximation. First, the network is decomposed into a set of links assumed to have independent Markov and reward processes, respectively. Second, the dimensions of the link Markov and reward processes are reduced by aggregating the call classes into call categories. The CAC and routing policy is computed by the policy iteration algorithm from MDP theory. The numerical results show that the proposed CAC and routing method, based on the exact link MDP framework, is able to find an efficient trade-off between reward loss and average call set-up delay, outperforming conventional methods such as least loaded routing (LLR). Copyright © 2005 AEIT.
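As a small illustration of the link-level view used in both parts of this work, the sketch below enumerates the feasible occupancy states of a single link carrying two aggregated call categories. The capacity and per-call bandwidth figures are assumptions made for the example, not parameters taken from the paper.

```python
# Illustrative only: enumerate the occupancy states (n_nb, n_wb) of one link
# shared by narrow-band and wide-band call categories.  Capacity and
# bandwidth demands are hypothetical values.
from itertools import product

CAPACITY = 10          # link capacity in bandwidth units (assumed)
B_NB, B_WB = 1, 4      # bandwidth per narrow-band / wide-band call (assumed)

def link_states(capacity, b_nb, b_wb):
    """All (n_nb, n_wb) pairs whose total bandwidth fits on the link."""
    states = []
    for n_nb, n_wb in product(range(capacity // b_nb + 1),
                              range(capacity // b_wb + 1)):
        if n_nb * b_nb + n_wb * b_wb <= capacity:
            states.append((n_nb, n_wb))
    return states

states = link_states(CAPACITY, B_NB, B_WB)
print(len(states), "feasible link states, e.g.", states[:5])
```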


An extortionary guerrilla movement

JOURNAL OF APPLIED ECONOMETRICS, Issue 6 2007
Norman Offstein
This paper models an extortionary relationship between a pipeline operator and a guerrilla movement. Payment and attack decisions are modeled as an infinite-horizon Markov decision process in which, each period, the oil company chooses whether or not to pay an extortion demand and the movement decides whether or not to attack. Decisions depend on the single-period payoff and on discounted expected future payoffs. We estimate the model with pipeline attack data and compare parameters as the discount factor is varied. We reject the hypothesis of a zero discount factor, demonstrating that the movement's observed attack pattern is compatible with extortionary behavior. Copyright © 2007 John Wiley & Sons, Ltd.
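The role the discount factor plays in this test can be seen from a toy calculation: with a zero discount factor only the current period's payoff matters, while a positive factor makes the entire future payoff stream relevant. The payoff value and discount factors below are purely illustrative and are unrelated to the paper's estimates.

```python
# Illustrative only: value of a stationary per-period payoff stream under a
# zero versus a positive discount factor.  Numbers are made up.
def discounted_value(per_period_payoff, beta, horizon=None):
    """Present value of receiving the same payoff every period."""
    if horizon is None:                      # infinite horizon, beta < 1
        return per_period_payoff / (1.0 - beta)
    return sum(per_period_payoff * beta**t for t in range(horizon))

attack_payoff = 1.0                          # hypothetical one-period gain
print(discounted_value(attack_payoff, beta=0.0))   # 1.0: only today counts (myopic)
print(discounted_value(attack_payoff, beta=0.9))   # 10.0: future payoffs dominate
```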


Optimal control of a production-inventory system with both backorders and lost sales

NAVAL RESEARCH LOGISTICS: AN INTERNATIONAL JOURNAL, Issue 3 2010
Saif Benjaafar
We consider the optimal control of a production-inventory system with a single product and two customer classes, where items are produced one unit at a time. Upon arrival, customer orders can be fulfilled from existing inventory (if there is any), backordered, or rejected. The two classes are differentiated by their backorder and lost-sales costs. At each decision epoch, we must determine whether or not to produce an item and, if so, whether to use it to increase inventory or to reduce backlog. At each decision epoch, we must also determine whether to satisfy demand from a particular class (should one arise), backorder it, or reject it. In doing so, we must balance inventory holding costs against the costs of backordering and lost sales. We formulate the problem as a Markov decision process and use it to characterize the structure of the optimal policy. We show that the optimal policy can be described by three state-dependent thresholds: a production base-stock level and two order-admission levels, one for each class. The production base-stock level determines when production takes place and how to allocate items that are produced. This base-stock level also determines when orders from the class with the lower shortage costs (Class 2) are backordered rather than fulfilled from inventory. The order-admission levels determine when orders should be rejected. We show that the threshold levels are monotonic (either nonincreasing or nondecreasing) in the backorder level of Class 2. We also characterize analytically the sensitivity of these thresholds to the various cost parameters. Using numerical results, we compare the performance of the optimal policy against several heuristics and show that heuristics that do not allow for both backordering and rejecting orders can perform poorly. © 2010 Wiley Periodicals, Inc. Naval Research Logistics 2010
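The three-threshold structure described here can be pictured as a simple state lookup: in state (inventory level, Class 2 backorder level), the policy compares the state against a production base-stock threshold and two order-admission thresholds. The sketch below is only schematic; the threshold functions are made-up placeholders (the paper establishes their monotonicity and sensitivity, not these numbers), and the admission rule is one assumed reading of the abstract rather than the paper's exact characterisation.

```python
# Schematic sketch of a state-dependent three-threshold policy: one production
# base-stock level and one order-admission level per class, each a function of
# the Class 2 backorder level b2.  All functional forms are hypothetical.

def base_stock(b2):
    # Produce whenever inventory is below this level (assumed shape).
    return max(0, 5 - b2)

def admission_level(cls, b2):
    # Reject an arriving order of class `cls` when inventory is at or below
    # this level (assumed shape; Class 2 is cut off earlier than Class 1).
    return 0 if cls == 1 else min(3, 1 + b2)

def decide(inventory, b2, arriving_class=None):
    """Return (produce?, admit the arriving order?) for state (inventory, b2)."""
    produce = inventory < base_stock(b2)
    admit = (inventory > admission_level(arriving_class, b2)
             if arriving_class is not None else None)
    return produce, admit

print(decide(inventory=2, b2=1, arriving_class=2))   # (True, False) with these placeholders
```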


Dynamic server assignment policies for assembly-type queues with flexible servers

NAVAL RESEARCH LOGISTICS: AN INTERNATIONAL JOURNAL, Issue 3 2008
Yi-Chun Tsai
We seek dynamic server assignment policies in finite-capacity queueing systems with flexible and collaborative servers, which involve an assembly and/or a disassembly operation. The objective is to maximize the steady-state throughput. We completely characterize the optimal policy for a Markovian system with two servers, two feeder stations, and instantaneous assembly and disassembly operations. This optimal policy allocates one server per station unless one of the stations is blocked, in which case both servers work at the unblocked station. For Markovian systems with three stations and instantaneous assembly and/or disassembly operations, we consider similar policies that move a server away from his/her "primary" station only when that station is blocked or starving. We determine the optimal assignment of each server whose primary station is blocked or starving in systems with three stations and zero buffers by formulating the problem as a Markov decision process. Using this optimal assignment, we develop heuristic policies for systems with three or more stations and positive buffers, and show by means of a numerical study that these policies provide near-optimal throughput. Furthermore, our numerical study shows that these policies, developed for assembly-type systems, also work well in tandem systems. © 2008 Wiley Periodicals, Inc. Naval Research Logistics, 2008
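The two-server, two-station rule characterised in this abstract is simple enough to write down directly: keep each server at his or her primary station, and pool both servers at the unblocked station whenever the other station is blocked. The sketch below encodes just that rule; the station labels and blocking flags are illustrative.

```python
# A direct encoding of the two-server / two-feeder-station rule described in
# the abstract.  Stations are labelled 0 and 1; `blocked[i]` says whether
# station i is currently blocked.

def assign_servers(blocked):
    """Return a station index for each of the two servers."""
    if blocked[0] and not blocked[1]:
        return [1, 1]          # station 0 blocked: both servers at station 1
    if blocked[1] and not blocked[0]:
        return [0, 0]          # station 1 blocked: both servers at station 0
    return [0, 1]              # otherwise each server stays at a primary station

print(assign_servers((False, False)))  # [0, 1]
print(assign_servers((True, False)))   # [1, 1]
```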


An optimal investment and consumption model with stochastic returns

APPLIED STOCHASTIC MODELS IN BUSINESS AND INDUSTRY, Issue 1 2009
Xikui Wang
We consider a financial market consisting of a risky asset and a riskless one, with a constant or random investment horizon. The interest rate on the riskless asset is constant, but the relative return rate of the risky asset is stochastic with an unknown parameter in its distribution. Following the Bayesian approach, the optimal investment and consumption problem is formulated as a Markov decision process. We incorporate the concept of risk aversion into the model and characterize the optimal strategies for both the power and logarithmic utility functions with constant relative risk aversion (CRRA). Numerical examples are provided that support the intuition that a higher proportion of investment should be allocated to the risky asset when its mean return rate is higher or its return rate is less volatile. Copyright © 2008 John Wiley & Sons, Ltd.
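The comparative statics in the last sentence can be illustrated with the classical Merton fraction for a CRRA investor. This is not the paper's Bayesian solution, but it shares the same qualitative behaviour: the optimal risky share rises with the mean excess return and falls with volatility. Parameter values below are arbitrary.

```python
# Classical Merton fraction for CRRA utility (illustrative, not the paper's
# Bayesian model): risky share = (mu - r) / (gamma * sigma^2).

def merton_risky_share(mu, r, sigma, gamma):
    """Optimal fraction of wealth in the risky asset for a CRRA investor."""
    return (mu - r) / (gamma * sigma ** 2)

base          = merton_risky_share(mu=0.08, r=0.02, sigma=0.20, gamma=3.0)  # 0.50
higher_mean   = merton_risky_share(mu=0.10, r=0.02, sigma=0.20, gamma=3.0)  # ~0.67
less_volatile = merton_risky_share(mu=0.08, r=0.02, sigma=0.15, gamma=3.0)  # ~0.89

print(base, higher_mean, less_volatile)
```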