Reinforcement Learning
Selected Abstracts

SOLVING DYNAMIC WILDLIFE RESOURCE OPTIMIZATION PROBLEMS USING REINFORCEMENT LEARNING
NATURAL RESOURCE MODELING, Issue 1 2005
CHRISTOPHER J. FONNESBECK
ABSTRACT. An important technical component of natural resource management, particularly in an adaptive management context, is optimization. This is used to select the most appropriate management strategy, given a model of the system and all relevant available information. For dynamic resource systems, dynamic programming has been the de facto standard for deriving optimal state-specific management strategies. Though effective for small-dimension problems, dynamic programming is incapable of providing solutions to larger problems, even with modern microcomputing technology. Reinforcement learning is an alternative, related procedure for deriving optimal management strategies, based on stochastic approximation. It is an iterative process that improves estimates of the value of state-specific actions based on interactions with a system, or a model thereof. Applications of reinforcement learning in the field of artificial intelligence have illustrated its ability to yield near-optimal strategies for very complex model systems, highlighting the potential utility of this method for ecological and natural resource management problems, which tend to be of high dimension. I describe the concept of reinforcement learning and its approach of estimating optimal strategies by temporal difference learning. I then illustrate the application of this method using a simple, well-known case study from Anderson [1975], and compare the reinforcement learning results with those of dynamic programming. Though a globally optimal strategy is not discovered, the reinforcement learning strategy performs very well relative to the dynamic programming strategy, based on simulated cumulative objective return. I suggest that reinforcement learning be applied to relatively complex problems where an approximate solution to a realistic model is preferable to an exact answer to an oversimplified model. [source]
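As a rough illustration of the temporal difference approach described above, the following is a minimal tabular Q-learning sketch on a toy harvest problem. The dynamics, reward, and parameter values are invented for exposition; they are not the Anderson [1975] case study used in the paper.

```python
# Minimal tabular Q-learning on a toy harvest problem (illustrative only).
import random

random.seed(1)

STATES = range(11)        # population level 0..10
ACTIONS = range(4)        # harvest 0..3 individuals
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def step(state, action):
    """Toy stochastic dynamics: harvest, then the population grows by 0-2."""
    harvest = min(action, state)
    remaining = state - harvest
    next_state = min(10, remaining + random.choice([0, 1, 2]))
    return next_state, float(harvest)          # reward = realized harvest

def policy(state):
    if random.random() < EPSILON:              # explore occasionally
        return random.choice(list(ACTIONS))
    return max(ACTIONS, key=lambda a: Q[(state, a)])   # otherwise exploit

for episode in range(2000):
    s = random.choice(list(STATES))
    for _ in range(50):
        a = policy(s)
        s2, r = step(s, a)
        # TD update: move Q(s,a) toward the one-step bootstrapped target
        target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# State-specific strategy derived from the learned action values
strategy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
print(strategy)
```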
Can Traditions Emerge from the Interaction of Stimulus Enhancement and Reinforcement Learning? An Experimental Model
AMERICAN ANTHROPOLOGIST, Issue 2 2010
ABSTRACT. The study of social learning in captivity and of behavioral traditions in the wild are two burgeoning areas of research, but few empirical studies have tested how learning mechanisms produce emergent patterns of tradition. Studies have examined how social learning mechanisms that are cognitively complex and possessed by few species, such as imitation, result in traditional patterns, yet traditional patterns are also exhibited by species that may not possess such mechanisms. We propose an explicit model of how stimulus enhancement and reinforcement learning could interact to produce traditions. We tested the model experimentally with tufted capuchin monkeys (Cebus apella), which exhibit traditions in the wild but have rarely demonstrated imitative abilities in captive experiments. Monkeys showed both stimulus enhancement learning and a habitual bias to perform whichever behavior first obtained them a reward. These results support our model that simple social learning mechanisms combined with reinforcement can result in traditional patterns of behavior. [source]
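The sketch below is one speculative way to cast the interaction of stimulus enhancement and reinforcement in code: the salience of a behavior rises with how often group mates settle on it, and each individual sticks with whichever behavior first earned it a reward. The update rules and parameters are assumptions for illustration, not the authors' published model.

```python
# Illustrative stimulus-enhancement + reinforcement sketch (not the paper's model).
import random

random.seed(0)
BEHAVIORS = ["lift", "slide"]

def run_group(seed_behavior, n_agents=10, n_trials=30):
    counts = {b: 1 for b in BEHAVIORS}     # baseline salience of each behavior
    counts[seed_behavior] += 3             # a demonstrator makes one option salient
    outcomes = []
    for _ in range(n_agents):
        preference = None                  # fixed by the first rewarded behavior
        last = None
        for _ in range(n_trials):
            if preference is not None:
                b = preference             # habitual bias after first reward
            else:
                # stimulus enhancement: choose in proportion to local salience
                total = sum(counts.values())
                b = random.choices(BEHAVIORS,
                                   weights=[counts[x] / total for x in BEHAVIORS])[0]
            last = b
            if preference is None and random.random() < 0.5:   # either action can pay off
                preference = b
        final = preference if preference is not None else last
        counts[final] += 1                 # group mates observe the established behavior
        outcomes.append(final)
    return outcomes

print(run_group("lift"))    # most agents tend to end up with the seeded behavior
print(run_group("slide"))
```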
Fuzzy Sarsa Learning and the proof of existence of its stationary points
ASIAN JOURNAL OF CONTROL, Issue 5 2008
Vali Derhami
Abstract. This paper presents a new Fuzzy Reinforcement Learning (FRL) algorithm based on a critic-only architecture. The proposed algorithm, called Fuzzy Sarsa Learning (FSL), tunes the parameters of the conclusion parts of the Fuzzy Inference System (FIS) online. Our FSL is based on Sarsa, which approximates the Action Value Function (AVF) and is an on-policy method. In each rule, actions are selected according to the proposed modified Softmax action selection, so that the final inferred action selection probability in FSL is equivalent to the standard Softmax formula. We prove the existence of fixed points for the proposed Approximate Action Value Iteration (AAVI). We then show that FSL satisfies the necessary conditions that guarantee the existence of its stationary points, which coincide with the fixed points of the AAVI. We prove that the weight vector of FSL with a stationary action selection policy converges to a unique value. We also compare by simulation the performance of FSL and Fuzzy Q-Learning (FQL) in terms of learning speed and action quality. Moreover, we show by another example the convergence of FSL and the divergence of FQL when both algorithms use a stationary policy. Copyright © 2008 John Wiley and Sons Asia Pte Ltd and Chinese Automatic Control Society [source]
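A minimal sketch of plain tabular Sarsa with Softmax action selection, the on-policy update that FSL builds on, is shown below; the fuzzy inference layer of the paper is omitted, and the environment and parameters are illustrative.

```python
# Tabular Sarsa with Softmax action selection on a tiny chain task (illustrative).
import math
import random

random.seed(2)
N_STATES, ACTIONS = 5, [0, 1]          # small chain environment
ALPHA, GAMMA, TAU = 0.1, 0.9, 0.5

Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]

def softmax_action(state):
    prefs = [math.exp(Q[state][a] / TAU) for a in ACTIONS]
    z = sum(prefs)
    return random.choices(ACTIONS, weights=[p / z for p in prefs])[0]

def step(state, action):
    """Move right (a=1) or left (a=0); reward only at the right end."""
    nxt = min(N_STATES - 1, state + 1) if action == 1 else max(0, state - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

for episode in range(500):
    s = 0
    a = softmax_action(s)
    for _ in range(20):
        s2, r = step(s, a)
        a2 = softmax_action(s2)                                  # on-policy: next action
        Q[s][a] += ALPHA * (r + GAMMA * Q[s2][a2] - Q[s][a])     # Sarsa update
        s, a = s2, a2

print([round(max(row), 2) for row in Q])
```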
Two Competing Models of How People Learn in Games
ECONOMETRICA, Issue 6 2002
Ed Hopkins
Reinforcement learning and stochastic fictitious play are apparent rivals as models of human learning. They embody quite different assumptions about the processing of information and optimization. This paper compares their properties and finds that they are far more similar than previously thought. In particular, the expected motion of stochastic fictitious play and of reinforcement learning with experimentation can both be written as a perturbed form of the evolutionary replicator dynamics. Therefore, they will in many cases have the same asymptotic behavior. In particular, local stability of mixed equilibria under stochastic fictitious play implies local stability under perturbed reinforcement learning. The main identifiable difference between the two models is speed: stochastic fictitious play gives rise to faster learning. [source]
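The following small simulation sketch contrasts the two models in the spirit of the abstract: cumulative-payoff reinforcement learning versus stochastic fictitious play with logit choice, both playing a 2x2 coordination game. The payoffs, the Erev-Roth-style propensity update, and the parameter values are illustrative assumptions, not Hopkins's analysis.

```python
# Reinforcement learning vs. stochastic fictitious play in a 2x2 coordination game.
import math
import random

random.seed(3)
PAYOFF = [[1.0, 0.0], [0.0, 1.0]]      # pure coordination game

def simulate(rule, rounds=200, eta=2.0):
    props1, props2 = [1.0, 1.0], [1.0, 1.0]     # RL payoff propensities
    counts1, counts2 = [1.0, 1.0], [1.0, 1.0]   # fictitious-play counts of opponent play
    history = []
    for _ in range(rounds):
        if rule == "reinforcement":
            p1 = props1[0] / sum(props1)
            p2 = props2[0] / sum(props2)
        else:                                    # stochastic fictitious play (logit)
            b1 = counts1[0] / sum(counts1)       # belief that the opponent plays action 0
            b2 = counts2[0] / sum(counts2)
            p1 = 1 / (1 + math.exp(-eta * (2 * b1 - 1)))
            p2 = 1 / (1 + math.exp(-eta * (2 * b2 - 1)))
        a1 = 0 if random.random() < p1 else 1
        a2 = 0 if random.random() < p2 else 1
        props1[a1] += PAYOFF[a1][a2]             # reinforce the chosen action by its payoff
        props2[a2] += PAYOFF[a2][a1]
        counts1[a2] += 1                         # update beliefs about the opponent
        counts2[a1] += 1
        history.append(p1)
    return history

for rule in ("reinforcement", "fictitious"):
    path = simulate(rule)
    print(rule, round(path[-1], 2))              # fictitious play typically locks in faster
```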
Learning scheduling control knowledge through reinforcements
INTERNATIONAL TRANSACTIONS IN OPERATIONAL RESEARCH, Issue 2 2000
K. Miyashita
Abstract. This paper introduces a method of learning search control knowledge for schedule optimization problems through the application of reinforcement learning. Reinforcement learning is an effective approach for an agent that learns its behavior through trial-and-error interactions with a dynamic environment. Nevertheless, reinforcement learning suffers from slow convergence when applied to problems with a large state space. The paper discusses a case-based function approximation technique, which makes reinforcement learning applicable to large-scale problems such as job-shop scheduling. To show the effectiveness of the approach, reinforcement learning is applied to acquire search control knowledge in a repair-based schedule optimization process. Preliminary experimental results show that repair-action selection guided by the learned search control knowledge efficiently improved scheduling quality. [source]
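A rough sketch of the idea of case-based function approximation for reinforcement learning follows: action values are estimated from the most similar stored cases rather than from a table. The feature encoding, distance measure, and repair actions are placeholder assumptions, not the paper's scheduling system.

```python
# Case-based (nearest-neighbor) approximation of action values (illustrative).
import math

class CaseBasedQ:
    def __init__(self, k=3):
        self.cases = []                 # stored (features, action, value) cases
        self.k = k

    def estimate(self, features, action):
        """Average the values of the k nearest stored cases for this action."""
        same_action = [(f, v) for f, a, v in self.cases if a == action]
        if not same_action:
            return 0.0
        same_action.sort(key=lambda fv: math.dist(fv[0], features))
        nearest = same_action[: self.k]
        return sum(v for _, v in nearest) / len(nearest)

    def update(self, features, action, target, alpha=0.3):
        """Store a new case nudged from the current estimate toward the target."""
        current = self.estimate(features, action)
        self.cases.append((features, action, current + alpha * (target - current)))

# Usage sketch: score two hypothetical repair actions for a schedule state
# described by made-up features (tardiness, resource overload).
q = CaseBasedQ()
state = (0.4, 0.7)
q.update(state, "swap_jobs", target=1.0)
q.update(state, "shift_job", target=0.2)
print(q.estimate((0.42, 0.68), "swap_jobs"), q.estimate((0.42, 0.68), "shift_job"))
```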
A brainlike learning system with supervised, unsupervised, and reinforcement learning
ELECTRICAL ENGINEERING IN JAPAN, Issue 1 2008
Takafumi Sasakawa
Abstract. According to Hebb's cell assembly theory, the brain has the capability of function localization. It has also been suggested that the brain uses three different learning paradigms: supervised, unsupervised, and reinforcement learning, which are deeply related to three parts of the brain: the cerebellum, cerebral cortex, and basal ganglia, respectively. Inspired by this knowledge of the brain, we present a brainlike learning system consisting of three parts: a supervised learning (SL) part, an unsupervised learning (UL) part, and a reinforcement learning (RL) part. The SL part is the main part and learns the input-output mapping; the UL part is a competitive network that divides the input space into subspaces and realizes function localization by controlling the firing strength of neurons in the SL part based on input patterns; the RL part is a reinforcement learning scheme that optimizes system performance by adjusting the parameters in the UL part. Numerical simulations have been carried out, and the results confirm the effectiveness of the proposed brainlike learning system. © 2007 Wiley Periodicals, Inc. Electr Eng Jpn, 162(1): 32-39, 2008; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/eej.20600 [source]

Action control of autonomous agents in continuous valued space using RFCN
ELECTRONICS & COMMUNICATIONS IN JAPAN, Issue 2 2008
Shinichi Shirakawa
Abstract. Research on action control of autonomous agents and multiagent systems has attracted increasing attention in recent years. Common methods for agent action control are neural networks, genetic programming, and reinforcement learning. In this study, we use a neural network for action control of autonomous agents, determining the structure and parameters of the network through evolution. We previously proposed the Flexibly Connected Neural Network (FCN) as a method of constructing arbitrary neural networks with optimized structures and parameters to solve unknown problems. FCN was applied to action control of an autonomous agent and was shown experimentally to be effective for perceptual aliasing problems. All previous experiments with FCN, however, were conducted only in grid space. In this paper, we propose a new method based on FCN that can select correct actions in real-valued, continuous space. The proposed method, called Real-valued FCN (RFCN), optimizes the input-output function of each unit, the parameters of those functions, and the speed of each unit. To examine its effectiveness, we applied the proposed method to action control of an autonomous agent in continuous-valued maze problems. © 2008 Wiley Periodicals, Inc. Electron Comm Jpn, 91(2): 31-39, 2008; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/eej.10032 [source]

Tonically active neurons in the striatum differentiate between delivery and omission of expected reward in a probabilistic task context
EUROPEAN JOURNAL OF NEUROSCIENCE, Issue 3 2009
Paul Apicella
Abstract. Tonically active neurons (TANs) in the primate striatum are responsive to rewarding stimuli, and they are thought to be involved in the storage of stimulus-reward associations or habits. However, it is unclear whether these neurons may signal the difference between the prediction of reward and its actual outcome as a possible neuronal correlate of reward prediction errors at the striatal level. To address this question, we studied the activity of TANs from three monkeys trained in a classical conditioning task in which a liquid reward was preceded by a visual stimulus and reward probability was systematically varied between blocks of trials. The monkeys' ability to discriminate the conditions according to probability was assessed by monitoring their mouth movements during the stimulus-reward interval. We found that the typical TAN pause responses to the delivery of reward were markedly enhanced as the probability of reward decreased, whereas responses to the predictive stimulus were somewhat stronger for high reward probability. In addition, TAN responses to the omission of reward consisted of either decreases or increases in activity that became stronger with increasing reward probability. It therefore appears that one group of neurons differentially responded to reward delivery and reward omission with changes in activity in opposite directions, while another group responded in the same direction. These data indicate that only a subset of TANs could detect the extent to which reward occurs differently than predicted, thus contributing to the encoding of positive and negative reward prediction errors that is relevant to reinforcement learning. [source]
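A minimal sketch of the reward prediction error notion invoked above: when reward probability is high, delivery is only weakly surprising but omission is strongly surprising, and vice versa. This is a generic illustration of the quantity, not a model of TAN firing.

```python
# Reward prediction errors under varying reward probability (illustrative).
def prediction_errors(reward_probability):
    expected = reward_probability            # learned value of the predictive stimulus
    delivery_error = 1.0 - expected          # positive prediction error on delivery
    omission_error = 0.0 - expected          # negative prediction error on omission
    return delivery_error, omission_error

for p in (0.25, 0.5, 0.75, 1.0):
    delivered, omitted = prediction_errors(p)
    print(f"P(reward)={p:.2f}  delivery error={delivered:+.2f}  omission error={omitted:+.2f}")
```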
The impact of mineralocorticoid receptor ISO/VAL genotype (rs5522) and stress on reward learning
GENES, BRAIN AND BEHAVIOR, Issue 6 2010
R. Bogdan
Research suggests that stress disrupts reinforcement learning and induces anhedonia. The mineralocorticoid receptor (MR) determines the sensitivity of the stress response, and the missense iso/val polymorphism (Ile180Val, rs5522) of the MR gene (NR3C2) has been associated with enhanced physiological stress responses, elevated depressive symptoms and reduced cortisol-induced MR gene expression. The goal of these studies was to evaluate whether rs5522 genotype and stress independently and interactively influence reward learning. In study 1, participants (n = 174) completed a probabilistic reward task under baseline (i.e. no-stress) conditions. In study 2, participants (n = 53) completed the task during a stress (threat-of-shock) and a no-stress condition. Reward learning, i.e. the ability to modulate behavior as a function of reinforcement history, was the main variable of interest. In study 1, in which participants were evaluated under no-stress conditions, reward learning was enhanced in val carriers. In study 2, participants developed a weaker response bias toward a more frequently rewarded stimulus under the stress relative to the no-stress condition. Critically, stress-induced reward learning deficits were largest in val carriers. Although preliminary and in need of replication due to the small sample size, these findings indicate that psychiatrically healthy individuals carrying the MR val allele, which has recently been linked to depression, showed a reduced ability to modulate behavior as a function of reward when facing an acute, uncontrollable stressor. Future studies are warranted to evaluate whether rs5522 genotype interacts with naturalistic stressors to increase the risk of depression and whether stress-induced anhedonia might moderate such risk. [source]

Predicting direction shifts on Canadian-US exchange rates with artificial neural networks
INTELLIGENT SYSTEMS IN ACCOUNTING, FINANCE & MANAGEMENT, Issue 2 2001
Jefferson T. Davis
The paper presents a variety of neural network models applied to Canadian-US exchange rate data. Networks such as backpropagation, modular, radial basis functions, linear vector quantization, fuzzy ARTMAP, and genetic reinforcement learning are examined. The purpose is to compare the performance of these networks for predicting direction (sign change) shifts in daily returns. For this classification problem, the neural nets proved superior to the naïve model, and most of the neural nets were slightly superior to the logistic model. Using multiple previous days' returns as inputs to train and test the backpropagation and logistic models resulted in no increased classification accuracy. The models were not able to detect a systematic effect of previous days' returns up to fifteen days prior to the prediction day that would increase model performance. Copyright © 2001 John Wiley & Sons, Ltd. [source]
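The sketch below shows the mechanics of the direction (sign) classification task the abstract evaluates, using a logistic model trained by gradient descent on lagged daily returns. The data are synthetic random returns; nothing here reproduces the paper's networks or results.

```python
# Direction (sign) classification from lagged returns with a logistic model.
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.005, size=1500)          # stand-in for daily FX returns

LAGS = 5
X = np.column_stack([returns[i:len(returns) - LAGS + i] for i in range(LAGS)])
y = (returns[LAGS:] > 0).astype(float)               # 1 if the next day's return is positive

w, b, lr = np.zeros(LAGS), 0.0, 0.1
for _ in range(500):                                  # batch gradient descent
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))            # predicted probability of "up"
    grad = p - y
    w -= lr * X.T @ grad / len(y)
    b -= lr * grad.mean()

accuracy = (((X @ w + b) > 0).astype(float) == y).mean()
print(f"in-sample direction accuracy: {accuracy:.2f}")   # ~0.5 on pure noise, as expected
```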
Dynamic pricing based on asymmetric multiagent reinforcement learning
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 1 2006
Ville Könönen
A dynamic pricing problem is solved by using asymmetric multiagent reinforcement learning in this article. In the problem, there are two competing brokers that sell identical products to customers and compete on the basis of price. We model this dynamic pricing problem as a Markov game and solve it by using two different learning methods. The first method utilizes modified gradient descent in the parameter space of the value function approximator and the second method uses a direct gradient of the parameterized policy function. We present a brief literature survey of pricing models based on multiagent reinforcement learning, introduce the basic concepts of Markov games, and solve the problem by using proposed methods. © 2006 Wiley Periodicals, Inc. Int J Int Syst 21: 73-98, 2006. [source]
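As a simplified illustration of the two-broker pricing game, the sketch below uses independent, stateless Q-learning over a small price grid with an invented demand-sharing rule; the paper itself uses gradient-based value-function and policy methods rather than this tabular shortcut.

```python
# Two competing brokers learning prices with independent value updates (illustrative).
import random

random.seed(4)
PRICES = [1.0, 1.5, 2.0, 2.5]
ALPHA, EPSILON = 0.1, 0.2

def profits(p1, p2):
    """Cheaper broker captures most of the demand; total demand falls with price."""
    share1 = 0.8 if p1 < p2 else (0.2 if p1 > p2 else 0.5)
    demand = 10.0 * (3.0 - min(p1, p2)) / 3.0
    return p1 * demand * share1, p2 * demand * (1.0 - share1)

Q1 = {p: 0.0 for p in PRICES}           # stateless (single-state) game for brevity
Q2 = {p: 0.0 for p in PRICES}

def choose(Q):
    if random.random() < EPSILON:
        return random.choice(PRICES)
    return max(PRICES, key=Q.get)

for _ in range(20000):
    a1, a2 = choose(Q1), choose(Q2)
    r1, r2 = profits(a1, a2)
    Q1[a1] += ALPHA * (r1 - Q1[a1])     # bandit-style value updates
    Q2[a2] += ALPHA * (r2 - Q2[a2])

print("broker 1 price:", max(PRICES, key=Q1.get))
print("broker 2 price:", max(PRICES, key=Q2.get))
```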
Design of the fuzzy multiobjective controller based on the eligibility method
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 5 2003
Hwan-Chun Myung
Multiobjective control problems have been handled in many different ways, such as with fuzzy control, neural networks, and reinforcement learning. Among these, reinforcement learning solves a multiobjective control problem without any prior knowledge. In this article, a new reinforcement learning method for multiobjective control problems is proposed with its convergence taken into consideration. The proposed method, in which objective eligibility is used to handle multiple rewards, reformulates a multiobjective control problem as a reinforcement learning problem in a non-Markov environment. Using a relation similar to eligibility, the proposed method builds on previous research results on eligibility and is implemented with the concept of a decoupled fuzzy sliding mode control (DFSMC). © 2003 Wiley Periodicals, Inc. [source]

Call admission control in cellular networks: A reinforcement learning solution
INTERNATIONAL JOURNAL OF NETWORK MANAGEMENT, Issue 2 2004
Sidi-Mohammed Senouci
In this paper, we address the call admission control (CAC) problem in a cellular network that handles several classes of traffic with different resource requirements. The problem is formulated as a semi-Markov decision process (SMDP). We use a real-time reinforcement learning (RL) [neuro-dynamic programming (NDP)] algorithm to construct a dynamic call admission control policy. We show that the policies obtained using our TQ-CAC and NQ-CAC algorithms, which are two different implementations of the RL algorithm, provide a good solution and are able to earn significantly higher revenues than classical solutions such as the guard channel scheme. A large number of experiments illustrate the robustness of our policies and show how they improve quality of service (QoS) and reduce the call-blocking probabilities of handoff calls even under variable traffic conditions. Copyright © 2004 John Wiley & Sons, Ltd. [source]
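A toy sketch of reinforcement-learning call admission control follows: the state is the number of free channels plus the class of the arriving call, the action is accept or reject, and the reward is the revenue of an accepted call. The traffic model, revenue values, and tabular learner are illustrative assumptions, not the TQ-CAC or NQ-CAC algorithms of the paper.

```python
# Toy Q-learning call admission control over two traffic classes (illustrative).
import random

random.seed(5)
CAPACITY = 10
CLASSES = {"handoff": 5.0, "new": 1.0}     # hypothetical revenue per accepted call
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = {}   # (free_channels, call_class, action) -> estimated value

def q(free, cls, act):
    return Q.get((free, cls, act), 0.0)

def choose(free, cls):
    if random.random() < EPSILON:
        return random.choice(["accept", "reject"])
    return max(["accept", "reject"], key=lambda a: q(free, cls, a))

busy = 0
for _ in range(100000):
    if busy > 0 and random.random() < 0.3:       # an ongoing call completes
        busy -= 1
    cls = random.choice(list(CLASSES))           # a call of some class arrives
    free = CAPACITY - busy
    act = choose(free, cls)
    if act == "accept" and free > 0:
        reward, busy = CLASSES[cls], busy + 1
    else:
        reward = 0.0
    next_free = CAPACITY - busy
    best_next = max(q(next_free, c, a) for c in CLASSES for a in ("accept", "reject"))
    Q[(free, cls, act)] = q(free, cls, act) + ALPHA * (reward + GAMMA * best_next - q(free, cls, act))

# Inspect the learned decision for each class when only one channel remains free
print({cls: max(["accept", "reject"], key=lambda a: q(1, cls, a)) for cls in CLASSES})
```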
Scaling Up Learning Models in Public Good Games
JOURNAL OF PUBLIC ECONOMIC THEORY, Issue 2 2004
Jasmina Arifovic
We study three learning rules (reinforcement learning (RL), experience weighted attraction learning (EWA), and individual evolutionary learning (IEL)) and how they perform in three different Groves-Ledyard mechanisms. We are interested in how well these learning rules duplicate human behavior in repeated games with a continuum of strategies. We find that RL does not do well, while IEL does significantly better, as does EWA, but only if given a small discretized strategy space. We identify four main features a learning rule should have in order to stack up against humans in a minimal competency test: (1) the use of hypotheticals to create history, (2) the ability to focus only on what is important, (3) the ability to forget history when it is no longer important, and (4) the ability to try new things. [source]

Computational Models for the Combination of Advice and Individual Learning
COGNITIVE SCIENCE - A MULTIDISCIPLINARY JOURNAL, Issue 2 2009
Guido Biele
Abstract. Decision making often takes place in social environments where other actors influence individuals' decisions. The present article examines how advice affects individual learning. Five social learning models combining advice and individual learning (four based on reinforcement learning and one on Bayesian learning) and one individual learning model are tested against each other. In two experiments, some participants received good or bad advice prior to a repeated multioption choice task. Receivers of advice adhered to the advice, so that good advice improved performance. The social learning models described the observed learning processes better than the individual learning model. Of the models tested, the best social learning model assumes that outcomes from recommended options are evaluated more positively than outcomes from nonrecommended options. This model correctly predicted that receivers first adhere to advice, then explore other options, and finally return to the recommended option. The model also accurately predicted that good advice has a stronger impact on learning than bad advice. One-time advice can have a long-lasting influence on learning by changing the subjective evaluation of outcomes of recommended options. [source]
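The sketch below illustrates the core idea of the best-fitting model as described above: ordinary reinforcement learning in a multioption choice task, except that outcomes from the recommended option are evaluated more positively. Reward probabilities, the bonus size, and the softmax rule are assumptions for exposition, not the fitted model from the paper.

```python
# Reinforcement learning with an advice-biased evaluation of outcomes (illustrative).
import math
import random

random.seed(6)
REWARD_PROB = [0.3, 0.6, 0.5]      # three options; option 1 is actually best
ADVISED = 2                        # advice points to a mediocre option
BONUS, ALPHA, TAU = 0.3, 0.1, 0.2

Q = [0.0, 0.0, 0.0]
choices = []
for trial in range(300):
    weights = [math.exp(v / TAU) for v in Q]
    a = random.choices(range(3), weights=weights)[0]
    reward = 1.0 if random.random() < REWARD_PROB[a] else 0.0
    evaluated = reward + (BONUS if a == ADVISED else 0.0)   # advice-biased evaluation
    Q[a] += ALPHA * (evaluated - Q[a])
    choices.append(a)

early = choices[:50].count(ADVISED) / 50
late = choices[-50:].count(ADVISED) / 50
print(f"advised option chosen: {early:.0%} of early trials, {late:.0%} of late trials")
```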