Awesome Reinforcement Learning

Awesome Reinforcement Learning

A curated list of resources dedicated to reinforcement learning.

We have pages for other topics: awesome-rnnawesome-deep-visionawesome-random-forest

Maintainers: Hyunsoo KimJiwon Kim

We are looking for more contributors and maintainers!

Contributing

Please feel free to pull requests

Table of Contents

Codes

Theory

Lectures

Books

  • Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction [Book] [Code]
  • Csaba Szepesvari, Algorithms for Reinforcement Learning [Book]
  • David Poole and Alan Mackworth, Artificial Intelligence: Foundations of Computational Agents [Book Chapter]
  • Dimitri P. Bertsekas and John N. Tsitsiklis, Neuro-Dynamic Programming [Book (Amazon)] [Summary]
  • Mykel J. Kochenderfer, Decision Making Under Uncertainty: Theory and Application [Book (Amazon)]

Surveys

  • Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore, Reinforcement Learning: A Survey, JAIR, 1996. [Paper]
  • S. S. Keerthi and B. Ravindran, A Tutorial Survey of Reinforcement Learning, Sadhana, 1994. [Paper]
  • Matthew E. Taylor, Peter Stone, Transfer Learning for Reinforcement Learning Domains: A Survey, JMLR, 2009. [Paper]
  • Jens Kober, J. Andrew Bagnell, Jan Peters, Reinforcement Learning in Robotics, A Survey, IJRR, 2013. [Paper]
  • Michael L. Littman, "Reinforcement learning improves behaviour from evaluative feedback." Nature 521.7553 (2015): 445-451. [Paper]
  • Marc P. Deisenroth, Gerhard Neumann, Jan Peter, A Survey on Policy Search for Robotics, Foundations and Trends in Robotics, 2014. [Book]

Papers / Thesis

  • Foundational Papers

    • Marvin Minsky, Steps toward Artificial Intelligence, Proceedings of the IRE, 1961. [Paper]

      • discusses issues in RL such as the "credit assignment problem"
    • Ian H. Witten, An Adaptive Optimal Controller for Discrete-Time Markov Environments, Information and Control, 1977. [Paper]
      • earliest publication on temporal-difference (TD) learning rule.
  • Methods
    • Dynamic Programming (DP):

      • Christopher J. C. H. Watkins, Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, 1989. [Thesis]
    • Monte Carlo:
      • Andrew Barto, Michael Duff, Monte Carlo Inversion and Reinforcement Learning, NIPS, 1994. [Paper]
      • Satinder P. Singh, Richard S. Sutton, Reinforcement Learning with Replacing Eligibility Traces, Machine Learning, 1996. [Paper]
    • Temporal-Difference:
      • Richard S. Sutton, Learning to predict by the methods of temporal differences. Machine Learning 3: 9-44, 1988.[Paper]
    • Q-Learning (Off-policy TD algorithm):
      • Chris Watkins, Learning from Delayed Rewards, Cambridge, 1989. [Thesis]
    • Sarsa (On-policy TD algorithm):
      • G.A. Rummery, M. Niranjan, On-line Q-learning using connectionist systems, Technical Report, Cambridge Univ., 1994. [Report]
      • Richard S. Sutton, Generalization in Reinforcement Learning: Successful examples using sparse coding, NIPS, 1996. [Paper]
    • R-Learning (learning of relative values)
      • Andrew Schwartz, A Reinforcement Learning Method for Maximizing Undiscounted Rewards, ICML, 1993.[Paper-Google Scholar]
    • Function Approximation methods (Least-Sqaure Temporal Difference, Least-Sqaure Policy Iteration)
      • Steven J. Bradtke, Andrew G. Barto, Linear Least-Squares Algorithms for Temporal Difference Learning, Machine Learning, 1996. [Paper]
      • Michail G. Lagoudakis, Ronald Parr, Model-Free Least Squares Policy Iteration, NIPS, 2001. [Paper] [Code]
    • Policy Search / Policy Gradient
      • Richard Sutton, David McAllester, Satinder Singh, Yishay Mansour, Policy Gradient Methods for Reinforcement Learning with Function Approximation, NIPS, 1999. [Paper]
      • Jan Peters, Sethu Vijayakumar, Stefan Schaal, Natural Actor-Critic, ECML, 2005. [Paper]
      • Jens Kober, Jan Peters, Policy Search for Motor Primitives in Robotics, NIPS, 2009. [Paper]
      • Jan Peters, Katharina Mulling, Yasemin Altun, Relative Entropy Policy Search, AAAI, 2010. [Paper]
      • Freek Stulp, Olivier Sigaud, Path Integral Policy Improvement with Covariance Matrix Adaptation, ICML, 2012.[Paper]
      • Nate Kohl, Peter Stone, Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion, ICRA, 2004.[Paper]
      • Marc Deisenroth, Carl Rasmussen, PILCO: A Model-Based and Data-Efficient Approach to Policy Search, ICML, 2011. [Paper]
      • Scott Kuindersma, Roderic Grupen, Andrew Barto, Learning Dynamic Arm Motions for Postural Recovery, Humanoids, 2011. [Paper]
    • Hierarchical RL
      • Richard Sutton, Doina Precup, Satinder Singh, Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, Artificial Intelligence, 1999. [Paper]
      • George Konidaris, Andrew Barto, Building Portable Options: Skill Transfer in Reinforcement Learning, IJCAI, 2007.[Paper]
    • Deep Learning + Reinforcement Learning (A sample of recent works on DL+RL)
      • V. Mnih, et. al., Human-level Control through Deep Reinforcement Learning, Nature, 2015. [Paper]
      • Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard Lewis, Xiaoshi Wang, Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, NIPS, 2014. [Paper]
      • Sergey Levine, Chelsea Finn, Trevor Darrel, Pieter Abbeel, End-to-End Training of Deep Visuomotor Policies. ArXiv, 16 Oct 2015. [ArXiv]
      • Tom Schaul, John Quan, Ioannis Antonoglou, David Silver, Prioritized Experience Replay, ArXiv, 18 Nov 2015.[ArXiv]
      • Hado van Hasselt, Arthur Guez, David Silver, Deep Reinforcement Learning with Double Q-Learning, ArXiv, 22 Sep 2015. [ArXiv]
      • Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, Asynchronous Methods for Deep Reinforcement Learning, ArXiv, 4 Feb 2016.[ArXiv]

Applications

Game Playing

  • Traditional Games

    • Backgammon - "TD-Gammon" game play using TD(λ) (Tesauro, ACM 1995) [Paper]
    • Chess - "KnightCap" program using TD(λ) (Baxter, arXiv 1999) [arXiv]
    • Chess - Giraffe: Using deep reinforcement learning to play chess (Lai, arXiv 2015) [arXiv]
  • Computer Games

Robotics

  • Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion (Kohl, ICRA 2004) [Paper]
  • Robot Motor SKill Coordination with EM-based Reinforcement Learning (Kormushev, IROS 2010) [Paper] [Video]
  • Generalized Model Learning for Reinforcement Learning on a Humanoid Robot (Hester, ICRA 2010) [Paper] [Video]
  • Autonomous Skill Acquisition on a Mobile Manipulator (Konidaris, AAAI 2011) [Paper] [Video]
  • PILCO: A Model-Based and Data-Efficient Approach to Policy Search (Deisenroth, ICML 2011) [Paper]
  • Incremental Semantically Grounded Learning from Demonstration (Niekum, RSS 2013) [Paper]
  • Efficient Reinforcement Learning for Robots using Informative Simulated Priors (Cutler, ICRA 2015) [Paper] [Video]

Control

  • An Application of Reinforcement Learning to Aerobatic Helicopter Flight (Abbeel, NIPS 2006) [Paper] [Video]
  • Autonomous helicopter control using Reinforcement Learning Policy Search Methods (Bagnell, ICRA 2011) [Paper]

Operations Research

  • Scaling Average-reward Reinforcement Learning for Product Delivery (Proper, AAAI 2004) [Paper]
  • Cross Channel Optimized Marketing by Reinforcement Learning (Abe, KDD 2004) [Paper]

Human Computer Interaction

  • Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System (Singh, JAIR 2002)[Paper]

Tutorials / Websites

Online Demos

时间: 2024-10-25 09:30:09

Awesome Reinforcement Learning的相关文章

看DeepMind如何用Reinforcement learning玩游戏

引子 说到机器学习最酷的分支,非Deep learning和Reinforcement learning莫属(以下分别简称DL和RL).这两者不仅在实际应用中表现的很酷,在机器学习理论中也有不俗的表现.DeepMind工作人员合两者之精髓,在Stella模拟机上让机器自己玩了7个Atari 2600的游戏,结果是玩的冲出美洲,走向世界,超越了物种的局限.不仅战胜了其他机器人,甚至在其中3个游戏中超越了人类游戏专家.噢,忘记说了,Atari 2600是80年代风靡美国的游戏机,当然你现在肯定不会喜

Reinforcement Learning in Continuous State and Action Spaces: A Brief Note

Thanks Hado van Hasselt for the great work. Introduction In the problems of sequential decision making in continuous domains with delayed reward signals, the main purpose for the algorithms is to learn how to choose actions from an infinitely large a

Generating Text with Deep Reinforcement Learning

上一篇介绍了DQN在文字游戏中的应用,本文将分享一篇DQN在文本生成中的应用,将一个领域的知识迁移到其他领域应用的时候,都需要做概念上的等效替换,比如context可以替换为state,被预测的word可以替换为action.本文分享的题目是Generating Text with Deep Reinforcement Learning,作者是来自National Research Council of Canada的Hongyu Guo研究员,文章最早于2015年10月30日submit在ar

Deep Reinforcement Learning for Dialogue Generation

本文将会分享一篇深度增强学习在bot中应用的文章,增强学习在很早的时候就应用于bot中来解决一些实际问题,最近几年开始流行深度增强学习,本文作者将其引入到最新的bot问题中.paper的题目是Deep Reinforcement Learning for Dialogue Generation,作者是Jiwei Li,最早于2016年6月10日发在arxiv上. 现在学术界中bot领域流行的解决方案是seq2seq,本文针对这种方案抛出两个问题: 1.用MLE作为目标函数会导致容易生成类似于"呵

Deep Reinforcement Learning with a Natural Language Action Space

本文继续分享一篇深度增强学习在NLP中应用的paper,题目是Deep Reinforcement Learning with a Natural Language Action Space,作者是来自微软的Ji He博士,文章最早于2015年11月发在arxiv上,2016年6月8号update. 通过前两篇文章的介绍,基本对DQN在NLP中应用有了一个清晰的认识,与DQN之前应用不同的地方在于两个方面: 1.actions的量级很大. 2.transition tuple的具体形式随着模型来

论文笔记之:Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning

论文笔记之:Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning  2017-06-06  21:43:53    这篇文章的 Motivation 来自于 MDNet:    本文所提出的 framework 为:                             

论文笔记之:Dueling Network Architectures for Deep Reinforcement Learning

  Dueling Network Architectures for Deep Reinforcement Learning ICML 2016 Best Paper    摘要:本文的贡献点主要是在 DQN 网络结构上,将卷积神经网络提出的特征,分为两路走,即:the state value function 和 the state-dependent action advantage function.  这个设计的主要特色在于 generalize learning across act

(转) Deep Learning in a Nutshell: Reinforcement Learning

  Deep Learning in a Nutshell: Reinforcement Learning   Share: Posted on September 8, 2016 by Tim Dettmers No CommentsTagged Deep Learning, Deep Neural Networks, Machine Learning,Reinforcement Learning This post is Part 4 of the Deep Learning in a Nu

(转) Deep Reinforcement Learning: Playing a Racing Game

Byte Tank Posts Archive Deep Reinforcement Learning: Playing a Racing Game OCT 6TH, 2016 Agent playing Out Run, session 201609171218_175epsNo time limit, no traffic, 2X time lapse Above is the built deep Q-network (DQN) agent playing Out Run, trained

论文阅读之: Hierarchical Object Detection with Deep Reinforcement Learning

  Hierarchical Object Detection with Deep Reinforcement Learning NIPS 2016 WorkShop    Paper : https://arxiv.org/pdf/1611.03718v1.pdf Project Page : https://github.com/imatge-upc/detection-2016-nipsws   摘要: 我们提出一种基于深度强化学习的等级物体检测方法 (Hierarchical Objec