QL2, a simple reinforcement learning scheme for two-player zero-sum Markov games

Benoît Frénay; Marco Saerens

doi:10.1016/j.neucom.2008.12.022

Research output: Contribution to journal › Article › peer-review

Abstract

Markov games are a framework that can be used to formalise n-agent reinforcement learning (RL). Littman (Markov games as a framework for multi-agent reinforcement learning, in: Proceedings of the 11th International Conference on Machine Learning (ICML-94), 1994) uses this framework to model two-agent zero-sum problems and, within this context, proposes the minimax-Q algorithm. This paper reviews RL algorithms for two-player zero-sum Markov games and introduces a new, simple, fast algorithm called QL₂. QL₂ is compared to several standard algorithms (Q-learning, Minimax and minimax-Q) implemented with the Qash library, written in Python. The experiments show that QL₂ converges empirically to optimal mixed policies, as minimax-Q does, but uses a surprisingly simple and cheap updating rule. © 2009 Elsevier B.V. All rights reserved.

Original language	English
Pages (from-to)	1494-1507
Number of pages	14
Journal	Neurocomputing
Volume	72
Issue number	7-9
DOIs	https://doi.org/10.1016/j.neucom.2008.12.022
Publication status	Published - 2008
Externally published	Yes

Keywords

Markov games
Multi-agent
Q-Learning
Reinforcement learning
Two-player zero-sum games

Access to Document

10.1016/j.neucom.2008.12.022

http://ac.els-cdn.com/S0925231209000150/1-s2.0-S0925231209000150-main.pdf?_tid=9fe6b61c-770c-11e5-a691-00000aab0f01&acdnat=1445333361_86844a8fd239ed8afe14f37792972385

Cite this

@article{78d499bc411b49fbb511d11120634e5c,

title = "QL2, a simple reinforcement learning scheme for two-player zero-sum Markov games",

abstract = "Markov games is a framework which can be used to formalise n-agent reinforcement learning (RL). Littman (Markov games as a framework for multi-agent reinforcement learning, in: Proceedings of the 11th International Conference on Machine Learning (ICML-94), 1994.) uses this framework to model two-agent zero-sum problems and, within this context, proposes the minimax-Q algorithm. This paper reviews RL algorithms for two-player zero-sum Markov games and introduces a new, simple, fast, algorithm, called QL2. QL2 is compared to several standard algorithms (Q-learning, Minimax and minimax-Q) implemented with the Qash library written in Python. The experiments show that QL2 converges empirically to optimal mixed policies, as minimax-Q, but uses a surprisingly simple and cheap updating rule. {\textcopyright} 2009 Elsevier B.V. All rights reserved.",

keywords = "Markov games, Multi-agent, Q-Learning, Reinforcement learning, Two-player zero-sum games",

author = "Beno{\^i}t Fr{\'e}nay and Marco Saerens",

note = "M1 - 7-9",

year = "2008",

doi = "10.1016/j.neucom.2008.12.022",

language = "English",

volume = "72",

pages = "1494--1507",

journal = "Neurocomputing",

issn = "0925-2312",

publisher = "Elsevier",

number = "7-9",

}

TY - JOUR

T1 - QL2, a simple reinforcement learning scheme for two-player zero-sum Markov games

AU - Frénay, Benoît

AU - Saerens, Marco

N1 - M1 - 7-9

PY - 2008

Y1 - 2008

N2 - Markov games is a framework which can be used to formalise n-agent reinforcement learning (RL). Littman (Markov games as a framework for multi-agent reinforcement learning, in: Proceedings of the 11th International Conference on Machine Learning (ICML-94), 1994.) uses this framework to model two-agent zero-sum problems and, within this context, proposes the minimax-Q algorithm. This paper reviews RL algorithms for two-player zero-sum Markov games and introduces a new, simple, fast, algorithm, called QL2. QL2 is compared to several standard algorithms (Q-learning, Minimax and minimax-Q) implemented with the Qash library written in Python. The experiments show that QL2 converges empirically to optimal mixed policies, as minimax-Q, but uses a surprisingly simple and cheap updating rule. © 2009 Elsevier B.V. All rights reserved.

AB - Markov games is a framework which can be used to formalise n-agent reinforcement learning (RL). Littman (Markov games as a framework for multi-agent reinforcement learning, in: Proceedings of the 11th International Conference on Machine Learning (ICML-94), 1994.) uses this framework to model two-agent zero-sum problems and, within this context, proposes the minimax-Q algorithm. This paper reviews RL algorithms for two-player zero-sum Markov games and introduces a new, simple, fast, algorithm, called QL2. QL2 is compared to several standard algorithms (Q-learning, Minimax and minimax-Q) implemented with the Qash library written in Python. The experiments show that QL2 converges empirically to optimal mixed policies, as minimax-Q, but uses a surprisingly simple and cheap updating rule. © 2009 Elsevier B.V. All rights reserved.

KW - Markov games

KW - Multi-agent

KW - Q-Learning

KW - Reinforcement learning

KW - Two-player zero-sum games

UR - http://www.scopus.com/inward/record.url?scp=62249147593&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2008.12.022

DO - 10.1016/j.neucom.2008.12.022

M3 - Article

SN - 0925-2312

VL - 72

SP - 1494

EP - 1507

JO - Neurocomputing

JF - Neurocomputing

IS - 7-9

ER -
