Hybrid Online and Offline Reinforcement Learning for Tibetan Jiu Chess

Abstract

In this study, hybrid state-action-reward-state-action (SARSA($\lambda$)) and Q-learning algorithms are applied to different stages of an upper confidence bounds applied to trees (UCT) search for Tibetan Jiu chess. Q-learning is also used to update all nodes on the search path at the end of each game. A learning strategy is proposed that combines SARSA($\lambda$) and Q-learning with domain knowledge to define the feedback function for the layout and battle stages. An improved deep neural network based on ResNet18 is used for self-play training. Experimental results show that hybrid online and offline reinforcement learning with a deep neural network can improve the game program’s learning efficiency and its understanding of Tibetan Jiu chess.
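
The two update rules named in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the function names, the dictionary-based value table, the single-agent reward convention, and the hyperparameters `alpha`, `gamma`, and `lam` are all assumptions made for illustration. The first function performs one online SARSA($\lambda$) step with accumulating eligibility traces; the second replays a finished game's path backwards with a Q-learning target, which is one plausible reading of "update all nodes on the search path at the end of each game".

```python
from collections import defaultdict


def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next,
                      alpha=0.1, gamma=0.9, lam=0.8):
    """One online SARSA(lambda) update (accumulating traces).

    Q and E are hypothetical tables mapping (state, action) -> float.
    """
    # On-policy TD error: uses the action actually chosen in s_next.
    delta = r + gamma * Q[(s_next, a_next)] - Q[(s, a)]
    E[(s, a)] += 1.0  # mark the visited pair as eligible
    for key in list(E):
        Q[key] += alpha * delta * E[key]  # credit all eligible pairs
        E[key] *= gamma * lam             # decay their traces
    return delta


def q_learning_backup(Q, path, actions, final_reward,
                      alpha=0.1, gamma=0.9):
    """Offline Q-learning pass over a finished game, last move first.

    path: list of (state, action, next_state) transitions.
    actions: hypothetical callable returning the legal actions in a state.
    Assumes only the final transition carries a reward.
    """
    for i, (s, a, s_next) in enumerate(reversed(path)):
        terminal = (i == 0)
        r = final_reward if terminal else 0.0
        # Off-policy target: greedy value of the successor state.
        best_next = 0.0 if terminal else max(
            Q[(s_next, b)] for b in actions(s_next))
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


Q = defaultdict(float)
E = defaultdict(float)
delta = sarsa_lambda_step(Q, E, "s0", "a", 1.0, "s1", "b")

Q_offline = defaultdict(float)
game_path = [("s0", "a", "s1"), ("s1", "b", "end")]
q_learning_backup(Q_offline, game_path, lambda s: ["a", "b"], 1.0)
```

The sketch deliberately ignores the two-player sign flip, the tree search itself, and the neural network; it only contrasts the on-policy target used during play with the greedy target used in the end-of-game pass.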

Author and article information

Journal

Title: Complexity

Abbreviated Title: Complexity

Publisher: Hindawi Limited

ISSN (Print): 1076-2787

ISSN (Electronic): 1099-0526

Publication date Created: May 11 2020

Publication date (Print): May 11 2020

Volume: 2020

Pages: 1-11

Affiliations

[1] School of Information and Engineering, Minzu University of China, Beijing 100081, China

Article

DOI: 10.1155/2020/4708075

SO-VID: de514eb9-749f-4804-837f-63eeaec37378

License:

http://creativecommons.org/licenses/by/4.0/
