Policy Iteration for Pareto-Optimal Policies in Stochastic Stackelberg Games

Abstract

In general-sum stochastic games, a stationary Stackelberg equilibrium (SSE), in which the leader maximizes its return for all initial states while the follower plays the best response to the leader's policy, does not always exist. Existing methods for determining SSEs require strong assumptions to guarantee convergence and to guarantee that the limit coincides with an SSE. Moreover, our analysis suggests that the performance at the fixed points of these methods is not reasonable when they are not SSEs. Herein, we introduce the concept of Pareto-optimality as a reasonable alternative to SSEs. We derive the policy improvement theorem for stochastic games with a best-response follower and, based on it, propose an iterative algorithm to determine Pareto-optimal policies. Monotone improvement and convergence of the proposed approach are proved, and its convergence to SSEs is proved in a special case.
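
As a rough illustration of the Pareto-optimality idea mentioned above, the following minimal sketch filters a small set of candidate leader policies by comparing their returns across initial states. It assumes that Pareto-optimality is judged on the vector of leader returns, one entry per initial state, and that each return already accounts for the follower's best response. The policy names and numbers are hypothetical, and this is not the algorithm proposed in the article.

```python
from typing import Dict, List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u is at least as good as v in every state and strictly better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_optimal(values: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of policies whose value vectors no other policy dominates."""
    return [
        name
        for name, v in values.items()
        if not any(dominates(u, v) for other, u in values.items() if other != name)
    ]


# Hypothetical leader returns for three candidate policies over two initial states.
# Assumption: each entry is the leader's return when the follower best-responds.
leader_values = {
    "pi_a": (1.0, 0.0),
    "pi_b": (0.0, 1.0),
    "pi_c": (0.0, 0.0),
}

print(pareto_optimal(leader_values))
```

In this toy set no single policy is best for every initial state, so none of the candidates maximizes the leader's return for all initial states at once, yet two of them remain undominated.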

Author and article information

Publication date Created: 07 May 2024

Article

arXiv ID: 2405.06689

SO-VID: 340bb6f7-fa49-4685-9626-afff7208861d

License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Comments 21 pages

Categories cs.GT cs.LG cs.MA math.OC

ScienceOpen disciplines: Numerical methods, Theoretical computer science, Artificial intelligence
