
      Federated Reinforcement Learning for Training Control Policies on Multiple IoT Devices


          Abstract

          Reinforcement learning has recently been studied in various fields and has also been used to optimally control IoT devices, which extend Internet connectivity beyond conventional computing devices. In this paper, we allow multiple reinforcement learning agents to learn optimal control policies on their own IoT devices, which are of the same type but have slightly different dynamics. For such multiple IoT devices, there is no guarantee that an agent that interacts with only one IoT device and learns its optimal control policy will also control another IoT device well. We would therefore have to apply independent reinforcement learning to each IoT device individually, which is a costly and time-consuming effort. To solve this problem, we propose a new federated reinforcement learning architecture in which each agent, working on its own IoT device, shares its learning experience (i.e., the gradients of its loss function) with the others and transfers mature policy model parameters to other agents, allowing them to accelerate their learning. We incorporate the actor–critic proximal policy optimization (Actor–Critic PPO) algorithm into each agent in the proposed collaborative architecture and propose an efficient procedure for gradient sharing and model transfer. Using multiple rotary inverted pendulum devices interconnected via a network switch, we demonstrate that the proposed federated reinforcement learning scheme effectively facilitates the learning process for multiple IoT devices and that learning is faster when more agents are involved.
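
          To make the gradient-sharing and model-transfer steps concrete, the following is a minimal sketch in PyTorch. It is illustrative only, not the authors' implementation: the names Agent, share_gradients, and transfer_policy, the tiny policy network, the placeholder loss, and the plain SGD step are all assumptions standing in for the paper's Actor–Critic PPO agents and its actual sharing procedure.

          import copy
          import torch
          import torch.nn as nn

          class Agent:
              def __init__(self, obs_dim, act_dim):
                  # Small network standing in for an Actor-Critic PPO policy.
                  self.policy = nn.Sequential(
                      nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, act_dim)
                  )

              def local_gradients(self, obs, target):
                  # Gradients of a placeholder loss on this agent's own device
                  # data; in the paper this would be the PPO loss computed on
                  # trajectories from the agent's own IoT device.
                  loss = nn.functional.mse_loss(self.policy(obs), target)
                  return torch.autograd.grad(loss, list(self.policy.parameters()))

          def share_gradients(agents, batches, lr=0.01):
              # Every agent contributes its local gradients; all agents then
              # apply the average, so experience gathered on each device
              # shapes every policy.
              grads = [a.local_gradients(*b) for a, b in zip(agents, batches)]
              avg = [torch.stack(g).mean(dim=0) for g in zip(*grads)]
              with torch.no_grad():
                  for agent in agents:
                      for p, g in zip(agent.policy.parameters(), avg):
                          p -= lr * g  # plain SGD step for illustration

          def transfer_policy(mature, fresh):
              # Bootstrap a new or lagging agent from a mature agent's
              # parameters so it does not have to learn from scratch.
              fresh.policy.load_state_dict(copy.deepcopy(mature.policy.state_dict()))

          In this sketch, a joining agent would first receive a mature model via transfer_policy and then take part in the shared gradient updates, which is the mechanism the abstract credits for the speed-up as more agents are involved.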


          Most cited references (29)


          Adam: A Method for Stochastic Optimization


            Lifelong Federated Reinforcement Learning: A Learning Architecture for Navigation in Cloud Robotic Systems


              Proximal Policy Optimization Algorithms

              We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.
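
              The "surrogate" objective referred to above is the clipped objective L^CLIP(θ) = E_t[ min(r_t(θ) Â_t, clip(r_t(θ), 1 − ε, 1 + ε) Â_t) ], where r_t(θ) is the probability ratio between the new and old policies and Â_t is an advantage estimate. A minimal PyTorch sketch of this loss follows; the function name and the default ε = 0.2 are illustrative, not taken from the paper's reference implementation.

              import torch

              def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
                  # Probability ratio r_t = pi_new(a|s) / pi_old(a|s),
                  # computed in log space for numerical stability.
                  ratio = torch.exp(logp_new - logp_old)
                  # min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t), negated
                  # so that minimizing this loss maximizes the objective.
                  surr_raw = ratio * adv
                  surr_clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * adv
                  return -torch.min(surr_raw, surr_clipped).mean()

              Because the ratio is clipped, each data sample can safely be reused for several epochs of minibatch updates, which is the property the abstract contrasts with the single gradient update per sample of standard policy gradient methods.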

                Author and article information

                Journal
                Sensors (Basel, Switzerland)
                MDPI
                ISSN: 1424-8220
                Published: 02 March 2020 (March 2020 issue)
                Volume 20, Issue 5, Article 1359
                Affiliations
                [1] Department of Interdisciplinary Program in Creative Engineering, Korea University of Technology and Education, Cheonan 31253, Korea; glenn89@koreatech.ac.kr (H.-K.L.); chil1207@koreatech.ac.kr (J.-S.H.)
                [2] Department of Computer Science Engineering, Korea University of Technology and Education, Cheonan 31253, Korea; rlawnqhd@koreatech.ac.kr
                Author notes
                [*] Correspondence: yhhan@koreatech.ac.kr
                Author information
                https://orcid.org/0000-0002-8807-1158
                https://orcid.org/0000-0002-5835-7972
                Article
                sensors-20-01359
                DOI: 10.3390/s20051359
                PMCID: PMC7085801
                PMID: 32121671
                9f9ef8f5-12fa-4056-99b7-153de4297d69
                © 2020 by the authors.

                Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license ( http://creativecommons.org/licenses/by/4.0/).

                History
                Received: 12 February 2020
                Accepted: 28 February 2020
                Categories
                Article

                Biomedical engineering
                actor–critic PPO, federated reinforcement learning, multi-device control
