
      Federated Reinforcement Learning for Training Control Policies on Multiple IoT Devices


          Abstract

          Reinforcement learning has recently been studied in various fields and has also been used to optimally control IoT devices, which extend Internet connectivity beyond conventional computing devices. In this paper, we allow multiple reinforcement learning agents to learn optimal control policies on their own IoT devices, which are of the same type but have slightly different dynamics. For such multiple IoT devices, there is no guarantee that an agent that interacts with only one IoT device and learns its optimal control policy will also control another IoT device well. We would therefore have to apply independent reinforcement learning to each IoT device individually, which is a costly and time-consuming effort. To solve this problem, we propose a new federated reinforcement learning architecture in which each agent, working on its own IoT device, shares its learning experience (i.e., the gradients of its loss function) with the others and transfers mature policy model parameters to other agents, allowing them to accelerate their learning. We incorporate the actor–critic proximal policy optimization (Actor–Critic PPO) algorithm into each agent in the proposed collaborative architecture and propose an efficient procedure for gradient sharing and model transfer. Using multiple rotary inverted pendulum devices interconnected via a network switch, we demonstrate that the proposed federated reinforcement learning scheme effectively facilitates the learning process for multiple IoT devices and that learning is faster when more agents are involved.
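
          To make the gradient-sharing and model-transfer steps concrete, the following is a minimal sketch in PyTorch. It is illustrative only, not the authors' implementation: the names Agent, share_gradients, and transfer_policy, the tiny policy network, the placeholder loss, and the plain SGD step are all assumptions standing in for the paper's Actor–Critic PPO agents and its actual sharing procedure.

          import copy
          import torch
          import torch.nn as nn

          class Agent:
              def __init__(self, obs_dim, act_dim):
                  # Small network standing in for an Actor-Critic PPO policy.
                  self.policy = nn.Sequential(
                      nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, act_dim)
                  )

              def local_gradients(self, obs, target):
                  # Gradients of a placeholder loss on this agent's own device
                  # data; in the paper this would be the PPO loss computed on
                  # trajectories from the agent's own IoT device.
                  loss = nn.functional.mse_loss(self.policy(obs), target)
                  return torch.autograd.grad(loss, list(self.policy.parameters()))

          def share_gradients(agents, batches, lr=0.01):
              # Every agent contributes its local gradients; all agents then
              # apply the average, so experience gathered on each device
              # shapes every policy.
              grads = [a.local_gradients(*b) for a, b in zip(agents, batches)]
              avg = [torch.stack(g).mean(dim=0) for g in zip(*grads)]
              with torch.no_grad():
                  for agent in agents:
                      for p, g in zip(agent.policy.parameters(), avg):
                          p -= lr * g  # plain SGD step for illustration

          def transfer_policy(mature, fresh):
              # Bootstrap a new or lagging agent from a mature agent's
              # parameters so it does not have to learn from scratch.
              fresh.policy.load_state_dict(copy.deepcopy(mature.policy.state_dict()))

          In this sketch, a joining agent would first receive a mature model via transfer_policy and then take part in the shared gradient updates, which is the mechanism the abstract credits for the speed-up as more agents are involved.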


          Most cited references (29)


          Adam: A Method for Stochastic Optimization


            Lifelong Federated Reinforcement Learning: A Learning Architecture for Navigation in Cloud Robotic Systems


              Proximal Policy Optimization Algorithms

              We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.
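
              The "surrogate" objective referred to above is the clipped objective L^CLIP(θ) = E_t[ min(r_t(θ) Â_t, clip(r_t(θ), 1 − ε, 1 + ε) Â_t) ], where r_t(θ) is the probability ratio between the new and old policies and Â_t is an advantage estimate. A minimal PyTorch sketch of this loss follows; the function name and the default ε = 0.2 are illustrative, not taken from the paper's reference implementation.

              import torch

              def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
                  # Probability ratio r_t = pi_new(a|s) / pi_old(a|s),
                  # computed in log space for numerical stability.
                  ratio = torch.exp(logp_new - logp_old)
                  # min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t), negated
                  # so that minimizing this loss maximizes the objective.
                  surr_raw = ratio * adv
                  surr_clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * adv
                  return -torch.min(surr_raw, surr_clipped).mean()

              Because the ratio is clipped, each data sample can safely be reused for several epochs of minibatch updates, which is the property the abstract contrasts with the single gradient update per sample of standard policy gradient methods.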

                Author and article information

                Journal
                Sensors (Basel, Switzerland)
                MDPI
                ISSN: 1424-8220
                Published: 02 March 2020 (March 2020 issue)
                Volume 20, Issue 5, Article 1359
                Affiliations
                [1] Department of Interdisciplinary Program in Creative Engineering, Korea University of Technology and Education, Cheonan 31253, Korea; glenn89@koreatech.ac.kr (H.-K.L.); chil1207@koreatech.ac.kr (J.-S.H.)
                [2] Department of Computer Science Engineering, Korea University of Technology and Education, Cheonan 31253, Korea; rlawnqhd@koreatech.ac.kr
                Author notes
                [*] Correspondence: yhhan@koreatech.ac.kr
                Author information
                https://orcid.org/0000-0002-8807-1158
                https://orcid.org/0000-0002-5835-7972
                Article
                sensors-20-01359
                DOI: 10.3390/s20051359
                PMCID: PMC7085801
                PMID: 32121671
                9f9ef8f5-12fa-4056-99b7-153de4297d69
                © 2020 by the authors.

                Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license ( http://creativecommons.org/licenses/by/4.0/).

                History
                Received: 12 February 2020
                Accepted: 28 February 2020
                Categories
                Article

                Biomedical engineering
                actor–critic PPO, federated reinforcement learning, multi-device control
