
      Multirobot Collaborative Pursuit Target Robot by Improved MADDPG


          Abstract

Policy formulation is one of the main problems in multirobot systems, especially in multirobot pursuit-evasion scenarios, where sparse rewards and random environmental changes make it difficult to learn a good strategy. Existing multirobot decision-making methods mostly rely on environmental rewards alone to drive robots toward the target task, which does not achieve good results. This paper proposes a multirobot pursuit method based on an improved multiagent deep deterministic policy gradient (MADDPG), which addresses the sparse-reward problem in multirobot pursuit-evasion scenarios by combining an intrinsic reward with the external environmental reward. A state similarity module based on a threshold constraint forms part of the intrinsic reward signal output by the intrinsic curiosity module and is used to balance overexploration against insufficient exploration, so that the agent can use the intrinsic reward more effectively to learn better strategies. Simulation results show that the proposed method significantly improves the reward earned by the robots and the success rate of the pursuit task. The improvement is directly reflected in the real-time distance between pursuer and evader: pursuers trained with the improved algorithm close in on the evader more quickly, and the average following distance also decreases.
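To make the reward design concrete, here is a minimal sketch in Python of how an intrinsic curiosity bonus can be combined with the external environment reward and gated by a state-similarity threshold. This is an illustration under stated assumptions, not the authors' implementation: the ICM forward-model prediction error is taken as the raw intrinsic reward, and the class name ThresholdedIntrinsicReward, the cosine-similarity gate, and all parameter values (sim_threshold, beta, memory_size) are invented for the sketch.

    import numpy as np
    from collections import deque

    def cosine_similarity(a, b):
        # Cosine similarity between two state vectors.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    class ThresholdedIntrinsicReward:
        # Combines the external reward with an ICM-style intrinsic bonus,
        # suppressing the bonus for states too similar to recent ones.
        # All parameter values are illustrative assumptions.
        def __init__(self, sim_threshold=0.95, beta=0.2, memory_size=128):
            self.sim_threshold = sim_threshold  # assumed similarity cutoff
            self.beta = beta                    # assumed intrinsic-reward weight
            self.recent_states = deque(maxlen=memory_size)

        def combine(self, state, forward_model_error, extrinsic_reward):
            # The ICM forward model's prediction error acts as the raw
            # intrinsic reward: poorly predicted states count as novel.
            intrinsic = forward_model_error
            # Similarity gate: if the state closely matches a recently
            # visited one, zero the bonus to curb overexploration.
            if any(cosine_similarity(state, s) > self.sim_threshold
                   for s in self.recent_states):
                intrinsic = 0.0
            self.recent_states.append(np.asarray(state, dtype=float))
            return extrinsic_reward + self.beta * intrinsic

    # A novel state earns the weighted bonus; an immediate revisit does not.
    shaper = ThresholdedIntrinsicReward()
    s = np.array([1.0, 0.5, -0.2])
    print(shaper.combine(s, forward_model_error=0.8, extrinsic_reward=0.0))  # ~0.16
    print(shaper.combine(s, forward_model_error=0.8, extrinsic_reward=0.0))  # 0.0

The gate is what the abstract calls balancing overexploration against insufficient exploration: novel states still attract the agent through the curiosity bonus, while near-duplicates of recent states stop paying out, so the shaped reward stays informative even when the environment reward is sparse.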


                Author and article information

Journal
Computational Intelligence and Neuroscience (Comput Intell Neurosci)
Publisher: Hindawi
ISSN: 1687-5265 (print); 1687-5273 (electronic)
Published: 25 February 2022
Volume: 2022
Article ID: 4757394
Affiliations
1. School of Mechanical and Electronic Engineering, Wuhan University of Technology, Wuhan 430070, China
2. Intelligent Transport Systems Research Center, Wuhan University of Technology, Wuhan 430063, China
                Author notes

                Academic Editor: Daqing Gong

                Author information
                https://orcid.org/0000-0002-3591-0982
                https://orcid.org/0000-0002-6453-3805
                https://orcid.org/0000-0002-1265-8112
                https://orcid.org/0000-0002-3005-6772
Article
DOI: 10.1155/2022/4757394
PMCID: PMC8896963
PMID: 35251150
                Copyright © 2022 Xiao Zhou et al.

                This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History
Received: 19 January 2022
Accepted: 8 February 2022
Funding
Funded by: Research and Development (Award ID: 2021YFC3001502)
Funded by: National Natural Science Foundation of China (Award ID: 52072292)
                Categories
                Research Article

Neurosciences
