Pathfinding in stochastic environments: learning vs planning

Abstract

Among the main challenges associated with navigating a mobile robot in complex environments are partial observability and stochasticity. This work proposes a stochastic formulation of the pathfinding problem, assuming that obstacles of arbitrary shapes may appear and disappear at random moments in time. Moreover, we consider the case when the environment is only partially observable to the agent. We study and evaluate two orthogonal approaches to the problem of reaching the goal under such conditions: planning and learning. Within planning, an agent constantly re-plans and updates its path based on the history of observations using a search-based planner. Within learning, an agent asynchronously learns to optimize a policy function using recurrent neural networks (we propose an original, efficient, and scalable approach). We carry out an extensive empirical evaluation of both approaches, which shows that the learning-based approach scales better with an increasing number of unpredictably appearing and disappearing obstacles. At the same time, the planning-based approach is preferable when the environment is close to deterministic (i.e., external disturbances are rare). Code is available at https://github.com/Tviskaron/pathfinding-in-stochastic-envs.
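
The planning side of this comparison can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a small 4-connected grid, a square observation window, obstacles that toggle cell by cell with a fixed probability (a simplification of the arbitrary-shape obstacles described above), and plain A* as the search-based planner. The names `astar`, `run_episode`, `radius`, and `flip_prob` are hypothetical and chosen only for this illustration.

```python
import heapq
import random


def astar(blocked, start, goal, size):
    """4-connected A* with a Manhattan heuristic; returns a cell list or None."""
    def h(c):
        return abs(c[0] - goal[0]) + abs(c[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    parent = {start: None}
    g = {start: 0}
    while open_heap:
        _, cost, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        if cost > g[cur]:
            continue  # stale heap entry, a cheaper route was found already
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dx, cur[1] + dy)
            in_bounds = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if not in_bounds or nxt in blocked:
                continue
            if cost + 1 < g.get(nxt, float("inf")):
                g[nxt] = cost + 1
                parent[nxt] = cur
                heapq.heappush(open_heap, (cost + 1 + h(nxt), cost + 1, nxt))
    return None


def run_episode(size=8, radius=2, flip_prob=0.05, max_steps=200, seed=0):
    """Re-plan at every step from remembered observations; return steps or None."""
    rng = random.Random(seed)
    start, goal = (0, 0), (size - 1, size - 1)
    true_obstacles = set()  # hidden ground truth
    believed = set()        # obstacles the agent remembers having seen
    pos = start
    for step in range(max_steps):
        if pos == goal:
            return step
        # Assumed dynamics: every cell except the agent's and the goal
        # toggles between free and blocked with probability flip_prob.
        for x in range(size):
            for y in range(size):
                cell = (x, y)
                if cell not in (pos, goal) and rng.random() < flip_prob:
                    true_obstacles.symmetric_difference_update({cell})
        # Partial observability: only cells inside the window are refreshed.
        for x in range(pos[0] - radius, pos[0] + radius + 1):
            for y in range(pos[1] - radius, pos[1] + radius + 1):
                if (x, y) in true_obstacles:
                    believed.add((x, y))
                else:
                    believed.discard((x, y))
        path = astar(believed, pos, goal, size)
        if path is None:
            # Assumed fallback: wait one step and forget remembered
            # obstacles, since they may have disappeared out of view.
            believed.clear()
            continue
        if len(path) > 1:
            pos = path[1]
    return None
```

With `flip_prob=0.0` the grid stays empty and the agent reaches the goal in the Manhattan distance between the corners, which gives a quick sanity check; raising `flip_prob` shows how often the remembered map goes stale and forces the plan to change.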

Author and article information

Contributors

Alexey Skrynnik

Journal

Journal ID (nlm-ta): PeerJ Comput Sci

Journal ID (iso-abbrev): PeerJ Comput Sci

Journal ID (publisher-id): peerj-cs

Title: PeerJ Computer Science

Publisher: PeerJ Inc. (San Diego, USA)

ISSN (Electronic): 2376-5992

Publication date (Electronic): 18 August 2022

Publication date Collection: 2022

Volume: 8

Electronic Location Identifier: e1056

Affiliations

[1] Cognitive Dynamic Systems, Moscow Institute of Physics and Technology, Moscow, Russia

[2] Artificial Intelligence Research Institute AIRI, Moscow, Russia

[3] Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, Moscow, Russia

Article

Publisher ID: cs-1056

DOI: 10.7717/peerj-cs.1056

PMC ID: 9455045

PubMed ID: 36091975

SO-VID: 86975556-39b9-450c-93ec-43de3cfcaf5d

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.

History

Date received: 5 May 2022

Date accepted: 14 July 2022

Funding

Funded by: The Ministry of Science and Higher Education of the Russian Federation under Project 075-15-2020-799

This work was supported by the Ministry of Science and Higher Education of the Russian Federation under Project 075-15-2020-799. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
