      What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention

      Preprint (Open Access)


          Abstract

          Egocentric action anticipation consists in understanding which objects the camera wearer will interact with in the near future and which actions they will perform. We tackle the problem by proposing an architecture able to anticipate actions at multiple temporal scales, using two LSTMs to 1) summarize the past and 2) formulate predictions about the future. The input video is processed considering three complementary modalities: appearance (RGB), motion (optical flow) and objects (object-based features). Modality-specific predictions are fused using a novel Modality ATTention (MATT) mechanism which learns to weigh modalities in an adaptive fashion. Extensive evaluations on two large-scale benchmark datasets show that our method outperforms prior art by up to +7% on the challenging EPIC-KITCHENS dataset, which includes more than 2500 actions, and generalizes to EGTEA Gaze+. Our approach is also shown to generalize to the tasks of early action recognition and action recognition. At the moment of submission, our method is ranked first on the leaderboard of the EPIC-KITCHENS egocentric action anticipation challenge.
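
          The abstract describes a two-stage recurrent design (a "rolling" LSTM that summarizes the observed past and an "unrolling" LSTM that is iterated into the future) with one branch per modality, fused by the MATT attention mechanism. The sketch below is not the authors' code (their implementation is available at the project page listed under Author and article information); it is a minimal PyTorch illustration under assumed class names, layer sizes and attention-network structure.

          import torch
          import torch.nn as nn
          import torch.nn.functional as F

          class ModalityBranch(nn.Module):
              # One branch per modality (RGB, optical flow, objects): a "rolling" LSTM
              # summarizes the observed past, an "unrolling" LSTM cell is iterated over
              # the anticipation horizon to produce action scores.
              def __init__(self, feat_dim, hidden_dim, num_actions, unroll_steps):
                  super().__init__()
                  self.rolling = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
                  self.unrolling = nn.LSTMCell(feat_dim, hidden_dim)
                  self.classifier = nn.Linear(hidden_dim, num_actions)
                  self.unroll_steps = unroll_steps

              def forward(self, feats):
                  # feats: (batch, time, feat_dim), pre-extracted modality features
                  _, (h, c) = self.rolling(feats)        # encode ("roll over") the past
                  h, c = h[-1], c[-1]
                  x = feats[:, -1]                       # last observed feature vector
                  for _ in range(self.unroll_steps):     # hypothesize into the future
                      h, c = self.unrolling(x, (h, c))
                  return self.classifier(h), h           # action scores, branch summary

          class MATTFusion(nn.Module):
              # Modality ATTention: computes adaptive per-modality weights from the
              # concatenated branch summaries and fuses the per-modality scores.
              def __init__(self, hidden_dim, num_modalities):
                  super().__init__()
                  self.att = nn.Sequential(
                      nn.Linear(hidden_dim * num_modalities, 128),
                      nn.ReLU(),
                      nn.Linear(128, num_modalities),
                  )

              def forward(self, scores, summaries):
                  # scores: list of (batch, num_actions); summaries: list of (batch, hidden_dim)
                  w = F.softmax(self.att(torch.cat(summaries, dim=1)), dim=1)   # (batch, M)
                  fused = (w.unsqueeze(-1) * torch.stack(scores, dim=1)).sum(dim=1)
                  return fused                                                  # (batch, num_actions)

          In such a setup, a forward pass would run one ModalityBranch per modality on its own feature stream and pass the resulting lists of scores and summaries to MATTFusion, which weighs the modality-specific predictions adaptively for each sample.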


                Author and article information

                Journal
                22 May 2019
                Article
                arXiv: 1905.09035
                fcbf8ba3-7542-42ba-8138-3dbd7b654a54

                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata
                See project page at http://iplab.dmi.unict.it/rulstm/
                cs.CV cs.AI

                Computer vision & Pattern recognition, Artificial intelligence
