
      Learning to Forget: Continual Prediction with LSTM

      Felix A. Gers, Jürgen Schmidhuber, Fred Cummins
      Neural Computation
      MIT Press - Journals


          Abstract

          Long short-term memory (LSTM; Hochreiter & Schmidhuber, 1997) can solve numerous tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset. Without resets, the state may grow indefinitely and eventually cause the network to break down. Our remedy is a novel, adaptive "forget gate" that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources. We review illustrative benchmark problems on which standard LSTM outperforms other RNN algorithms. All algorithms (including LSTM) fail to solve continual versions of these problems. LSTM with forget gates, however, easily solves them, and in an elegant way.
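          To make the forget-gate mechanism concrete, here is a minimal sketch in Python/NumPy of a single LSTM step with input, forget, and output gates. It is an illustration under assumed names (a single stacked weight matrix W and bias b, sigmoid gates, tanh squashing), not the paper's exact cell-block formulation: the point is that the forget gate f multiplies the previous cell state, so driving f toward zero lets the cell learn to reset itself instead of letting its state grow without bound.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forget_step(x, h_prev, c_prev, W, b):
    # W stacks the weights for the input gate, forget gate, output gate, and
    # candidate update; each sees the current input and the previous output.
    i, f, o, g = np.split(W @ np.concatenate([x, h_prev]) + b, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gate activations in (0, 1)
    g = np.tanh(g)                                # candidate cell update
    c = f * c_prev + i * g   # forget gate scales the old state; f near 0 resets it
    h = o * np.tanh(c)       # gated cell output
    return h, c

          Standard LSTM corresponds to fixing f at 1, which is exactly the case where the internal state can only accumulate on a continual input stream.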


          Most cited references


          Generalization of backpropagation with application to a recurrent gas market model


            Learning long-term dependencies in NARX recurrent neural networks.

            It has previously been shown that gradient-descent learning algorithms for recurrent neural networks can perform poorly on tasks that involve long-term dependencies, i.e., those problems for which the desired output depends on inputs presented at times far in the past. We show that the long-term dependencies problem is lessened for a class of architectures called nonlinear autoregressive models with exogenous inputs (NARX) recurrent neural networks, which have powerful representational capabilities. We have previously reported that gradient descent learning can be more effective in NARX networks than in recurrent neural network architectures that have "hidden states" on problems including grammatical inference and nonlinear system identification. Typically, the network converges much faster and generalizes better than other networks. The results in this paper are consistent with this phenomenon. We present some experimental results which show that NARX networks can often retain information for two to three times as long as conventional recurrent neural networks. We show that although NARX networks do not circumvent the problem of long-term dependencies, they can greatly improve performance on long-term dependency problems. We also describe in detail some of the assumptions regarding what it means to latch information robustly and suggest possible ways to loosen these assumptions.
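            For illustration only (not the authors' model), the sketch below shows the NARX-style recurrence this abstract refers to: each output is computed from a short window of recent inputs together with fed-back recent outputs. The function f, the delay lengths, and all names here are assumptions made for the example.

import numpy as np

def narx_run(f, x, input_delays=3, output_delays=3):
    # NARX-style recurrence: y[t] depends on the recent exogenous inputs
    # x[t-du..t] and on its own recent outputs y[t-dy..t-1], fed back as inputs.
    y = np.zeros(len(x))
    for t in range(len(x)):
        xs = x[max(0, t - input_delays): t + 1]   # input taps
        ys = y[max(0, t - output_delays): t]      # output feedback taps
        y[t] = f(xs, ys)                          # f: e.g. a small feedforward net
    return y

# Example with a fixed linear read-out standing in for a trained network.
y = narx_run(lambda xs, ys: 0.4 * xs[-1] + (0.5 * ys[-1] if len(ys) else 0.0),
             np.sin(np.linspace(0.0, 6.28, 50)))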

              An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories


                Author and article information

                Journal
                Title: Neural Computation
                Publisher: MIT Press - Journals
                ISSN (print): 0899-7667
                ISSN (electronic): 1530-888X
                Publication date: October 2000
                Volume: 12
                Issue: 10
                Pages: 2451-2471

                Article
                DOI: 10.1162/089976600300015015
                PMID: 11032042
                Record ID: 2fc1cf21-235f-450b-833c-4c64c8dc8db6
                Copyright: © 2000
