
      Self-Attention-Based Deep Learning Network for Regional Influenza Forecasting


          Abstract

Early prediction of influenza plays an important role in minimizing the damage caused, as it provides the resources and time needed to formulate preventive measures. Compared to traditional mechanistic approaches, deep/machine learning-based models have demonstrated excellent forecasting performance by efficiently handling diverse data such as weather and internet data. However, because such data are of limited availability and reliability, many forecasting models use only historical occurrence data and formulate influenza forecasting as a multivariate time-series task. Recently, attention mechanisms have been exploited to deal with this issue by selecting valuable elements of the input data and assigning them high weights. In particular, self-attention has shown its potential in various forecasting tasks by exploiting the predictive relationships among the objects in the input data that describe the target objects. Hence, in this study, we propose a self-attention-based forecasting model for regional influenza forecasting, called SAIFlu-Net. The model exploits a long short-term memory network to extract the time-series patterns of each region and the self-attention mechanism to find similarities between regional occurrence patterns. To evaluate its performance, we conducted extensive experiments against existing forecasting models on weekly regional influenza datasets. The results show that the proposed model outperforms the others in terms of root mean square error and Pearson correlation coefficient.
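To make the described architecture concrete, here is a minimal PyTorch-style sketch of the idea: a shared LSTM encodes each region's weekly series, self-attention lets each region attend to regions with similar occurrence patterns, and a linear head emits the next-week forecast. This is an illustrative reconstruction, not the authors' published code; the class name, layer sizes, and input shapes are assumptions for exposition.

```python
import torch
import torch.nn as nn

class SAIFluSketch(nn.Module):
    """Hypothetical sketch of the SAIFlu-Net idea, not the published model."""

    def __init__(self, hidden_dim=64, n_heads=4):
        super().__init__()
        # A shared LSTM encodes each region's weekly series into one embedding.
        self.encoder = nn.LSTM(input_size=1, hidden_size=hidden_dim, batch_first=True)
        # Self-attention lets every region attend to similarly behaving regions.
        self.attention = nn.MultiheadAttention(hidden_dim, n_heads, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        # x: (batch, n_regions, window) of weekly influenza counts.
        b, r, w = x.shape
        _, (h, _) = self.encoder(x.reshape(b * r, w, 1))
        regions = h[-1].reshape(b, r, -1)            # (batch, n_regions, hidden)
        mixed, _ = self.attention(regions, regions, regions)
        return self.head(mixed).squeeze(-1)          # next-week forecast per region
```

Training such a sketch would typically minimize squared error against the next week's counts, which matches the RMSE metric the paper reports.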


Most cited references (45)


          Long Short-Term Memory

Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
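The gating this abstract describes is compact enough to write out directly. The NumPy sketch below of a single LSTM step is didactic, not taken from the paper; the stacked parameter layout is an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step with the gates written out (a didactic sketch;
    the [input, forget, output, candidate] parameter stacking is an
    assumption, not a standard). W maps the input x, U maps the
    previous hidden state h, and b is the bias, each for all four gates."""
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # multiplicative gates
    g = np.tanh(g)                                  # candidate cell update
    c_new = f * c + i * g   # constant error carousel: additive cell update
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

The additive update of the cell state `c` is the "constant error carousel": gradients can flow through it across many steps without the decay that plagues plain recurrent backpropagation.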

            Attention Is All You Need

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
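The attention mechanism the Transformer is built from reduces to one formula, softmax(QK^T / sqrt(d_k))V, as defined in the paper. A small NumPy rendering follows; the shapes are chosen purely for illustration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Shapes: Q (n, d_k), K (m, d_k), V (m, d_v) -> output (n, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of queries to keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # numerically stable row softmax
    return weights @ V                               # weighted mix of values
```

Self-attention is the special case where Q, K, and V are all projections of the same sequence, which is how SAIFlu-Net compares regional occurrence patterns against one another.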

              Dropout: a simple way to prevent neural networks from overfitting


                Author and article information

Journal
IEEE Journal of Biomedical and Health Informatics (IEEE J. Biomed. Health Inform.)
Institute of Electrical and Electronics Engineers (IEEE)
ISSN: 2168-2194, 2168-2208
February 2022, Volume 26, Issue 2, Pages 922-933

Article
DOI: 10.1109/JBHI.2021.3093897
PMID: 34197330
© 2022

License: https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
