
      Self-Attention-Based Deep Learning Network for Regional Influenza Forecasting


          Abstract

Early prediction of influenza plays an important role in minimizing the damage caused, as it provides the resources and time needed to formulate preventive measures. Compared to traditional mechanistic approaches, deep/machine learning-based models have demonstrated excellent forecasting performance by efficiently handling diverse data such as weather and internet data. However, because such data are of limited availability and reliability, many forecasting models use only historical occurrence data and formulate influenza forecasting as a multivariate time-series task. Recently, attention mechanisms have been exploited to deal with this issue by selecting valuable elements of the input data and assigning them high weights. In particular, self-attention has shown its potential in various forecasting tasks by exploiting the predictive relationships among the objects in the input data that describe the target objects. Hence, in this study, we propose a self-attention-based forecasting model for regional influenza forecasting, called SAIFlu-Net. The model exploits a long short-term memory network to extract the time-series patterns of each region and the self-attention mechanism to find similarities between regional occurrence patterns. To evaluate its performance, we conducted extensive experiments against existing forecasting models on weekly regional influenza datasets. The results show that the proposed model outperforms the others in terms of root mean square error and Pearson correlation coefficient.
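To make the described architecture concrete, here is a minimal PyTorch-style sketch of the idea: a shared LSTM encodes each region's weekly series, self-attention lets each region attend to regions with similar occurrence patterns, and a linear head emits the next-week forecast. This is an illustrative reconstruction, not the authors' published code; the class name, layer sizes, and input shapes are assumptions for exposition.

```python
import torch
import torch.nn as nn

class SAIFluSketch(nn.Module):
    """Hypothetical sketch of the SAIFlu-Net idea, not the published model."""

    def __init__(self, hidden_dim=64, n_heads=4):
        super().__init__()
        # A shared LSTM encodes each region's weekly series into one embedding.
        self.encoder = nn.LSTM(input_size=1, hidden_size=hidden_dim, batch_first=True)
        # Self-attention lets every region attend to similarly behaving regions.
        self.attention = nn.MultiheadAttention(hidden_dim, n_heads, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        # x: (batch, n_regions, window) of weekly influenza counts.
        b, r, w = x.shape
        _, (h, _) = self.encoder(x.reshape(b * r, w, 1))
        regions = h[-1].reshape(b, r, -1)            # (batch, n_regions, hidden)
        mixed, _ = self.attention(regions, regions, regions)
        return self.head(mixed).squeeze(-1)          # next-week forecast per region
```

Training such a sketch would typically minimize squared error against the next week's counts, which matches the RMSE metric the paper reports.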


Most cited references (45)


          Long Short-Term Memory

Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
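The gating this abstract describes is compact enough to write out directly. The NumPy sketch below of a single LSTM step is didactic, not taken from the paper; the stacked parameter layout is an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step with the gates written out (a didactic sketch;
    the [input, forget, output, candidate] parameter stacking is an
    assumption, not a standard). W maps the input x, U maps the
    previous hidden state h, and b is the bias, each for all four gates."""
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # multiplicative gates
    g = np.tanh(g)                                  # candidate cell update
    c_new = f * c + i * g   # constant error carousel: additive cell update
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

The additive update of the cell state `c` is the "constant error carousel": gradients can flow through it across many steps without the decay that plagues plain recurrent backpropagation.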

            Attention Is All You Need

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
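The attention mechanism the Transformer is built from reduces to one formula, softmax(QK^T / sqrt(d_k))V, as defined in the paper. A small NumPy rendering follows; the shapes are chosen purely for illustration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Shapes: Q (n, d_k), K (m, d_k), V (m, d_v) -> output (n, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of queries to keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # numerically stable row softmax
    return weights @ V                               # weighted mix of values
```

Self-attention is the special case where Q, K, and V are all projections of the same sequence, which is how SAIFlu-Net compares regional occurrence patterns against one another.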

              Dropout: a simple way to prevent neural networks from overfitting


                Author and article information

Journal
IEEE Journal of Biomedical and Health Informatics (IEEE J. Biomed. Health Inform.)
Institute of Electrical and Electronics Engineers (IEEE)
ISSN: 2168-2194, 2168-2208
February 2022, Volume 26, Issue 2, Pages 922-933

Article
DOI: 10.1109/JBHI.2021.3093897
PMID: 34197330
© 2022

License: https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
