A systematic literature review of machine learning in online personal health data

Abstract

Objective

User-generated content (UGC) in online environments provides opportunities to learn about an individual’s health status outside of clinical settings. However, the nature of UGC brings challenges in both data collection and processing. The purpose of this study is to systematically review the effectiveness of applying machine learning (ML) methodologies to UGC for personal health investigations.

Materials and Methods

We searched PubMed, Web of Science, IEEE Library, ACM library, AAAI library, and the ACL anthology. We focused on research articles that were published in English and in peer-reviewed journals or conference proceedings between 2010 and 2018. Publications that applied ML to UGC with a focus on personal health were identified for further systematic review.
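As an illustration only, the inclusion criteria stated above could be expressed as a simple screening filter. This is a minimal sketch, not the authors' procedure: the `Record` fields, the toy records, and the inclusive reading of "between 2010 and 2018" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical screening record; field names are assumptions."""
    title: str
    year: int
    language: str
    peer_reviewed: bool
    applies_ml_to_ugc: bool
    personal_health_focus: bool


def is_eligible(r: Record) -> bool:
    """Apply the inclusion criteria as described in the abstract."""
    return (
        r.language == "English"
        and r.peer_reviewed
        and 2010 <= r.year <= 2018  # assumed inclusive on both ends
        and r.applies_ml_to_ugc
        and r.personal_health_focus
    )


# Invented candidate records, not taken from the review.
candidates = [
    Record("Toy study A", 2015, "English", True, True, True),
    Record("Toy study B", 2009, "English", True, True, True),   # outside date window
    Record("Toy study C", 2017, "English", False, True, True),  # not peer reviewed
    Record("Toy study D", 2018, "English", True, True, False),  # no personal health focus
]

eligible = [r.title for r in candidates if is_eligible(r)]
print(eligible)
```

In practice the last two criteria require human judgment during screening; the boolean fields here simply stand in for that decision.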

Results

We identified 103 eligible studies, which we summarized with respect to 5 research categories, 3 data collection strategies, 3 gold standard dataset creation methods, and 4 types of features applied in ML models. Popular off-the-shelf ML models were logistic regression (n = 22), support vector machines (n = 18), naive Bayes (n = 17), ensemble learning (n = 12), and deep learning (n = 11). The most investigated problems were mental health (n = 39) and cancer (n = 15). Common health-related aspects extracted from UGC were treatment experience, sentiments and emotions, coping strategies, and social support.
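To make one of the listed model families concrete, the following is a minimal sketch of a multinomial naive Bayes classifier over bag-of-words features, written with the Python standard library only. It is not drawn from any of the reviewed studies: the posts, the two labels (loosely echoing "treatment experience" and "social support"), and the function names `train_nb` and `predict_nb` are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately crude assumption)."""
    return text.lower().split()


def train_nb(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy posts; real UGC would need far more data and careful labeling.
posts = [
    "chemo left me exhausted and nauseous",
    "the new medication reduced my pain",
    "my support group keeps me going",
    "friends from the forum cheered me up",
]
labels = ["treatment", "treatment", "support", "support"]

model = train_nb(posts, labels)
print(predict_nb(model, "the medication made me nauseous"))
print(predict_nb(model, "my support group cheered me up"))
```

A four-post training set is far too small to say anything about real performance; the sketch only shows the mechanics of turning free text into word counts and scoring classes.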

Conclusions

The systematic review indicated that ML can be effectively applied to UGC to facilitate the description and inference of personal health. Future research needs to focus on mitigating the bias introduced when building study cohorts, creating features from free text, and improving both the clinical credibility of UGC and model interpretability.

Author and article information

Contributors

Zhijun Yin

Journal

Title: Journal of the American Medical Informatics Association

Publisher: Oxford University Press (OUP)

ISSN (Electronic): 1527-974X

Publication date (Print): June 01 2019

Publication date (Electronic): March 25 2019

Volume: 26

Issue: 6

Pages: 561-576

Affiliations

[1] Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA

[2] Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA

[3] Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, USA

Article

DOI: 10.1093/jamia/ocz009

PMC ID: 7647332

PubMed ID: 30908576

SO-VID: 3c85292b-504c-4f89-a27d-04c850e4cde1

License:

https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model
