
      Machine Learning and Natural Language Processing for Geolocation-Centric Monitoring and Characterization of Opioid-Related Social Media Chatter

      research-article
      Abeed Sarker, PhD 1, 2; Graciela Gonzalez-Hernandez, PhD 1; Yucheng Ruan, BEng 3; Jeanmarie Perrone, MD 4
      JAMA Network Open
      American Medical Association


          Abstract

          This cross-sectional study develops and validates a machine learning method for collecting and classifying data from opioid-related postings on a social media platform.

          Key Points

          Question

          Can natural language processing be used to gain real-time temporal and geospatial information from social media data about opioid abuse?

          Findings

          In this cross-sectional, population-based study of 9006 social media posts, supervised machine learning methods performed automatic 4-class classification of opioid-related social media chatter with a maximum microaveraged F1 score of 0.726. Rates of automatically classified opioid abuse–indicating social media posts from Pennsylvania correlated with county-level overdose death rates and with 4 national survey metrics at the substate level.

          Meaning

          The findings suggest that automatic processing of social media data, combined with geospatial and temporal information, may provide close to real-time insights into the status and trajectory of the opioid epidemic.

          Abstract

          Importance

          Automatic curation of consumer-generated, opioid-related social media big data may enable real-time monitoring of the opioid epidemic in the United States.

          Objective

          To develop and validate an automatic text-processing pipeline for geospatial and temporal analysis of opioid-mentioning social media chatter.

          Design, Setting, and Participants

          This cross-sectional, population-based study was conducted from December 1, 2017, to August 31, 2019, using more than 3 years of publicly available Twitter posts (tweets), dated January 1, 2012, to October 31, 2015, that were geolocated in Pennsylvania. Opioid-mentioning tweets were extracted using the names of prescription and illicit opioids, including street names and misspellings. A total of 9006 tweets were manually categorized into 4 classes and used to train and evaluate several machine learning algorithms. The best-performing classifier was then applied to unlabeled data to analyze temporal and geospatial patterns.
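          The keyword-based extraction step can be illustrated with a minimal matcher. This is a sketch under stated assumptions, not the authors' method: `SEED_TERMS` is a small hypothetical sample rather than the study's lexicon, and misspellings are approximated by generating every variant within 1 edit (deletion, transposition, substitution, or insertion) of each term with at least 7 letters. Shorter names such as "heroin" or the street name "oxy" are matched exactly, because their 1-edit variants collide with ordinary English words (for example, "heroic").

```python
import re

# Hypothetical seed terms (prescription, illicit, and street names); the
# study's actual keyword list is not reproduced here.
SEED_TERMS = ["oxycodone", "hydrocodone", "percocet", "fentanyl", "heroin", "oxy"]
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def one_edit_variants(word: str) -> set[str]:
    """Return all strings within 1 deletion, transposition, substitution, or insertion of word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    substitutes = {a + c + b[1:] for a, b in splits if b for c in LETTERS}
    inserts = {a + c + b for a, b in splits for c in LETTERS}
    return deletes | transposes | substitutes | inserts


def build_lexicon(terms: list[str], min_len_for_variants: int = 7) -> dict[str, str]:
    """Map each surface form (exact term or assumed misspelling) to its canonical term."""
    lexicon = {term: term for term in terms}  # exact forms take priority over variants
    for term in terms:
        if len(term) >= min_len_for_variants:
            for variant in one_edit_variants(term):
                lexicon.setdefault(variant, term)
    return lexicon


def match_opioid_terms(text: str, lexicon: dict[str, str]) -> set[str]:
    """Return the canonical opioid terms mentioned in a post (letters-only tokens)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {lexicon[token] for token in tokens if token in lexicon}


LEXICON = build_lexicon(SEED_TERMS)
print(sorted(match_opioid_terms("Ran out of my percoset again #oxy", LEXICON)))
# ['oxy', 'percocet']
```

          Because tokens are runs of letters only, hashtags such as "#oxy" are matched like plain words. A real pipeline would also need to handle ambiguous street names that have non-drug meanings, which simple matching cannot resolve; that is one reason the matched posts still require classification.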

          Main Outcomes and Measures

          Pearson and Spearman correlations were calculated between county- and substate-level rates of abuse-indicating tweets and (1) opioid overdose death rates from the Centers for Disease Control and Prevention (CDC) WONDER database and (2) 4 metrics from the National Survey on Drug Use and Health (NSDUH), over 3 years. Classifier performance was measured with microaveraged F1 scores (the harmonic mean of precision and recall) or accuracies, with 95% CIs.
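          For single-label 4-class classification, microaveraged precision and recall both reduce to the fraction of correctly labeled posts, so the microaveraged F1 score equals accuracy; this is why the two measures can be reported interchangeably. The sketch below makes that explicit and adds a percentile bootstrap 95% CI. The abstract does not state how the CIs were computed, so the bootstrap (with an assumed `n_resamples=1000` and a fixed seed) is an assumption, and the function names and class labels are illustrative.

```python
import random


def micro_f1(gold: list[str], pred: list[str]) -> float:
    """Microaveraged F1 for single-label multiclass predictions."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equal in length")
    tp = sum(g == p for g, p in zip(gold, pred))
    # Each wrong prediction is a false positive for the predicted class and a
    # false negative for the gold class, so the pooled FP and FN counts are equal.
    fp = fn = len(gold) - tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def bootstrap_ci(gold, pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for micro F1 (an assumed CI method)."""
    rng = random.Random(seed)
    n = len(gold)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample posts with replacement
        scores.append(micro_f1([gold[i] for i in idx], [pred[i] for i in idx]))
    scores.sort()
    lower = scores[int(alpha / 2 * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical labels for 4 posts; 3 of 4 predictions are correct.
gold = ["abuse", "information", "unrelated", "unrelated"]
pred = ["abuse", "unrelated", "unrelated", "unrelated"]
print(micro_f1(gold, pred))  # 0.75, identical to accuracy
print(bootstrap_ci(gold, pred))
```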

          Results

          A total of 9006 social media posts were annotated, of which 1748 (19.4%) were related to abuse, 2001 (22.2%) were related to information, 4830 (53.6%) were unrelated, and 427 (4.7%) were not in English. Yearly rates of abuse-indicating social media posts showed statistically significant correlation with county-level opioid-related overdose death rates (n = 75) for 3 years (Pearson r = 0.451, P < .001; Spearman r = 0.331, P = .004). Abuse-indicating tweet rates showed consistent correlations with 4 NSDUH metrics (n = 13) associated with nonmedical prescription opioid use (Pearson r = 0.683, P = .01; Spearman r = 0.346, P = .25), illicit drug use (Pearson r = 0.850, P < .001; Spearman r = 0.341, P = .25), illicit drug dependence (Pearson r = 0.937, P < .001; Spearman r = 0.495, P = .09), and illicit drug dependence or abuse (Pearson r = 0.935, P < .001; Spearman r = 0.401, P = .17) over the same 3-year period, although the Spearman tests lacked power to demonstrate statistical significance. An ensemble of classifiers produced the best performance, with an accuracy or microaveraged F1 score of 0.726 (95% CI, 0.708-0.743).
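          The two correlation coefficients can be computed as follows. This is a plain-Python sketch, not the study's analysis code: `pearson` is the sample correlation coefficient, and `spearman` applies it to ranks, averaging the ranks of tied values. The toy values are invented for illustration and are not study data; they show how a single extreme observation can pull the Pearson coefficient well above the rank-based Spearman coefficient, one possible way the two statistics can diverge in small samples such as the 13 substate regions.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented toy values: the last pair is an extreme observation.
x = [1, 2, 3, 4, 10]
y = [2, 1, 4, 3, 20]
print(round(pearson(x, y), 3))   # 0.966
print(round(spearman(x, y), 3))  # 0.8
```

          Ranking discards the magnitude of the extreme pair, so Spearman reflects only the order of the values. Neither function computes P values; with few observations, even a sizable coefficient can fail to reach statistical significance.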

          Conclusions and Relevance

          The correlations obtained in this study suggest that a social media–based approach reliant on supervised machine learning may be suitable for geolocation-centric monitoring of the US opioid epidemic in near real time.


                Author and article information

                Journal
                JAMA Network Open (JAMA Netw Open)
                American Medical Association
                ISSN: 2574-3805
                Published: 6 November 2019
                Volume 2, Issue 11, e1914672
                Affiliations
                [1] Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia
                [2] Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, Georgia
                [3] School of Engineering and Applied Science, University of Pennsylvania, Philadelphia
                [4] Department of Emergency Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia
                Author notes
                Article Information
                Accepted for Publication: August 4, 2019.
                Published: November 6, 2019. doi:10.1001/jamanetworkopen.2019.14672
                Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2019 Sarker A et al. JAMA Network Open.
                Corresponding Author: Abeed Sarker, PhD, Department of Biomedical Informatics, School of Medicine, Emory University, 101 Woodruff Circle, Office 4101, Atlanta, GA 30322 (abeed@dbmi.emory.edu).
                Author Contributions: Dr Sarker had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
                Concept and design: Sarker, Gonzalez-Hernandez, Perrone.
                Acquisition, analysis, or interpretation of data: Sarker, Ruan.
                Drafting of the manuscript: Sarker, Gonzalez-Hernandez.
                Critical revision of the manuscript for important intellectual content: All authors.
                Statistical analysis: Sarker, Ruan.
                Administrative, technical, or material support: Gonzalez-Hernandez.
                Supervision: Gonzalez-Hernandez, Perrone.
                Conflict of Interest Disclosures: Dr Sarker reported receiving grants from the National Institute on Drug Abuse (NIDA), grants from Pennsylvania Department of Health, and nonfinancial support from NVIDIA Corporation during the conduct of the study as well as personal fees from the National Board of Medical Examiners, grants from the Robert Wood Johnson Foundation, and honorarium from the National Institutes of Health (NIH) outside the submitted work. Dr Gonzalez-Hernandez reported receiving grants from NIH/NIDA during the conduct of the study and grants from AbbVie outside the submitted work. No other disclosures were reported.
                Funding/Support: This study was funded in part by award R01DA046619 from the NIH/NIDA. The data collection and annotation efforts were partly funded by a grant from the Pennsylvania Department of Health.
                Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
                Disclaimer: The content of this article is solely the responsibility of the authors and does not necessarily represent the official views of NIDA or NIH.
                Additional Contributions: Karen O’Connor, MS, and Alexis Upshur, BS, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, and Annika DeRoos, College of Arts and Sciences, University of Pennsylvania, performed the annotations. Mss O’Connor and Upshur received compensation for their contributions as staff researchers, and Ms DeRoos received compensation as a sessional research assistant under the mentorship of Dr Sarker. The Titan Xp GPU used for the deep learning experiments was donated by the NVIDIA Corporation.
                Article ID: zoi190564
                DOI: 10.1001/jamanetworkopen.2019.14672
                PMCID: PMC6865282
                PMID: 31693125
                ScienceOpen ID: 8e6a07af-b965-44ad-b369-8b99f8957e6e

                History
                : 10 June 2019
                : 14 September 2019
                Categories
                Research
                Original Investigation
                Online Only
                Health Informatics
