Manually labeling data for supervised learning is time-consuming and labor-intensive; lexicon-based models such as VADER and TextBlob are therefore often used to label data automatically. However, it has been argued that automated labels lack the accuracy required for training an effective model. Although automated labeling is frequently used for stance detection, automated stance labels have not been properly evaluated in previous work. In this work, to assess the accuracy of VADER and TextBlob automated labels for stance analysis, we first manually label a Twitter (now X) dataset related to M-pox stance detection. We then fine-tune different transformer-based models on the hand-labeled M-pox dataset and compare their accuracy, before and after fine-tuning, with that of the automated labels. Our results indicate that the fine-tuned models surpass the accuracy of VADER and TextBlob automated labels by up to 38% and 72.5%, respectively. Topic modeling further shows that fine-tuning narrows the misclassified tweets to specific sub-topics. We conclude that fine-tuning transformer models on hand-labeled data for stance detection raises accuracy to a level significantly higher than that of automated stance labels. This study verifies that automated stance detection labels are not reliable for sensitive use cases such as health-related applications. Manually labeled data is better suited for developing Natural Language Processing (NLP) models that study and analyze public opinion and conversations on social media platforms during crises such as pandemics and epidemics.
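To illustrate the automated labeling approach being evaluated, the following is a minimal sketch (not the authors' exact pipeline) of how VADER and TextBlob are commonly used to assign sentiment-based labels to tweets as a proxy for stance; the threshold values and label names are assumptions for illustration.

```python
# Lexicon-based automated labeling with VADER and TextBlob (illustrative sketch).
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from textblob import TextBlob

analyzer = SentimentIntensityAnalyzer()

def vader_label(text, pos_th=0.05, neg_th=-0.05):
    # VADER's compound score lies in [-1, 1]; symmetric thresholds (assumed here)
    # map it to three classes, which are then treated as stance labels.
    compound = analyzer.polarity_scores(text)["compound"]
    if compound >= pos_th:
        return "positive"
    if compound <= neg_th:
        return "negative"
    return "neutral"

def textblob_label(text):
    # TextBlob's polarity also lies in [-1, 1]; a zero threshold is assumed.
    polarity = TextBlob(text).sentiment.polarity
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"

tweet = "M-pox vaccines are rolling out far too slowly."  # hypothetical example
print(vader_label(tweet), textblob_label(tweet))
```

In practice, such polarity-to-stance mappings are exactly the automated labels whose reliability this study calls into question.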
Social media platforms are pivotal in shaping public opinion during health crises, influencing policy-making and crisis management. Challenges such as labor-intensive manual labeling and dataset biases highlight the need for optimized stance detection methods. Our study assessed VADER and TextBlob for stance detection during the M-pox outbreak on social media, comparing their automated labels with our manually labeled data. Transformer-based models consistently outperformed lexicon-based approaches, showing significant improvements both before and after fine-tuning. Specifically, models pre-trained on COVID-19 tweets demonstrated an improvement of over 20% in accurately classifying M-pox tweets. Topic modeling of misclassified tweets identified nuanced sub-topics in M-pox discussions, highlighting the value of integrating multi-modal data and hand-labeled datasets for comprehensive sentiment analysis across platforms and contexts. Policymakers and healthcare authorities can use these insights to craft precise communication strategies, combat misinformation, and address public concerns effectively. Advancements in machine learning for health-related stance detection hold promise for optimizing crisis management and informing evidence-based policy-making during emerging epidemics and pandemics, with implications for future research and policy development.
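For readers who wish to reproduce the fine-tuning step, the following is a minimal sketch assuming a Hugging Face Transformers workflow; the checkpoint name (a COVID-19 Twitter model), label scheme, example tweets, and hyperparameters are illustrative assumptions rather than the study's exact configuration.

```python
# Fine-tuning a transformer on hand-labeled M-pox tweets (illustrative sketch).
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "digitalepidemiologylab/covid-twitter-bert-v2"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# Hand-labeled tweets: 0 = negative, 1 = neutral, 2 = positive (assumed scheme).
train = Dataset.from_dict({
    "text": ["M-pox vaccination appointments are finally available here.",
             "No idea why everyone is panicking about m-pox."],
    "label": [2, 1],
})
train = train.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                       padding="max_length", max_length=64),
                  batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mpox-stance",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train,
)
trainer.train()  # fine-tune on the hand-labeled tweets; evaluate on a held-out set in practice
```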