A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts

Abstract

The richness of social media data has opened a new avenue for social science research to gain insights into human behaviors and experiences. In particular, emerging data-driven approaches relying on topic models provide entirely new perspectives on interpreting social phenomena. However, the short, text-heavy, and unstructured nature of social media content often leads to methodological challenges in both data collection and analysis. To bridge the developing field of computational science and empirical social research, this study evaluates the performance of four topic modeling techniques, namely latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF), Top2Vec, and BERTopic. In view of the interplay between human relations and digital media, this research takes Twitter posts as the reference point and assesses the strengths and weaknesses of the different algorithms in a social science context. Based on details of the analytical procedures and on quality issues, this research sheds light on the efficacy of using BERTopic and NMF to analyze Twitter data.
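To make one of the compared techniques concrete, the following is a minimal sketch of NMF applied to short posts. It is not the study's pipeline: it assumes a plain bag-of-words count matrix, uses the well-known multiplicative update rules for the squared-error objective, and runs on a handful of invented posts. All names (`TOY_POSTS`, `build_counts`, `nmf`, `top_terms`, `reconstruction_error`) are hypothetical and chosen only for illustration.

```python
import random

# Invented example posts (an assumption for illustration, not data from the study).
TOY_POSTS = [
    "flight delayed airport queue",
    "airport flight cancelled again",
    "vaccine booking clinic today",
    "clinic vaccine queue long",
]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def build_counts(docs):
    """Build a documents x terms count matrix from whitespace-tokenized text."""
    vocab = sorted({tok for d in docs for tok in d.lower().split()})
    index = {tok: j for j, tok in enumerate(vocab)}
    counts = [[0.0] * len(vocab) for _ in docs]
    for i, d in enumerate(docs):
        for tok in d.lower().split():
            counts[i][index[tok]] += 1.0
    return counts, vocab

def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into W (docs x topics) and H (topics x terms).

    Both factors start positive and are only ever rescaled by non-negative
    ratios, so they stay non-negative throughout.
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_terms)]
             for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return w, h

def top_terms(h, vocab, n=3):
    """Return the n highest-weighted vocabulary terms for each topic row of H."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in h]

def reconstruction_error(v, w, h):
    """Squared Frobenius distance between V and the product W H."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for rv, rw in zip(v, wh) for a, b in zip(rv, rw))

counts, vocab = build_counts(TOY_POSTS)
w, h = nmf(counts, n_topics=2)
topics = top_terms(h, vocab)
print(topics)
```

Each row of `h` can be read as a topic (a weighting over vocabulary terms) and each row of `w` as a post's mixture over topics. In practice, a library implementation with TF-IDF weighting and proper tokenization would normally replace this hand-rolled version.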

Author and article information

Contributors

Roman Egger: URI: http://loop.frontiersin.org/people/1700113/overview

Joanne Yu: URI: http://loop.frontiersin.org/people/1558532/overview

Journal

Journal ID (nlm-ta): Front Sociol

Journal ID (iso-abbrev): Front Sociol

Journal ID (publisher-id): Front. Sociol.

Title: Frontiers in Sociology

Publisher: Frontiers Media S.A.

ISSN (Electronic): 2297-7775

Publication date (Electronic): 06 May 2022

Publication date Collection: 2022

Publication date PMC-release: 06 May 2022

Volume: 7

Electronic Location Identifier: 886498

Affiliations

[1] Innovation and Management in Tourism, Salzburg University of Applied Sciences, Salzburg, Austria

[2] Department of Tourism and Service Management, Modul University Vienna, Vienna, Austria

Author notes

Edited by: Dimitri Prandner, Johannes Kepler University of Linz, Austria

Reviewed by: Tobias Wolbring, University of Erlangen Nuremberg, Germany; Ruben Bach, University of Mannheim, Germany

*Correspondence: Joanne Yu, joanne.yu@modul.ac.at

This article was submitted to Sociological Theory, a section of the journal Frontiers in Sociology

Article

DOI: 10.3389/fsoc.2022.886498

PMC ID: 9120935

PubMed ID: 35602001

SO-VID: 5f2df5e4-e351-42b2-9112-240de5c07f98

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

History

Date received: 28 February 2022

Date accepted: 19 April 2022

Page count

Figures: 5, Tables: 5, Equations: 0, References: 74, Pages: 16, Words: 11609
