Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts

Abstract

Politics and political conflict often occur in the written and spoken word. Scholars have long recognized this, but the massive costs of analyzing even moderately sized collections of texts have hindered their use in political science research. Here lies the promise of automated text analysis: it substantially reduces the costs of analyzing large collections of text. We provide a guide to this exciting new area of research and show how, in many instances, the methods have already obtained part of their promise. But there are pitfalls to using automated methods—they are no substitute for careful thought and close reading and require extensive and problem-specific validation. We survey a wide range of new methods, provide guidance on how to validate the output of the models, and clarify misconceptions and errors in the literature. To conclude, we argue that for automated text methods to become a standard tool for political scientists, methodologists must contribute new methods and new methods of validation.

Author and article information

Journal

Journal ID (publisher-id): applab

Title: Political Analysis

Abbreviated Title: Polit. anal.

Publisher: Oxford University Press (OUP)

ISSN (Print): 1047-1987

ISSN (Electronic): 1476-4989

Publication date Created: 2013

Publication date (Print): January 2017

Volume: 21

Issue: 03

Pages: 267-297

Article

DOI: 10.1093/pan/mps028
