Treating words as data with error: Uncertainty in text statements of policy positions

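Political text offers extraordinary potential as a source of information about the policy positions of political actors. Despite recent advances in computational text analysis, human interpretative coding of text remains an important source of text-based data, ultimately required to validate more automatic techniques. The profession's main source of cross-national, time-series data on party policy positions comes from the human interpretative coding of party manifestos by the Comparative Manifesto Project (CMP). Despite widespread use of these data, the uncertainty associated with each point estimate has never been available, undermining the value of the dataset as a scientific resource. We propose a remedy. First, we characterize processes by which CMP data are generated. These include inherently stochastic processes of text authorship, as well as of the parsing and coding of observed text by humans. Second, we simulate these error-generating processes by bootstrapping analyses of coded quasi-sentences. This allows us to estimate precise levels of nonsystematic error for every category and scale reported by the CMP for its entire set of 3,000-plus manifestos. Using our estimates of these errors, we show how to correct biased inferences, in recent prominently published work, derived from statistical analyses of error-contaminated CMP data.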
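One technique described in this work's abstract, bootstrapping coded quasi-sentences to estimate nonsystematic error in category percentages and scales, can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes each quasi-sentence carries exactly one category code, uses made-up codes (`R1`, `R2`, `L1`, `L2`, `N`) rather than real CMP categories, and defines a hypothetical left-right scale as the percentage of right-coded minus left-coded quasi-sentences. The helper names `category_percentages`, `scale_score`, and `bootstrap_se` are illustrative only.

```python
import random
import statistics
from collections import Counter


def category_percentages(codes):
    """Share of quasi-sentences (in percent) assigned to each category code."""
    counts = Counter(codes)
    n = len(codes)
    return {cat: 100.0 * c / n for cat, c in counts.items()}


def scale_score(codes, right, left):
    """Hypothetical left-right scale: % right-coded minus % left-coded quasi-sentences."""
    pct = category_percentages(codes)
    return sum(pct.get(c, 0.0) for c in right) - sum(pct.get(c, 0.0) for c in left)


def bootstrap_se(codes, statistic, n_boot=1000, seed=0):
    """Resample quasi-sentences with replacement and return the standard
    deviation of the statistic across replicates (the bootstrap standard error)."""
    rng = random.Random(seed)
    n = len(codes)
    reps = [statistic(rng.choices(codes, k=n)) for _ in range(n_boot)]
    return statistics.stdev(reps)


if __name__ == "__main__":
    # Toy manifesto of 200 coded quasi-sentences (illustrative codes, not real CMP categories).
    manifesto = ["R1"] * 60 + ["R2"] * 20 + ["L1"] * 40 + ["L2"] * 30 + ["N"] * 50
    right, left = {"R1", "R2"}, {"L1", "L2"}
    point = scale_score(manifesto, right, left)
    se = bootstrap_se(manifesto, lambda qs: scale_score(qs, right, left))
    print(f"scale = {point:.1f}, bootstrap SE = {se:.2f}")
```

For this toy manifesto the point estimate is 5.0 (40% right minus 35% left), and the bootstrap standard error comes out at roughly 6 points, so a difference of a few points between two such manifestos would not be distinguishable from coding and authorship noise. Shorter texts yield fewer quasi-sentences to resample and therefore larger errors.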

Bibliographic Details
Main Authors:	BENOIT, Kenneth; LAVER, Michael; MIKHAYLOV, Slava
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2009
Subjects:	Models and Methods; Political Science
Online Access:	https://ink.library.smu.edu.sg/soss_research/3990
	https://ink.library.smu.edu.sg/context/soss_research/article/5248/viewcontent/blm2009ajps_pv.pdf
Institution:	Singapore Management University
DOI:	10.1111/j.1540-5907.2009.00383.x
Date:	2009-04-01
File format:	application/pdf
Collection:	Research Collection School of Social Sciences
Repository:	InK@SMU (SMU Libraries, Singapore)
License:	http://creativecommons.org/licenses/by-nc-nd/4.0/
Record ID:	sg-smu-ink.soss_research-5248