Natural sentences as valid units for coded political texts

A rapidly growing area in political science has focused on perfecting techniques to treat political text as ‘data’, usually for the purposes of estimating latent traits such as left–right political policy positions.1 More traditional approaches have applied classical content analysis to categorize sub...

Bibliographic Details
Main Authors: DAUBLER, Thomas, BENOIT, Kenneth, MIKHAYLOV, Slava, LAVER, Michael
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2012
Subjects: Models and Methods; Political Science
Online Access: https://ink.library.smu.edu.sg/soss_research/3972
https://ink.library.smu.edu.sg/context/soss_research/article/5230/viewcontent/Daubler_etal_2012_pv.pdf
Institution: Singapore Management University
id sg-smu-ink.soss_research-5230
record_format dspace
spelling sg-smu-ink.soss_research-5230 2024-09-02T06:29:54Z Natural sentences as valid units for coded political texts DAUBLER, Thomas BENOIT, Kenneth MIKHAYLOV, Slava LAVER, Michael 2012-10-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/soss_research/3972 info:doi/10.1017/S0007123412000105 https://ink.library.smu.edu.sg/context/soss_research/article/5230/viewcontent/Daubler_etal_2012_pv.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School of Social Sciences eng Institutional Knowledge at Singapore Management University Models and Methods Political Science
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Models and Methods
Political Science
description A rapidly growing area in political science has focused on perfecting techniques to treat political text as ‘data’, usually for the purposes of estimating latent traits such as left–right political policy positions.1 More traditional approaches have applied classical content analysis to categorize sub-units of political text, such as sentences in manifestos. Prominent examples of this latter approach include the thirty-year-old Comparative Manifestos Project and the Policy Agendas Project.2 ‘Text as data’ approaches use machines to convert text to quantitative information and use statistical tools to make inferences about characteristics of the author of the text. Content analysis schemes use humans to read textual sub-units and assign these to pre-defined categories. Both methods require the prior identification of a textual unit of analysis – a highly consequential, yet often unquestioned, feature of research design. Our objective in this Research Note is to question the dominant approach to unitizing political texts prior to human coding. This is to parse texts into quasi-sentences (QSs), where a QS is defined as part or all of a natural sentence that states a distinct policy proposition. The use of the QS rather than a natural language unit (such as a sentence defined by punctuation) is motivated by the desire to capture all relevant political information, regardless of the stylistic decisions made by the author, for example, to use long or short natural sentences. The identification of QSs by human coders, however, is highly unreliable. If, comparing codings of the same texts using quasi-sentences and natural sentences, there is no appreciable difference in measured political content, then there is a strong case for replacing ‘endogenous’ human unitization with ‘exogenous’ unitization based on natural sentences that can be identified with perfect reliability by machines using pre-specified punctuation delimiters.
format text
author DAUBLER, Thomas
BENOIT, Kenneth
MIKHAYLOV, Slava
LAVER, Michael
author_sort DAUBLER, Thomas
title Natural sentences as valid units for coded political texts
publisher Institutional Knowledge at Singapore Management University
publishDate 2012
url https://ink.library.smu.edu.sg/soss_research/3972
https://ink.library.smu.edu.sg/context/soss_research/article/5230/viewcontent/Daubler_etal_2012_pv.pdf
_version_ 1814047824125362176
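
The description above closes by proposing 'exogenous' unitization: natural sentences identified by machine from pre-specified punctuation delimiters. The sketch below illustrates that idea only; it is not the authors' procedure. The delimiter set and the function name `split_natural_sentences` are assumptions made for this example, since the record does not say which delimiters the Research Note uses.

```python
import re

# Hypothetical delimiter set; the record does not list the actual delimiters.
DELIMITERS = ".!?;"


def split_natural_sentences(text, delimiters=DELIMITERS):
    """Split text into units that end at a run of delimiter characters.

    The rule is purely mechanical, so the same text always yields the same
    units. It is deliberately naive: abbreviations ("e.g.") and decimals
    ("3.5") are split too, which a real delimiter specification would
    have to address.
    """
    escaped = re.escape(delimiters)
    # One unit = a run of non-delimiter characters, followed by either
    # a run of delimiters or the end of the text.
    pattern = r"[^{0}]+(?:[{0}]+|$)".format(escaped)
    units = (match.group().strip() for match in re.finditer(pattern, text))
    return [unit for unit in units if unit]


if __name__ == "__main__":
    sample = ("We will cut taxes. We will also raise pensions; "
              "schools get more funding! Agreed?")
    for number, unit in enumerate(split_natural_sentences(sample), start=1):
        print(number, unit)
```

Because the split depends only on the text and the delimiter list, two runs over the same manifesto always produce identical units, which is the reliability property the description contrasts with human identification of quasi-sentences.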