Detecting semantic uncertainty by learning hedge cues in sentences using an HMM

Detecting speculative assertions is essential for distinguishing semantically uncertain information from factual information in text. This is critical to the trustworthiness of many intelligent systems that are based on information retrieval and natural language processing techniques, such as question answering or information extraction. We empirically explore three fundamental issues of uncertainty detection: (1) the predictive ability of different learning methods on this task; (2) whether using unlabeled data can lead to a more accurate model; and (3) whether closed-domain or cross-domain training is better. For these purposes, we adopt two statistical learning approaches to this problem: the commonly used bag-of-words model based on Naive Bayes, and a sequence labeling approach using a Hidden Markov Model (HMM). We compare our two approaches empirically and also compare them externally against prior results on CoNLL-2010 Shared Task 1. Overall, our results are promising: (1) on the Wikipedia and biomedical datasets, the HMM model improves over Naive Bayes by up to 17.4% and 29.0%, respectively, in absolute F score; (2) compared to the CoNLL-2010 systems, our best HMM model achieves a 62.9% F score with MLE parameter estimation and 64.0% with EM parameter estimation on the Wikipedia dataset, both outperforming the best CoNLL-2010 result (60.2%), although our results on the biomedical dataset are less impressive; (3) when the expressive ability of a model (e.g., Naive Bayes) is not strong enough, cross-domain training is helpful, whereas when a model is powerful (e.g., HMM), cross-domain training may produce biased parameters; and (4) under Maximum Likelihood Estimation, combining unlabeled examples with labeled ones helps.
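
To make the sequence-labeling idea in the abstract concrete, the sketch below is a minimal, assumed illustration (not the authors' code or data): each token is tagged as part of a hedge cue (CUE) or not (O) by an HMM whose start, transition, and emission probabilities are estimated by maximum likelihood with add-one smoothing; new sentences are decoded with the Viterbi algorithm, and a sentence is marked uncertain if any token is tagged CUE, mirroring the sentence-level judgment of CoNLL-2010 Task 1. The two-tag scheme, the smoothing choice, and the toy training examples are assumptions for illustration only; the Naive Bayes baseline mentioned in the abstract would instead classify whole sentences from their bag-of-words counts.

```python
# Minimal illustrative sketch (assumed, not the paper's implementation) of
# sentence-level uncertainty detection via HMM hedge-cue tagging:
#   - tokens are tagged CUE (part of a hedge cue) or O (other),
#   - start/transition/emission probabilities come from MLE with add-one smoothing,
#   - sentences are decoded with Viterbi,
#   - a sentence is called "uncertain" if any token is tagged CUE.
from collections import defaultdict
import math

TAGS = ["O", "CUE"]

def train_mle(tagged_sentences):
    """Estimate start, transition, and emission log-probabilities by MLE."""
    start = defaultdict(int)
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sent in tagged_sentences:
        prev = None
        for word, tag in sent:
            vocab.add(word)
            emit[tag][word] += 1
            if prev is None:
                start[tag] += 1
            else:
                trans[prev][tag] += 1
            prev = tag
    vocab_size = len(vocab) + 1  # reserve one slot for unseen words

    def logp(count, total, bins):
        return math.log((count + 1) / (total + bins))  # add-one smoothing

    start_p = {t: logp(start[t], sum(start.values()), len(TAGS)) for t in TAGS}
    trans_p = {a: {b: logp(trans[a][b], sum(trans[a].values()), len(TAGS))
                   for b in TAGS} for a in TAGS}
    emit_p = {t: {w: logp(c, sum(emit[t].values()), vocab_size)
                  for w, c in emit[t].items()} for t in TAGS}
    unk_p = {t: logp(0, sum(emit[t].values()), vocab_size) for t in TAGS}
    return start_p, trans_p, emit_p, unk_p

def viterbi(words, start_p, trans_p, emit_p, unk_p):
    """Return the most likely CUE/O tag sequence for a tokenized sentence."""
    def e(tag, word):
        return emit_p[tag].get(word, unk_p[tag])

    score = [{t: start_p[t] + e(t, words[0]) for t in TAGS}]
    back = [{}]
    for i in range(1, len(words)):
        score.append({})
        back.append({})
        for t in TAGS:
            prev = max(TAGS, key=lambda p: score[i - 1][p] + trans_p[p][t])
            score[i][t] = score[i - 1][prev] + trans_p[prev][t] + e(t, words[i])
            back[i][t] = prev
    tag = max(TAGS, key=lambda t: score[-1][t])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    return list(reversed(path))

if __name__ == "__main__":
    # Tiny toy training set (assumed): "may" and "suggests" act as hedge cues.
    train = [
        [("this", "O"), ("may", "CUE"), ("be", "O"), ("true", "O")],
        [("the", "O"), ("data", "O"), ("suggests", "CUE"), ("a", "O"), ("link", "O")],
        [("the", "O"), ("result", "O"), ("is", "O"), ("final", "O")],
    ]
    params = train_mle(train)
    sentence = ["this", "may", "be", "wrong"]
    tags = viterbi(sentence, *params)
    print(list(zip(sentence, tags)))   # [('this', 'O'), ('may', 'CUE'), ('be', 'O'), ('wrong', 'O')]
    print("uncertain" if "CUE" in tags else "certain")
```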

Bibliographic Details
Main Authors: LI, Xiujun, GAO, Wei, SHAVLIK, Jude
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2017
Subjects: Uncertainty detection, Hedge cues, Naive Bayes, HMM, Cross-domain training, Theory and Algorithms
Online Access: https://ink.library.smu.edu.sg/sis_research/4643
https://ink.library.smu.edu.sg/context/sis_research/article/5646/viewcontent/li.smir14.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-5646
record_format dspace
spelling sg-smu-ink.sis_research-5646 2020-01-02T08:27:57Z 2017-11-01T07:00:00Z text application/pdf info:doi/10.1142/9789813223615_0008 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Uncertainty detection
Hedge cues
Naive Bayes
HMM
Cross-domain training
Theory and Algorithms
format text
author LI, Xiujun
GAO, Wei
SHAVLIK, Jude
title Detecting semantic uncertainty by learning hedge cues in sentences using an HMM
publisher Institutional Knowledge at Singapore Management University
publishDate 2017
url https://ink.library.smu.edu.sg/sis_research/4643
https://ink.library.smu.edu.sg/context/sis_research/article/5646/viewcontent/li.smir14.pdf
_version_ 1770574935239950336