Detecting semantic uncertainty by learning hedge cues in sentences using an HMM

Detecting speculative assertions is essential for distinguishing semantically uncertain information from factual information in text. This is critical to the trustworthiness of many intelligent systems that are based on information retrieval and natural language processing techniques, such as question answering or information extraction. We empirically explore three fundamental issues of uncertainty detection: (1) the predictive ability of different learning methods on this task; (2) whether using unlabeled data can lead to a more accurate model; and (3) whether closed-domain or cross-domain training is better. For these purposes, we adopt two statistical learning approaches to this problem: the commonly used bag-of-words model based on Naive Bayes, and a sequence labeling approach using a Hidden Markov Model (HMM). We compare our two approaches empirically and also compare them externally against prior results on CoNLL-2010 Shared Task 1. Overall, our results are promising: (1) on the Wikipedia and biomedical datasets, the HMM model improves over Naive Bayes by up to 17.4% and 29.0%, respectively, in absolute F score; (2) compared to the CoNLL-2010 systems, our best HMM model achieves a 62.9% F score with MLE parameter estimation and 64.0% with EM parameter estimation on the Wikipedia dataset, both outperforming the best CoNLL-2010 result (60.2%), although our results on the biomedical dataset are less impressive; (3) when the expressive ability of a model (e.g., Naive Bayes) is not strong enough, cross-domain training is helpful, whereas when a model is powerful (e.g., HMM), cross-domain training may produce biased parameters; and (4) under Maximum Likelihood Estimation, combining unlabeled examples with labeled ones helps.
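
To make the sequence-labeling idea in the abstract concrete, the sketch below is a minimal, assumed illustration (not the authors' code or data): each token is tagged as part of a hedge cue (CUE) or not (O) by an HMM whose start, transition, and emission probabilities are estimated by maximum likelihood with add-one smoothing; new sentences are decoded with the Viterbi algorithm, and a sentence is marked uncertain if any token is tagged CUE, mirroring the sentence-level judgment of CoNLL-2010 Task 1. The two-tag scheme, the smoothing choice, and the toy training examples are assumptions for illustration only; the Naive Bayes baseline mentioned in the abstract would instead classify whole sentences from their bag-of-words counts.

```python
# Minimal illustrative sketch (assumed, not the paper's implementation) of
# sentence-level uncertainty detection via HMM hedge-cue tagging:
#   - tokens are tagged CUE (part of a hedge cue) or O (other),
#   - start/transition/emission probabilities come from MLE with add-one smoothing,
#   - sentences are decoded with Viterbi,
#   - a sentence is called "uncertain" if any token is tagged CUE.
from collections import defaultdict
import math

TAGS = ["O", "CUE"]

def train_mle(tagged_sentences):
    """Estimate start, transition, and emission log-probabilities by MLE."""
    start = defaultdict(int)
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sent in tagged_sentences:
        prev = None
        for word, tag in sent:
            vocab.add(word)
            emit[tag][word] += 1
            if prev is None:
                start[tag] += 1
            else:
                trans[prev][tag] += 1
            prev = tag
    vocab_size = len(vocab) + 1  # reserve one slot for unseen words

    def logp(count, total, bins):
        return math.log((count + 1) / (total + bins))  # add-one smoothing

    start_p = {t: logp(start[t], sum(start.values()), len(TAGS)) for t in TAGS}
    trans_p = {a: {b: logp(trans[a][b], sum(trans[a].values()), len(TAGS))
                   for b in TAGS} for a in TAGS}
    emit_p = {t: {w: logp(c, sum(emit[t].values()), vocab_size)
                  for w, c in emit[t].items()} for t in TAGS}
    unk_p = {t: logp(0, sum(emit[t].values()), vocab_size) for t in TAGS}
    return start_p, trans_p, emit_p, unk_p

def viterbi(words, start_p, trans_p, emit_p, unk_p):
    """Return the most likely CUE/O tag sequence for a tokenized sentence."""
    def e(tag, word):
        return emit_p[tag].get(word, unk_p[tag])

    score = [{t: start_p[t] + e(t, words[0]) for t in TAGS}]
    back = [{}]
    for i in range(1, len(words)):
        score.append({})
        back.append({})
        for t in TAGS:
            prev = max(TAGS, key=lambda p: score[i - 1][p] + trans_p[p][t])
            score[i][t] = score[i - 1][prev] + trans_p[prev][t] + e(t, words[i])
            back[i][t] = prev
    tag = max(TAGS, key=lambda t: score[-1][t])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    return list(reversed(path))

if __name__ == "__main__":
    # Tiny toy training set (assumed): "may" and "suggests" act as hedge cues.
    train = [
        [("this", "O"), ("may", "CUE"), ("be", "O"), ("true", "O")],
        [("the", "O"), ("data", "O"), ("suggests", "CUE"), ("a", "O"), ("link", "O")],
        [("the", "O"), ("result", "O"), ("is", "O"), ("final", "O")],
    ]
    params = train_mle(train)
    sentence = ["this", "may", "be", "wrong"]
    tags = viterbi(sentence, *params)
    print(list(zip(sentence, tags)))   # [('this', 'O'), ('may', 'CUE'), ('be', 'O'), ('wrong', 'O')]
    print("uncertain" if "CUE" in tags else "certain")
```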

Bibliographic Details
Main Authors: LI, Xiujun, GAO, Wei, SHAVLIK, Jude
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2017
Subjects: Uncertainty detection, Hedge cues, Naive Bayes, HMM, Cross-domain training, Theory and Algorithms
Online Access: https://ink.library.smu.edu.sg/sis_research/4643
https://ink.library.smu.edu.sg/context/sis_research/article/5646/viewcontent/li.smir14.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-5646
record_format dspace
spelling sg-smu-ink.sis_research-5646 2020-01-02T08:27:57Z 2017-11-01T07:00:00Z text application/pdf info:doi/10.1142/9789813223615_0008 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Uncertainty detection
Hedge cues
Naive Bayes
HMM
Cross-domain training
Theory and Algorithms
format text
author LI, Xiujun
GAO, Wei
SHAVLIK, Jude
title Detecting semantic uncertainty by learning hedge cues in sentences using an HMM
publisher Institutional Knowledge at Singapore Management University
publishDate 2017
url https://ink.library.smu.edu.sg/sis_research/4643
https://ink.library.smu.edu.sg/context/sis_research/article/5646/viewcontent/li.smir14.pdf
_version_ 1770574935239950336