Exploiting Domain Structure for Named Entity Recognition

Named Entity Recognition (NER) is a fundamental task in text mining and natural language understanding. Current approaches to NER (mostly based on supervised learning) perform well on domains similar to the training domain, but they tend to adapt poorly to slightly different domains. We present seve...

Bibliographic Details
Main Authors:	JIANG, Jing; ZHAI, ChengXiang
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2006
Subjects:	Databases and Information Systems; Numerical Analysis and Scientific Computing
Online Access:	https://ink.library.smu.edu.sg/sis_research/1255 https://ink.library.smu.edu.sg/context/sis_research/article/2254/viewcontent/HLT_NAACL_06.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-2254
record_format	dspace
spelling	sg-smu-ink.sis_research-2254 2018-07-13T02:58:14Z Exploiting Domain Structure for Named Entity Recognition JIANG, Jing; ZHAI, ChengXiang 2006-06-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/1255 info:doi/10.3115/1220835.1220845 https://ink.library.smu.edu.sg/context/sis_research/article/2254/viewcontent/HLT_NAACL_06.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems; Numerical Analysis and Scientific Computing
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Databases and Information Systems; Numerical Analysis and Scientific Computing
description	Named Entity Recognition (NER) is a fundamental task in text mining and natural language understanding. Current approaches to NER (mostly based on supervised learning) perform well on domains similar to the training domain, but they tend to adapt poorly to slightly different domains. We present several strategies for exploiting the domain structure in the training data to learn a more robust named entity recognizer that can perform well on a new domain. First, we propose a simple yet effective way to automatically rank features based on their generalizability across domains. We then train a classifier with strong emphasis on the most generalizable features. This emphasis is imposed by putting a rank-based prior on a logistic regression model. We further propose a domain-aware cross-validation strategy to help choose an appropriate parameter for the rank-based prior. We evaluated the proposed method on the task of recognizing named entities (genes) in biology text involving three species. The experimental results show that the new domain-aware approach outperforms a state-of-the-art baseline method in adapting to new domains, especially when there is a great difference between the new domain and the training domain.
format	text
author	JIANG, Jing; ZHAI, ChengXiang
title	Exploiting Domain Structure for Named Entity Recognition
publisher	Institutional Knowledge at Singapore Management University
publishDate	2006
url	https://ink.library.smu.edu.sg/sis_research/1255 https://ink.library.smu.edu.sg/context/sis_research/article/2254/viewcontent/HLT_NAACL_06.pdf
_version_	1770570910288314368
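
The three steps named in the description (ranking features by cross-domain generalizability, a rank-based prior on logistic regression, and domain-aware cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the "weakest domain score" ranking rule, and the variance schedule a / rank**b are all assumptions chosen for illustration; the record does not give the paper's actual formulas.

```python
import numpy as np

def rank_features(domain_scores):
    """Rank features by generalizability (rank 1 = most generalizable).

    domain_scores: array (n_domains, n_features) holding how useful each
    feature is within each training domain.
    ASSUMPTION: a feature is only as generalizable as its weakest domain score.
    """
    worst = np.min(np.asarray(domain_scores, dtype=float), axis=0)
    order = np.argsort(-worst, kind="stable")      # best feature first
    ranks = np.empty(len(order), dtype=int)
    ranks[order] = np.arange(1, len(order) + 1)
    return ranks

def prior_variances(ranks, a=1.0, b=1.0):
    """Per-feature Gaussian prior variance that shrinks as rank grows.

    ASSUMPTION: the schedule a / rank**b is hypothetical; b plays the role
    of the prior parameter that cross-validation would tune.
    """
    return a / np.asarray(ranks, dtype=float) ** b

def fit_logreg(X, y, variances, lr=0.1, steps=2000):
    """MAP logistic regression with a zero-mean Gaussian prior per weight,
    fitted by plain gradient descent."""
    n = len(y)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        # Gradient of the mean log-loss plus the scaled prior penalty.
        grad = X.T @ (p - y) / n + w / (variances * n)
        w -= lr * grad
    return w

def leave_one_domain_out(domains):
    """Domain-aware splits: hold out one whole domain at a time, so that
    validation mimics adapting to an unseen domain."""
    domains = np.asarray(domains)
    for d in np.unique(domains):
        yield d, np.flatnonzero(domains != d), np.flatnonzero(domains == d)
```

In use, one would compute ranks from the training domains, fit one model per candidate value of b on each leave-one-domain-out split, and keep the b with the best held-out score. Low-ranked features receive a tight prior and so contribute little unless the data strongly support them.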

Similar Items