Product name recognition and normalization in internet forums

Bibliographic Details
Main Author:	Yao, Yangjie
Other Authors:	Sun Aixin
Format:	Theses and Dissertations
Language:	English
Published:	2014
Subjects:	DRNTU::Engineering::Computer science and engineering
Online Access:	https://hdl.handle.net/10356/61814
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-61814
record_format	dspace
last_modified	2023-03-04T00:46:06Z
school	School of Computer Engineering
degree	MASTER OF ENGINEERING (SCE)
date_accessioned	2014-10-27T06:37:29Z
type	Thesis
citation	Yao, Y. (2014). Product name recognition and normalization in internet forums. Master’s thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/61814
doi	10.32657/10356/61814
extent	69 p.
mimetype	application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering
description	Collecting user feedback on products is a common practice among product providers, who use it to better understand consumers' concerns and requirements and to improve their products or marketing strategies. Dedicated review sites (e.g., Epinions, Amazon, CNET reviews) make this relatively straightforward, because feedback about a specific product is usually well organized in a single list. Collecting user feedback from Internet forums, by contrast, is challenging. One reason is that feedback about a product is often spread across different discussion threads. More importantly, users often refer to a product by many different name variations. On the other hand, Internet forums capture feedback from many more users, so feedback covering more comprehensive aspects can be obtained. We propose a method named Gren to recognize and normalize mobile phone names in Internet forums. Instead of directly recognizing phone names in sentences, as most named entity recognition (NER) tasks do, Gren first generates candidate names. The candidate names capture short forms, spelling variations, and nicknames of products, but are not noise-free. To predict whether a candidate name mention in a sentence indeed refers to a specific phone model, we develop a name recognizer based on a conditional random field (CRF). The CRF model is trained on a large set of sentences obtained semi-automatically with minimal manual labeling effort. Lastly, a rule-based name normalization component maps each recognized name to its formal form. For evaluation, we randomly selected 20 threads related to 20 mobile phones from an Internet forum; each thread contains about 100 posts. We manually labeled the mobile phone name mentions in these posts and mapped the true mentions to their formal names. In total, about 4,000 sentences containing about 1,000 phone name mentions were manually labeled.
Evaluated on the labeled data, Gren outperforms all baseline methods: with the best feature setting, it achieves a precision of 0.918 and a recall of 0.875. Compared with Stanford NER, a strong baseline, Gren improves recall by 134%. We also analyze in detail the intermediate results of each of Gren's three components and observe that features derived from Brown clustering are the most effective: removing them causes the largest drop in F1, from 0.896 to 0.804. Based on these observations, we draw two implications for NER tasks. First, if candidate named entities can be pre-generated, a large number of training examples can be produced at very low manual annotation cost. Second, if sentences can be segmented and their text chunks pre-generated, the sentences can be rewritten so that the words surrounding a candidate named entity serve as its context in a more natural manner.
author2	Sun Aixin
format	Theses and Dissertations
author	Yao, Yangjie
title	Product name recognition and normalization in internet forums
publishDate	2014
url	https://hdl.handle.net/10356/61814
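The description field above says Gren first generates candidate names covering short forms, spelling variations, and nicknames. The record does not give the generation rules, so the following is only a minimal sketch under assumed rules: the formal names in `FORMAL_NAMES`, the variant rules in `name_variants`, and the longest-match scan in `candidate_mentions` are hypothetical illustrations, and nicknames are not covered.

```python
import re

# Hypothetical formal names; the thesis's actual phone list is not reproduced here.
FORMAL_NAMES = ["Samsung Galaxy S4", "Apple iPhone 5s"]


def name_variants(formal: str) -> set[str]:
    """Generate illustrative surface variants of a formal name (assumed rules)."""
    tokens = formal.lower().split()
    if len(tokens) < 2:
        return {" ".join(tokens)}
    model = tokens[1:]                    # drop the brand: "galaxy s4"
    return {
        " ".join(tokens),                 # full name: "samsung galaxy s4"
        " ".join(model),                  # brandless: "galaxy s4"
        "".join(model),                   # spacing variant: "galaxys4"
        model[-1],                        # short form: "s4"
    }


def build_lexicon(formal_names: list[str]) -> dict[str, set[str]]:
    """Map each variant to the formal names it may refer to."""
    lexicon: dict[str, set[str]] = {}
    for formal in formal_names:
        for variant in name_variants(formal):
            lexicon.setdefault(variant, set()).add(formal)
    return lexicon


def candidate_mentions(sentence: str, lexicon: dict[str, set[str]], max_len: int = 3):
    """Return (start, end, text) token spans found in the lexicon, longest match first."""
    tokens = re.findall(r"\w+", sentence.lower())
    spans = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            gram = " ".join(tokens[i:i + n])
            if gram in lexicon:
                spans.append((i, i + n, gram))
                i += n
                break
        else:                             # no lexicon entry starts at token i
            i += 1
    return spans


lexicon = build_lexicon(FORMAL_NAMES)
print(candidate_mentions("My galaxy s4 beats the iPhone5s, but the 5s is lighter.", lexicon))
# [(1, 3, 'galaxy s4'), (5, 6, 'iphone5s'), (8, 9, '5s')]
```

The short form `5s` shows why such candidates are not noise-free: the same string could mean "five seconds", so a context-aware recognizer still has to judge each mention.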
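The description field above reports that Brown clustering features were the most effective CRF features. Brown clustering places each word at a leaf of a binary merge tree, so the word's bit-string path and the prefixes of that path act as clusters of different granularity. The sketch below builds per-token features from such prefixes. It assumes a hypothetical `BROWN_PATHS` table (real paths would come from clustering a large unlabeled forum corpus) and a hypothetical prefix set `(2, 4, 6)`; the other features are illustrative, not the thesis's feature set.

```python
# Hypothetical Brown-cluster bit strings: each word's path in the binary merge tree.
BROWN_PATHS = {"s4": "110100", "5s": "110101", "screen": "0110", "battery": "0111"}


def token_features(tokens: list[str], i: int, prefix_lengths=(2, 4, 6)) -> dict:
    """Features for token i, including Brown-cluster path prefixes (assumed set)."""
    word = tokens[i].lower()
    feats = {
        "word": word,
        "alnum_mix": any(c.isdigit() for c in word) and any(c.isalpha() for c in word),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }
    path = BROWN_PATHS.get(word)
    if path is not None:
        # Shorter prefixes are coarser clusters; longer prefixes are finer ones.
        for p in prefix_lengths:
            feats[f"brown_{p}"] = path[:p]
    return feats


print(token_features(["the", "s4", "battery"], 1))
```

Here `s4` and `5s` share their 4-bit prefix, so evidence learned for one model name can carry over to the other even when it is rare in the labeled data. A sentence becomes a list of these dictionaries, the per-token form that several CRF toolkits accept as input.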
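The first implication in the description field above is that pre-generated candidates make training data cheap: an annotator only answers yes or no for each candidate mention instead of marking span boundaries. One way to turn such judgements into BIO training tags is sketched below; the `PHONE` tag name and the `(start, end, is_phone)` input format are assumptions, not the thesis's actual labeling procedure.

```python
def bio_labels(num_tokens: int, judged_spans: list[tuple[int, int, bool]]) -> list[str]:
    """Turn pre-generated candidate spans plus yes/no judgements into BIO tags.

    judged_spans holds (start, end, is_phone) with `end` exclusive; `is_phone` is
    the (hypothetical) annotator's single yes/no answer for that candidate mention.
    """
    tags = ["O"] * num_tokens
    for start, end, is_phone in judged_spans:
        if is_phone:
            tags[start] = "B-PHONE"
            for k in range(start + 1, end):
                tags[k] = "I-PHONE"
    return tags


tokens = "my galaxy s4 beats the 5s".split()
print(bio_labels(len(tokens), [(1, 3, True), (5, 6, False)]))
# ['O', 'B-PHONE', 'I-PHONE', 'O', 'O', 'O']
```

Rejected candidates stay `O`, so they still serve as negative training examples for the recognizer.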
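The rule-based normalization component described above maps each recognized name to its formal form. The record does not list the rules, so this sketch assumes two hypothetical ones: an exact match after stripping case, spaces, and punctuation, and a unique-suffix match for short forms. `FORMAL_NAMES` and both rules are illustrative. Nicknames would need an explicit lookup table, which is omitted here.

```python
import re
from typing import Optional

# Hypothetical formal names, each written as "<Brand> <Model>".
FORMAL_NAMES = ["Samsung Galaxy S4", "Samsung Galaxy Note 3", "Apple iPhone 5s"]


def key(text: str) -> str:
    """Canonical key: lower-case letters and digits only ("Galaxy S-4" -> "galaxys4")."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def normalize(mention: str, formal_names: list[str] = FORMAL_NAMES) -> Optional[str]:
    """Map a recognized mention to a formal name, or None if no rule applies."""
    k = key(mention)
    if not k:
        return None
    # Rule 1: the key equals the full formal name or the name without its brand.
    for formal in formal_names:
        brandless = formal.split(maxsplit=1)[1]
        if k in (key(formal), key(brandless)):
            return formal
    # Rule 2: the key is the ending of exactly one brandless model key ("s4", "note3").
    matches = [f for f in formal_names if key(f.split(maxsplit=1)[1]).endswith(k)]
    return matches[0] if len(matches) == 1 else None


for m in ["galaxy S-4", "note3", "iPhone 5S", "galaxy"]:
    print(m, "->", normalize(m))
```

Returning `None` for an unmatched or ambiguous mention such as `galaxy` keeps the normalizer from guessing.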
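As a consistency check, the precision and recall reported in the description field above reproduce the full-feature F1 quoted in the Brown clustering ablation, using the standard harmonic-mean definition:

$$F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.918 \times 0.875}{0.918 + 0.875} = \frac{1.6065}{1.793} \approx 0.896$$

If the 134% recall improvement is read as a relative gain over Stanford NER on the same test set (an assumption; the record does not say so explicitly), it implies a baseline recall of roughly

$$R_{\text{Stanford}} \approx \frac{0.875}{1 + 1.34} = \frac{0.875}{2.34} \approx 0.374$$

and the ablation drop from 0.896 to 0.804 corresponds to a relative F1 loss of $(0.896 - 0.804)/0.896 \approx 10.3\%$.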
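The second implication in the description field above concerns rewriting sentences once their candidate chunks are known. A minimal sketch, assuming candidate spans arrive as sorted, non-overlapping `(start, end)` token ranges and using a hypothetical `<CAND>` placeholder: each chunk is collapsed into one token, so a fixed-size context window reaches the words around the name rather than the name's own tokens.

```python
def rewrite(tokens: list[str], spans: list[tuple[int, int]], placeholder: str = "<CAND>"):
    """Collapse each candidate span (start, end exclusive) into one placeholder token.

    Assumes spans are sorted and non-overlapping. Returns the rewritten tokens and
    the index of each placeholder within them.
    """
    out: list[str] = []
    positions: list[int] = []
    i = 0
    for start, end in spans:
        out.extend(tokens[i:start])
        positions.append(len(out))
        out.append(placeholder)
        i = end
    out.extend(tokens[i:])
    return out, positions


def context(tokens: list[str], pos: int, k: int = 2):
    """Return the k tokens before and the k tokens after position pos."""
    return tokens[max(0, pos - k):pos], tokens[pos + 1:pos + 1 + k]


tokens = "the samsung galaxy s4 battery lasts long".split()
rewritten, positions = rewrite(tokens, [(1, 4)])
print(rewritten)                          # ['the', '<CAND>', 'battery', 'lasts', 'long']
print(context(rewritten, positions[0]))   # (['the'], ['battery', 'lasts'])
```

Without rewriting, the two-token window to the left of `s4` would be `samsung galaxy`, part of the name itself; after rewriting, the left window holds `the` and the right window holds `battery lasts`, the words that actually surround the mention.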

Similar Items