Product name detection from user forums

Bibliographic Details
Main Author: Peh, Wei Leng
Other Authors: Sun Aixin
Format: Final Year Project
Language: English
Published: 2014
Subjects:
Online Access: http://hdl.handle.net/10356/58980
Institution: Nanyang Technological University
Description
Summary: Forum users increasingly refer to smartphones and other products by acronyms rather than by their full names. As a result, a growing number of naming conventions (e.g., full names, acronyms, or names that refer to more than one product) are used for the same product. This report therefore presents two techniques for automatically identifying product names in online forums. The first technique combines natural language processing tools, standard matching of noun phrases against a phone database list and acronyms, and a rule-based method that further filters the list of phone names after extraction. The second technique uses a users’ pattern analysis model to extract possible phone names from the forum. In the results, the rule-based approach extracts more than 75% of the phone names. Its drawback is that many unrelated nouns are also extracted as mobile names, so the output contains too many false positives. The pattern-based approach detects and extracts fewer mobile names, and further research on users’ pattern analysis is needed for it. Several improvements are therefore proposed. First, more rules need to be defined to filter out unnecessary words. Second, special words that appear no more than 15 times in a thread can be extracted. Third, the users’ pattern analysis model can be extended with a list of categories of words that are rarely used in product names. Lastly, product names can be manually annotated in one XML thread, and the annotations can then be extracted to train on the rest of the data. With continued refinement, the proposed techniques are expected to identify product names automatically with better performance.
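The first technique described in the summary (matching candidate phrases against a phone list and an acronym list, then filtering with rules) can be illustrated with a minimal Python sketch. This is not the report's implementation: the names PHONE_DATABASE, ACRONYMS, STOP_NOUNS, candidate_phrases and detect_product_names are hypothetical, the sample entries are invented for illustration, and a simple token n-gram scan is used as a stand-in for the noun-phrase extraction that the report performs with natural language processing tools.

```python
import re

# Hypothetical sample data: the report's actual phone database
# and acronym list are not given in the record.
PHONE_DATABASE = {"samsung galaxy s4", "iphone 5s", "htc one"}
ACRONYMS = {"sgs4": "samsung galaxy s4", "ip5s": "iphone 5s"}

# Hypothetical filter rule: common nouns that must never be
# reported as phone names.
STOP_NOUNS = {"phone", "battery", "screen", "thread"}

TOKEN_RE = re.compile(r"[a-z0-9]+")


def candidate_phrases(text, max_len=3):
    """Yield every run of 1..max_len consecutive lowercase tokens.

    Assumption: this n-gram scan only approximates noun-phrase
    chunking; a real system would use a POS tagger or chunker.
    """
    tokens = TOKEN_RE.findall(text.lower())
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            if start + length <= len(tokens):
                yield " ".join(tokens[start:start + length])


def detect_product_names(post):
    """Return the set of canonical phone names mentioned in a post."""
    found = set()
    for phrase in candidate_phrases(post):
        if phrase in STOP_NOUNS:
            continue  # rule-based filter on unrelated nouns
        if phrase in PHONE_DATABASE:
            found.add(phrase)  # full-name match
        elif phrase in ACRONYMS:
            found.add(ACRONYMS[phrase])  # acronym resolved to full name
    return found


if __name__ == "__main__":
    post = "My SGS4 battery dies faster than my friend's iPhone 5S."
    print(sorted(detect_product_names(post)))
```

In this sketch the stop-noun set plays the role of the filtering rules; adding entries to it corresponds to the summary's suggestion that more rules be defined to reduce false positives.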