Supervised news topic detection

With the advancement of technology, there has been much improvement in the automatic recording of broadcast news by utilizing speech recognition. However, the continually increasing dynamic information pool is posing challenges for efficient information retrieval techniques. This pain-point creates t...

Bibliographic Details
Main Author: Gaur, Mokshika
Other Authors: Chng Eng Siong
Format: Final Year Project
Language: English
Published: 2016
Subjects: DRNTU::Engineering
Online Access: http://hdl.handle.net/10356/66733
Institution: Nanyang Technological University
id sg-ntu-dr.10356-66733
record_format dspace
spelling sg-ntu-dr.10356-66733 2023-03-03T20:37:58Z Supervised news topic detection Gaur, Mokshika Chng Eng Siong School of Computer Engineering DRNTU::Engineering
Bachelor of Engineering (Computer Science) 2016-04-25T01:44:52Z 2016-04-25T01:44:52Z 2016 Final Year Project (FYP) http://hdl.handle.net/10356/66733 en Nanyang Technological University 63 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering
spellingShingle DRNTU::Engineering
Gaur, Mokshika
Supervised news topic detection
description With advances in technology, the automatic recording of broadcast news through speech recognition has improved considerably. However, the continually growing and changing pool of information poses challenges for efficient information retrieval. This pain point creates the need for systems that can automatically categorize this information under relevant topics for easy retrieval. In recent years, topic detection for broadcast news has been studied mostly through unsupervised techniques such as clustering, with only a few studies focusing on supervised classification techniques. In this project, we propose a simple yet effective approach for this purpose, drawing inspiration from previous studies. We experiment with a supervised machine learning algorithm, namely Logistic Regression, along with language processing techniques to automatically detect topics from the broadcast news contained in the TDT2 English corpus. We treat the input documents as a stream of sentences, use the trained classifier to predict the topics they are associated with, and accordingly assign each news document to the most appropriate topic. The approach includes various pre-processing techniques along with feature selection and natural language processing. The results indicate that the chosen model is able to detect relevant topics of new articles using this simple Logistic Regression-based approach. The proposed model performs comparably to some state-of-the-art topic classifiers.
author2 Chng Eng Siong
author_facet Chng Eng Siong
Gaur, Mokshika
format Final Year Project
author Gaur, Mokshika
author_sort Gaur, Mokshika
title Supervised news topic detection
title_short Supervised news topic detection
title_full Supervised news topic detection
title_fullStr Supervised news topic detection
title_full_unstemmed Supervised news topic detection
title_sort supervised news topic detection
publishDate 2016
url http://hdl.handle.net/10356/66733
_version_ 1759857189062705152
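
The description in this record outlines a pipeline in which documents are split into sentences, each sentence is classified with Logistic Regression, and the document is assigned to the most appropriate topic. The following is a minimal sketch of that idea and not the thesis implementation. The stopword list, the toy training sentences, the bag-of-words features, the training settings, and the rule of summing sentence probabilities per topic are all assumptions made for illustration; the record does not specify them, and the thesis itself works on the TDT2 English corpus.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list standing in for the unspecified pre-processing.
STOPWORDS = {"a", "an", "the", "in", "at", "of", "after", "will"}

# Invented toy training data; the thesis uses the TDT2 English corpus instead.
TRAINING_SENTENCES = [
    ("The team won the football match", "sports"),
    ("The striker scored a late goal", "sports"),
    ("The coach praised the players after the game", "sports"),
    ("The parliament passed the budget bill", "politics"),
    ("The minister announced a new election", "politics"),
    ("Voters elected a new president", "politics"),
]


def tokenize(sentence):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]


def split_sentences(document):
    """Naive sentence splitter on '.', '!' and '?' (an assumption)."""
    return [s.strip() for s in re.split(r"[.!?]+", document) if s.strip()]


class SentenceTopicClassifier:
    """Multinomial logistic regression over bag-of-words counts."""

    def __init__(self, topics, learning_rate=0.1, epochs=100):
        self.topics = sorted(topics)
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.weights = {t: {} for t in self.topics}
        self.bias = {t: 0.0 for t in self.topics}

    def predict_proba(self, sentence):
        """Return a softmax distribution over topics for one sentence."""
        counts = Counter(tokenize(sentence))
        scores = {
            t: self.bias[t]
            + sum(self.weights[t].get(w, 0.0) * c for w, c in counts.items())
            for t in self.topics
        }
        peak = max(scores.values())  # subtract the max for numerical stability
        exps = {t: math.exp(s - peak) for t, s in scores.items()}
        total = sum(exps.values())
        return {t: e / total for t, e in exps.items()}

    def fit(self, labelled_sentences):
        """Train with plain stochastic gradient descent on the log-loss."""
        for _ in range(self.epochs):
            for sentence, topic in labelled_sentences:
                counts = Counter(tokenize(sentence))
                probs = self.predict_proba(sentence)
                for t in self.topics:
                    # Log-loss gradient: predicted probability minus target.
                    error = probs[t] - (1.0 if t == topic else 0.0)
                    self.bias[t] -= self.learning_rate * error
                    for w, c in counts.items():
                        old = self.weights[t].get(w, 0.0)
                        self.weights[t][w] = old - self.learning_rate * error * c
        return self


def assign_topic(classifier, document):
    """Sum sentence-level probabilities and return the top-scoring topic."""
    sentences = split_sentences(document)
    if not sentences:
        raise ValueError("document contains no sentences")
    totals = {t: 0.0 for t in classifier.topics}
    for sentence in sentences:
        for t, p in classifier.predict_proba(sentence).items():
            totals[t] += p
    return max(classifier.topics, key=lambda t: totals[t])


topics = {topic for _, topic in TRAINING_SENTENCES}
classifier = SentenceTopicClassifier(topics).fit(TRAINING_SENTENCES)
print(assign_topic(classifier, "The striker scored in the football match. The coach praised the team."))
```

Summing probabilities lets every sentence contribute to the document-level decision; a majority vote over per-sentence predictions would be an equally plausible reading of the description.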