Term co-occurrence evolution study

Large volumes of data are created continuously, and all of this data is stored somewhere in its raw form. In this project, we introduce a prototype application that uses a series of algorithms to convert this raw data into a form that can be studied. The project focuses on how term co-occurrence evolves over time...

Bibliographic Details
Main Author: Tan, Bernard Mao Sheng
Other Authors: Sun Aixin
Format: Final Year Project
Language: English
Published: 2014
Subjects: DRNTU::Engineering::Computer science and engineering
Online Access: http://hdl.handle.net/10356/58954
Institution: Nanyang Technological University
id sg-ntu-dr.10356-58954
record_format dspace
spelling sg-ntu-dr.10356-58954 2023-03-03T20:36:51Z
School of Computer Engineering
Bachelor of Engineering (Computer Science)
2014-04-17T03:10:42Z
37 p.
application/pdf
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
topic DRNTU::Engineering::Computer science and engineering
description Large volumes of data are created continuously, and all of this data is stored somewhere in its raw form. In this project, we introduce a prototype application that uses a series of algorithms to convert this raw data into a form that can be studied. The project focuses on how term co-occurrence evolves over time. To implement the application, we researched ways to transform the raw data into forms that are easier to manipulate. Various API libraries are used, covering natural language processing, multi-threading and data indexing. Because the project focuses on studying term co-occurrence evolution, the prototype is designed with a graphical user interface and with real-time performance in mind. The application lets the user run an analysis directly, and the analysis completes within seconds. The result is displayed in two forms: a line chart and a detailed table. The user can manipulate the line chart directly by dynamically selecting the co-occurring terms of interest. To make the analysis result clearer, the application includes ranking algorithms that rank the resulting terms by their interestingness. By default, when the analysis is complete, the application ranks the terms, plots the line chart with the top 5 most interesting terms and sorts the entries in the detailed table. Because it handles large volumes of data, the application needs to be optimised and fast. For this reason, the data is preprocessed and multi-threading is added to the analysis process to use the system’s computing power and speed up the analysis. Although the objective of identifying interesting co-occurring terms is achieved, improvements and additional features could extend the application’s potential. Recommendations include better multi-threading logic and better ranking algorithms.
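
The abstract describes counting how often terms co-occur over time and ranking them by interestingness, but the record does not say which algorithms the project uses. The following is only a minimal Python sketch of the general idea, under stated assumptions: the function names `cooccurrence_by_period` and `rank_terms`, the `(period, tokens)` input format, the sample documents, and the use of the population standard deviation as an "interestingness" score are all hypothetical and not taken from the project.

```python
from collections import Counter, defaultdict
from statistics import pstdev


def cooccurrence_by_period(docs, query):
    """For each period, count the documents in which a term appears with `query`.

    `docs` is an iterable of (period, tokens) pairs (a hypothetical input format).
    """
    counts = defaultdict(Counter)
    for period, tokens in docs:
        terms = set(tokens)  # count each term once per document
        if query in terms:
            counts[period].update(terms - {query})
    return counts


def rank_terms(counts, top_k=5):
    """Rank co-occurring terms by how much their counts vary across periods.

    Assumption: "interestingness" is approximated by the population standard
    deviation of the per-period counts; ties are broken alphabetically.
    """
    periods = sorted(counts)
    all_terms = set().union(*(counts[p] for p in periods)) if periods else set()
    scores = {t: pstdev([counts[p][t] for p in periods]) for t in all_terms}
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Made-up sample data: (year, tokens) pairs.
docs = [
    (2012, ["apple", "iphone", "launch"]),
    (2012, ["apple", "pie"]),
    (2013, ["apple", "iphone", "sales"]),
    (2013, ["apple", "iphone", "launch"]),
    (2013, ["banana", "pie"]),
]
counts = cooccurrence_by_period(docs, "apple")
top_terms = rank_terms(counts, top_k=2)
```

The per-period counts in `counts` correspond to the series a line chart would plot, and `top_k=5` mirrors the default of showing the top 5 terms mentioned in the abstract.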