Visualizing relation between tags in StackOverflow

Bibliographic Details
Main Author: Teong, Ke Ming
Other Authors: Xing Zhenchang
Format: Final Year Project
Language: English
Published: 2016
Online Access: http://hdl.handle.net/10356/66643
Institution: Nanyang Technological University
Description
Summary: Tags have become increasingly common as the use of internet search engines has grown. They help searchers narrow the search space and obtain the desired results efficiently. StackOverflow offers a large number of tags that searchers can enter during a search. The aim of this project is to extract the relations between tags in StackOverflow. Knowledge of these relations can be used in the research and implementation of exploratory search engines. First, the category of each tag in StackOverflow is extracted. This process consists of three parts: preprocessing, category extraction and postprocessing. The extraction is carried out using a part-of-speech tagger and a regular expression parser. Next, the relations and the parent tag of each tag are retrieved, using the extracted tag categories and keyword searching. Lastly, the retrieved results are written to a CSV file and displayed as a force-directed graph in the Firefox browser. In addition to the graph visualization, the accuracy of the extracted categories and relations was evaluated. The accuracy tests were based on manual judgment of the correctness of each result. Two accuracy tests were conducted on each set of extracted results. The accuracy of the extracted categories was 75% on average, while the accuracy of the extracted relations was 91.5% on average.
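
The pipeline described in the summary (category extraction, relation retrieval by keyword search, CSV export for a graph) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: it replaces the part-of-speech tagger and regular expression parser with a single plain regular expression, and the pattern, the function names, the sample excerpt and the CSV column names are all hypothetical.

```python
import csv
import io
import re

# Assumed pattern (not from the project): "<tag> is a/an/the <category>",
# where the category ends at a stop word, punctuation, or end of text.
CATEGORY_RE = re.compile(
    r"\bis\s+(?:a|an|the)\s+(?P<category>[a-z][a-z\- ]*?)"
    r"(?=\s+(?:for|that|which|used|designed|developed|of|to)\b|[.,;]|$)",
    re.IGNORECASE,
)


def extract_category(excerpt):
    """Return the lowercased category phrase, or None if nothing matches."""
    match = CATEGORY_RE.search(excerpt)
    return match.group("category").strip().lower() if match else None


def find_relations(tag, excerpt, known_tags):
    """Keyword search: known tags mentioned in the excerpt, excluding the tag itself."""
    tokens = re.findall(r"[a-z0-9+#.\-]+", excerpt.lower())
    words = {token.rstrip(".") for token in tokens}  # drop sentence-final periods
    return sorted(t for t in known_tags if t != tag and t in words)


def write_edges(rows, handle):
    """Write (source, target, category) rows as CSV, one graph edge per row."""
    writer = csv.writer(handle)
    writer.writerow(["source", "target", "category"])  # hypothetical column names
    writer.writerows(rows)


if __name__ == "__main__":
    excerpt = "Django is a web framework for Python."  # made-up sample text
    category = extract_category(excerpt)
    related = find_relations("django", excerpt, {"python", "django", "java"})
    buffer = io.StringIO()
    write_edges([("django", target, category) for target in related], buffer)
    print(buffer.getvalue())
```

A CSV of source and target pairs like this is a common input for force-directed graph layouts, since each row maps directly to one edge between two tag nodes.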