Develop web crawler

The Blackboard system used by http://www.edventure.sg has a Campus Pack Wiki Tool that allows users to create and update Wikis to facilitate learning. Two such Wikis have been created for the course SC207/CPE207 Software Engineering, the Seminar Wiki and the Conspectus Wiki. The Conspectus Wiki a...

Bibliographic Details
Main Author: Soh, Justin Ang Long.
Other Authors: Kevin Anthony Jones
Format: Final Year Project
Language: English
Published: 2012
Subjects: DRNTU::Engineering::Computer science and engineering::Software::Software engineering
Online Access: http://hdl.handle.net/10356/48596
Institution: Nanyang Technological University
id sg-ntu-dr.10356-48596
record_format dspace
spelling sg-ntu-dr.10356-48596 2023-03-03T20:56:51Z Develop web crawler Soh, Justin Ang Long. Kevin Anthony Jones School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Software::Software engineering
Bachelor of Engineering (Computer Science) 2012-04-27T02:45:17Z 2012-04-27T02:45:17Z 2012 2012 Final Year Project (FYP) http://hdl.handle.net/10356/48596 en Nanyang Technological University 59 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Software::Software engineering
spellingShingle DRNTU::Engineering::Computer science and engineering::Software::Software engineering
Soh, Justin Ang Long.
Develop web crawler
description The Blackboard system used by http://www.edventure.sg has a Campus Pack Wiki Tool that allows users to create and update Wikis to facilitate learning. Two such Wikis have been created for the course SC207/CPE207 Software Engineering: the Seminar Wiki and the Conspectus Wiki. The Conspectus Wiki allows students to share their summaries of the subject, while the Seminar Wiki allows students to share their answers and opinions on questions set by their lecturer. These Wikis are therefore an excellent channel for sharing knowledge. The lecturer awards marks to students based on the number of comments they have made. However, there are so many comments on the Wikis that counting them manually for each student is tedious and time-consuming, so the counting process needs to be automated. The goal of this project is to develop an application that assists the lecturer in counting student names. Before counting can start, data must be extracted from a Wiki, and student names must be filtered out and properly identified. The application cannot identify students simply by matching names against the registered student names, because people do not always write their names the same way. For example, “John Tan” could also write his name as “Tan John”. The key to developing this application is String manipulation. Strings are sequences of characters; by breaking down and comparing Strings, specific Strings can be identified. In this case, I want first to identify student names and filter them out of the raw data, and second to match each name to the correct registered student.
author2 Kevin Anthony Jones
author_facet Kevin Anthony Jones
Soh, Justin Ang Long.
format Final Year Project
author Soh, Justin Ang Long.
author_sort Soh, Justin Ang Long.
title Develop web crawler
title_short Develop web crawler
title_full Develop web crawler
title_fullStr Develop web crawler
title_full_unstemmed Develop web crawler
title_sort develop web crawler
publishDate 2012
url http://hdl.handle.net/10356/48596
_version_ 1759858073926631424
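
The description field above outlines matching a written name such as “Tan John” to the registered name “John Tan” by breaking Strings down and comparing them. The record does not give the project's actual algorithm or implementation language, so the following is only a minimal Python sketch of one possible approach: split each name into lowercase word tokens and compare the sorted tokens, which ignores word order and letter case. The function names `name_key` and `count_comments` and the sample names are hypothetical, introduced here purely for illustration.

```python
import re
from collections import Counter


def name_key(name):
    """Return an order-insensitive key: lowercase word tokens, sorted."""
    tokens = re.findall(r"[a-z]+", name.lower())
    return tuple(sorted(tokens))


def count_comments(registered, authors):
    """Count how many entries in `authors` match each registered name."""
    # Map each registered name's key back to its canonical spelling.
    lookup = {name_key(n): n for n in registered}
    # Start every registered student at zero so nobody is missing from the tally.
    counts = Counter({n: 0 for n in registered})
    for written in authors:
        match = lookup.get(name_key(written))
        if match is not None:  # names that match no registered student are skipped
            counts[match] += 1
    return counts


# Hypothetical data: a class list and names extracted from Wiki comments.
registered = ["John Tan", "Mary Lim"]
authors = ["Tan John", "john tan", "Mary Lim", "Unknown Person"]

counts = count_comments(registered, authors)
print(counts["John Tan"], counts["Mary Lim"])  # prints: 2 1
```

Sorting the tokens handles reordered and differently capitalised names only. It would not, under these assumptions, handle abbreviations, misspellings, or two students whose names contain the same words, so a real tool would need extra rules for those cases.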