Web blog content classification
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | 2009 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/17934 |
Institution: | Nanyang Technological University |
Summary: | The internet is a vast repository of valuable information of all kinds.
Weblogs, or blogs, have become one of the most popular and fastest-growing communication
tools on the internet. With many people sharing and discussing different topics, much
valuable information can be obtained from them. However, with many blog providers
offering free space for people to host their blogs or discuss topics, the number of
people "blogging" has grown exponentially. The problem is that, most of the time, bloggers
can post entries on any topic at any time. Although some blogs do categorize the entries
posted, or provide different categories for users to post under, most of the categories
are either too general, or users simply post under categories that are not related to
their topic at all. As a result, users waste time going through irrelevant entries and
often spend more time searching for what they want to read than reading it. Therefore,
a weblog content classification program was created to make it easier for users to
browse and read the entries that interest them. However, to classify the content more
specifically into the user's desired categories, the user's judgment on the keywords
related to each category is required. For this project, the open-source web scraper
tool Web-Harvest was integrated with the program to extract the required blog contents
for classification. |
---|
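
The summary says that classification relies on keywords the user associates with each category, but it does not describe the algorithm itself. As a rough illustration of that idea only, here is a minimal Python sketch. The category names, the keyword sets, the function name `classify_entry`, and the scoring rule (counting keyword occurrences) are all assumptions made for this example, not details taken from the project.

```python
import re
from collections import Counter

# Hypothetical user-chosen categories and keywords (illustrative only,
# not taken from the project described in this record).
CATEGORY_KEYWORDS = {
    "travel": {"flight", "hotel", "beach", "trip"},
    "technology": {"software", "laptop", "code", "internet"},
}


def classify_entry(text, category_keywords=CATEGORY_KEYWORDS):
    """Return the category whose keywords occur most often in `text`.

    Returns None when no keyword from any category appears.
    Assumed scoring rule: one point per occurrence of a category keyword.
    """
    # Lowercase the entry and count each alphabetic word.
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    # Sum keyword occurrences per category (missing words count as 0).
    scores = {
        category: sum(words[keyword] for keyword in keywords)
        for category, keywords in category_keywords.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

In this sketch, an entry such as "Booked a flight and a hotel near the beach" would fall under the hypothetical "travel" category, while an entry containing none of the keywords is left unclassified.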