ABC automatic blog categorizer using K-means algorithm

Bibliographic Details
Main Authors: Agustin, Orlando Y., Jr.; Cruz, Jhermin Anne S.; Flores, Arvin Mark M.; Luna, Charles Ian G.
Format: text
Language: English
Published: Animo Repository, 2009
Subjects: Blogs; Blogs--Social aspects; Online journalism; Computer Sciences
Online Access: https://animorepository.dlsu.edu.ph/etd_bachelors/11139
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:etd_bachelors-11784
record_format eprints
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
language English
topic Blogs
Blogs--Social aspects
Online journalism
Computer Sciences
description Many weblogs are published daily on the World Wide Web. One reason blogs are popular is that they are free. A 2005 survey counted around 60 million blogs on the Internet (Riley, 2005). As the number of blogs grows each day, it becomes hard to search for a specific blog. Organizing blogs can help searching, because each blog gains an identity based on its subject, making it easier to distinguish one concept from another. For example, a user may search for a blog containing the word "freestyle" in the sense of a swimming stroke. Blogs on other subjects, such as freestyle dance, can then be filtered out by specifying the intended category. This research addresses the problem by developing software that categorizes blogs into their respective categories. Most document categorization software assigns documents to predefined categories. This research, however, aims to categorize blogs automatically based on their content, without using predefined categories. Over the course of the research, the proponents learned that the results of automated blog categorization depend heavily on the input provided to the system. For this dataset, using words alone, without a lexical analyzer or some other means of understanding the words, it is difficult to form clusters around general topics, because the same word or term may have different meanings.
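The record does not describe the thesis's feature set, distance measure, or centroid initialization, so the following is only a minimal sketch of K-means clustering applied to word-based blog vectors, assuming L2-normalized term-frequency vectors, squared Euclidean distance, and seeding with the first k documents. The function names (`tokenize`, `tf_vectors`, `kmeans`) and the toy posts are hypothetical illustrations, not the authors' code or data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_vectors(docs):
    """Build L2-normalized term-frequency vectors over a shared vocabulary."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        v = [float(c[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Plain Lloyd's K-means; assumption: centroids seeded with the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its member vectors.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy blog posts; "freestyle" appears in both topics.
docs = [
    "freestyle swimming stroke in the pool",
    "freestyle dance moves on the stage",
    "backstroke and freestyle laps in the pool",
    "hip hop dance battle on the stage",
]

if __name__ == "__main__":
    _, vectors = tf_vectors(docs)
    print(kmeans(vectors, k=2))  # prints [0, 1, 0, 1]
```

No category names are supplied in advance; the clusters emerge only from word overlap. The ambiguous word "freestyle" carries no information here, since it occurs in every post. The posts separate only because their co-occurring words ("pool", "stage") differ, which illustrates why word-only input makes topic clustering sensitive to the data.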
format text
author Agustin, Orlando Y., Jr.
Cruz, Jhermin Anne S.
Flores, Arvin Mark M.
Luna, Charles Ian G.
title ABC automatic blog categorizer using K-means algorithm
publisher Animo Repository
publishDate 2009
url https://animorepository.dlsu.edu.ph/etd_bachelors/11139