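IMPLEMENTASI VECTOR SPACE MODEL (VSM) UNTUK TEMU KEMBALI INFORMASI WEB DAN PENGELOMPOKAN HASILNYA (Implementation of the Vector Space Model (VSM) for Web Information Retrieval and Clustering of Its Results)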
Main Authors: | , |
---|---|
Format: | Theses and Dissertations NonPeerReviewed |
Published: | [Yogyakarta] : Universitas Gadjah Mada, 2011 |
Subjects: | |
Online Access: | https://repository.ugm.ac.id/88298/ http://etd.ugm.ac.id/index.php?mod=penelitian_detail&sub=PenelitianDetail&act=view&typ=html&buku_id=50428 |
Institution: | Universitas Gadjah Mada |
Summary: | The problem facing people who seek information on the internet today is not
a lack of information but an excess of it, and what is available often does not
match what the user wants. A system is therefore needed that can retrieve
information from websites according to the relevance (similarity) desired by the
user. Clustering the documents returned by similarity-based information retrieval
before they are displayed to the user is expected to make web users more
effective in searching for documents.
Documents are indexed using an inverted index as the data representation
model, implemented with the Lucene library on the Java platform and including
stemming for the Indonesian language. This is expected to improve the
performance of the information retrieval implementation, which is measured by
recall and precision; the F-measure obtained was 0.6.
The K-Means algorithm, a partitional clustering model, and Bisecting K-Means,
which combines hierarchical (divisive) and partitional models, are used to group
the documents returned by the information retrieval process. Clustering quality
is measured with an external method, F-measure, and an internal method, the
Intra-Cluster Similarity Technique (IST). The web document collection used for
testing was taken from an Indonesian-language news portal, in the economics,
business, and finance categories, and comprises 26,240 pages as HTML documents.
In the tests, the F-measure of the two methods was about the same at 0.75, but
the IST differed: Bisecting K-Means achieved the best IST value, 0.8, at 10%,
whereas K-Means reached only 7%. |
---|
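
The retrieval step described in the summary (an inverted index, vector space ranking by similarity, and evaluation by F-measure) can be sketched as follows. This is a minimal Python illustration, not the Lucene/Java implementation of the thesis: the whitespace tokenizer, the smoothed TF-IDF weighting, and the function names `build_index`, `rank`, and `f_measure` are all assumptions made for the example, and no Indonesian stemming is applied.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming.
    return text.lower().split()


def build_index(docs):
    """Build an inverted index {term: {doc_id: tf}} and an IDF table."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    n_docs = len(docs)
    # Assumption: idf = ln(N / df) + 1, so terms present in every
    # document still carry a small positive weight.
    idf = {t: math.log(n_docs / len(p)) + 1.0 for t, p in index.items()}
    return index, idf


def rank(query, index, idf):
    """Return [(doc_id, cosine_similarity)] sorted from best to worst."""
    # Squared length of every document vector in TF-IDF space.
    doc_norm_sq = defaultdict(float)
    for term, postings in index.items():
        for doc_id, tf in postings.items():
            doc_norm_sq[doc_id] += (tf * idf[term]) ** 2

    # Query vector, restricted to terms that occur in the collection.
    q_tf = Counter(t for t in tokenize(query) if t in index)
    q_weights = {t: tf * idf[t] for t, tf in q_tf.items()}
    q_norm = math.sqrt(sum(w * w for w in q_weights.values()))

    # Accumulate dot products by walking only the relevant postings.
    dot = defaultdict(float)
    for term, w_q in q_weights.items():
        for doc_id, tf in index[term].items():
            dot[doc_id] += w_q * tf * idf[term]

    scored = [
        (doc_id, s / (q_norm * math.sqrt(doc_norm_sq[doc_id])))
        for doc_id, s in dot.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def f_measure(retrieved, relevant):
    """Harmonic mean of precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy collection, not data from the thesis.
    docs = {
        "d1": "harga saham naik",
        "d2": "harga minyak turun",
        "d3": "bank sentral",
    }
    index, idf = build_index(docs)
    results = rank("harga saham", index, idf)
    print(results)
    print(f_measure([doc_id for doc_id, _ in results], ["d1"]))
```

Only documents that share at least one term with the query receive a score, which is the practical benefit of scoring through the inverted index rather than comparing the query against every document vector.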