CONTENT-BASED SEARCH ENGINE FOR DOCUMENTS USING LATENT DIRICHLET ALLOCATION

Bibliographic Details
Main Author: Pieter
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/38868
Institution: Institut Teknologi Bandung
Description
Summary: In the digital era, information is easy to obtain from many online and offline sources. That same ease makes it hard to see the big picture across a very large volume of information, so a method is needed to summarize it. Latent Dirichlet allocation (LDA) is one approach to topic modeling that extracts the main themes from a large collection of documents. This final project discusses how to build a simple document search engine with LDA, using the quantities the model provides, such as the topic distribution given documents and words, and the word distribution given topics. The search engine returns documents ranked by relevance, using scores computed from LDA and TF-IDF, on data taken from the detik.com site. When its output was compared with the results of detik.com's own search, the LDA-based search engine returned the more relevant results.
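
The summary does not say how the LDA and TF-IDF scores are computed or combined, so the following is only a minimal sketch under stated assumptions, not the author's implementation. It assumes a small collapsed Gibbs sampler for LDA, an LDA score equal to the average query-word probability sum_k theta[d][k] * phi[k][w], a TF-IDF score equal to cosine similarity, and a weighted sum of the two after scaling each by its maximum. The function names, the weight `lam`, the hyperparameters, and the toy English corpus are all hypothetical; the original work used documents from detik.com.

```python
import math
import random
from collections import Counter


def tfidf_index(docs):
    """Return (per-document tf-idf dicts, idf table) for tokenized documents."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}  # smoothed idf
    vecs = [{t: c / len(doc) * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (theta, phi).

    theta[d][k] estimates p(topic k | document d);
    phi[k][w] estimates p(word w | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({t for doc in docs for t in doc})
    V, K = len(vocab), num_topics
    doc_topic = [[0] * K for _ in docs]
    topic_word = [Counter() for _ in range(K)]
    topic_total = [0] * K
    assign = []
    for d, doc in enumerate(docs):  # random initial topic for every token
        zs = [rng.randrange(K) for _ in doc]
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]  # remove the token's current assignment
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha) * (topic_word[k][w] + beta) / (topic_total[k] + V * beta)
                    for k in range(K)
                ]
                z = rng.choices(range(K), weights=weights)[0]  # resample the topic
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    theta = [[(c + alpha) / (len(doc) + K * alpha) for c in row] for row, doc in zip(doc_topic, docs)]
    phi = [{w: (topic_word[k][w] + beta) / (topic_total[k] + V * beta) for w in vocab} for k in range(K)]
    return theta, phi


def rank(query, docs, num_topics=2, lam=0.5, seed=0):
    """Rank documents by lam * TF-IDF score + (1 - lam) * LDA score (assumed combination)."""
    vecs, idf = tfidf_index(docs)
    theta, phi = lda_gibbs(docs, num_topics, seed=seed)
    known = [t for t in query if t in idf]  # ignore query words outside the vocabulary
    q_vec = {t: c / len(known) * idf[t] for t, c in Counter(known).items()}
    tfidf_scores = [cosine(q_vec, v) for v in vecs]
    lda_scores = [
        sum(sum(th[k] * phi[k][t] for k in range(num_topics)) for t in known) / len(known) if known else 0.0
        for th in theta
    ]

    def scaled(xs):
        top = max(xs)
        return [x / top if top > 0 else 0.0 for x in xs]

    combined = [lam * a + (1 - lam) * b for a, b in zip(scaled(tfidf_scores), scaled(lda_scores))]
    order = sorted(range(len(docs)), key=lambda d: combined[d], reverse=True)
    return order, combined


# Hypothetical toy corpus, already tokenized.
docs = [
    ["election", "president", "vote", "party"],
    ["football", "match", "goal", "league"],
    ["president", "policy", "party", "minister"],
    ["goal", "striker", "league", "transfer"],
]
query = ["president", "vote"]
order, scores = rank(query, docs)
print(order)  # document indices, most relevant first
```

In this sketch the TF-IDF term rewards exact word overlap, while the LDA term can give credit to documents that share a topic with the query even when they share few words. The balance between the two is controlled by the assumed weight `lam`.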