CONTENT-BASED SEARCH ENGINE FOR DOCUMENTS USING LATENT DIRICHLET ALLOCATION

Bibliographic Details
Main Author: Pieter
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/38868
Institution: Institut Teknologi Bandung
Record ID: id-itb.:38868
Keywords: topic modelling, search engine, latent Dirichlet allocation
Record timestamp: 2019-06-19T10:13:40Z
Building: Institut Teknologi Bandung Library
Collection: Digital ITB
Description: In the digital era, information is easy to obtain from many sources, both online and offline. That same ease makes it hard to see the big picture of a very large body of information, so a method is needed to summarize it. Latent Dirichlet allocation (LDA) is one way to perform topic modeling, which makes it possible to extract a great deal of information from such a collection. In this book, we discuss how to build a simple search engine with LDA, using the distributions the model provides: the topic distribution given a document and its words, the word distribution given a topic, and so on. The LDA-based document search engine outputs documents ranked by relevance, using scores computed with LDA and TF-IDF, and it uses data taken from the detik.com site. Comparing its output with the results obtained from the detik.com site showed that the LDA-based search engine returns more relevant results.
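The description says only that documents are ranked by scores built from LDA and TF-IDF; it does not say how the two are combined. The Python sketch below is one hypothetical way to do it, not the author's implementation. It assumes LDA fitted by collapsed Gibbs sampling, a TF-IDF cosine similarity between query and document, an LDA query score taken as the geometric mean of p(w | d) = sum over k of theta[d][k] * phi[k][w] across the query words, and a linear mix controlled by an assumed weight `lam`. The names `train_lda`, `tfidf_vectors`, `cosine` and `rank`, the hyperparameters, and the toy corpus are all illustrative.

```python
import math
import random
from collections import Counter


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Hypothetical helper: fit LDA with collapsed Gibbs sampling.

    docs is a list of token lists. Returns (theta, phi, index) where
    theta[d][k] estimates p(topic k | document d), phi[k][v] estimates
    p(word v | topic k), and index maps each word to its column in phi.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]      # tokens of topic k in document d
    nkw = [[0] * V for _ in range(K)]  # occurrences of word v assigned to topic k
    nk = [0] * K                       # total tokens assigned to topic k
    z = []                             # current topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)       # random initial assignment
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove the token from the counts, then resample its topic with
                # weight proportional to (n_dt + alpha)(n_tv + beta)/(n_t + V*beta).
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return theta, phi, index


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors with tf = count / doc length and idf = log(N / df)."""
    df = Counter(w for doc in docs for w in set(doc))
    idf = {w: math.log(len(docs) / n) for w, n in df.items()}
    vecs = [{w: c / len(doc) * idf[w] for w, c in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * b.get(w, 0.0) for w, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, docs, num_topics=2, lam=0.5, seed=0):
    """Return (doc_index, score) pairs, best first; [] if no query word is known."""
    theta, phi, index = train_lda(docs, num_topics, seed=seed)
    vecs, idf = tfidf_vectors(docs)
    q = [w for w in query if w in index]  # ignore out-of-vocabulary words
    if not q:
        return []
    qvec = {w: c / len(q) * idf[w] for w, c in Counter(q).items()}
    # LDA score: geometric mean over query words of p(w|d) = sum_k theta_dk * phi_kw.
    lda = [math.exp(sum(math.log(sum(theta[d][k] * phi[k][index[w]] for k in range(num_topics)))
                        for w in q) / len(q))
           for d in range(len(docs))]
    top = max(lda)  # rescale so the LDA term lies in (0, 1] like the cosine term
    scores = [(d, lam * cosine(qvec, vecs[d]) + (1 - lam) * lda[d] / top)
              for d in range(len(docs))]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    corpus = [
        "football match goal striker".split(),
        "election vote parliament minister".split(),
        "striker scores goal in final match".split(),
        "minister wins election vote".split(),
    ]
    for d, score in rank("goal striker".split(), corpus):
        print(d, round(score, 3))
```

Setting `lam=1.0` reduces the sketch to plain TF-IDF ranking and `lam=0.0` to pure LDA query likelihood; dividing the LDA score by its maximum keeps both terms on a comparable [0, 1] scale before mixing. For a collection the size of a news site, a library LDA implementation would likely replace the hand-written sampler.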