CONTENT-BASED SEARCH ENGINE FOR DOCUMENTS USING LATENT DIRICHLET ALLOCATION
In this digital era, information is easy to obtain from many sources, both online and offline. That same ease makes it difficult to see the big picture of a very large body of information, so we need a method to get a picture of...
Main Author: | Pieter |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/38868 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:38868 |
---|---|
spelling |
id-itb.:38868; 2019-06-19T10:13:40Z; CONTENT-BASED SEARCH ENGINE FOR DOCUMENTS USING LATENT DIRICHLET ALLOCATION; Pieter; Indonesia; Final Project; topic modelling, search engine, latent Dirichlet allocation; INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/38868; text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
In this digital era, information is easy to obtain from many sources, both online and
offline. That same ease makes it difficult to see the big picture of a very large body
of information, so we need a method for getting an overview of it. Latent Dirichlet
allocation (LDA) is one approach to topic modeling that makes such an overview possible.
In this final project, we discuss how to build a simple search engine using LDA, drawing
on the information it provides, such as the topic distribution given documents and words
and the word distribution given topics. The LDA-based document search engine outputs
documents ranked by relevance, using scores computed with LDA and TF-IDF, and it uses
data from the detik.com site. Comparing the output of the LDA-based search engine with
the results obtained from the detik.com site shows that the LDA-based search engine
returns more relevant results than the detik.com site. |
format |
Final Project |
author |
Pieter |
title |
CONTENT-BASED SEARCH ENGINE FOR DOCUMENTS USING LATENT DIRICHLET ALLOCATION |
url |
https://digilib.itb.ac.id/gdl/view/38868 |
_version_ |
1822269121185710080 |
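
The description above mentions ranking documents with scores built from LDA and TF-IDF but does not say how those scores are computed or combined. The following is only a minimal sketch of one way this could work, not the author's method. It assumes a small collapsed Gibbs sampler for LDA, a heuristic that averages per-word topic posteriors to get a topic vector for the query, and a linear mix of topic similarity and TF-IDF cosine similarity. The function names (`fit_lda`, `query_topics`, `search`), the `lda_weight` parameter, and the toy English corpus are all hypothetical; the thesis itself uses data from detik.com.

```python
import math
import random
from collections import Counter


def fit_lda(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (theta, phi, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    ndk = [[0] * n_topics for _ in docs]            # document-topic counts
    nkw = [[0] * n_words for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                             # tokens assigned per topic
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][wid[w]] += 1
            nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], wid[w]
                ndk[d][k] -= 1                      # remove current assignment
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta)
                           / (nk[t] + n_words * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k                         # resample and restore counts
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha)
              for k in range(n_topics)] for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + n_words * beta)
            for v in range(n_words)] for k in range(n_topics)]
    return theta, phi, vocab


def tfidf_vectors(docs):
    """Return sparse TF-IDF vectors (dicts) and the smoothed IDF table."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in df}
    return [{w: c * idf[w] for w, c in Counter(doc).items()} for doc in docs], idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * b.get(key, 0.0) for key, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def query_topics(tokens, phi, vocab):
    """Heuristic (assumption): average p(topic | word) over known query words."""
    wid = {w: i for i, w in enumerate(vocab)}
    n_topics = len(phi)
    known = [wid[w] for w in tokens if w in wid]
    if not known:
        return [1.0 / n_topics] * n_topics
    acc = [0.0] * n_topics
    for v in known:
        total = sum(phi[k][v] for k in range(n_topics))
        for k in range(n_topics):
            acc[k] += phi[k][v] / total
    return [a / len(known) for a in acc]


def search(query, docs, theta, phi, vocab, lda_weight=0.5):
    """Rank documents by a linear mix (assumption) of topic and TF-IDF similarity."""
    tokens = query.lower().split()
    doc_vecs, idf = tfidf_vectors(docs)
    q_vec = {w: c * idf[w] for w, c in Counter(tokens).items() if w in idf}
    q_topics = dict(enumerate(query_topics(tokens, phi, vocab)))
    scored = []
    for d in range(len(docs)):
        topic_sim = cosine(q_topics, dict(enumerate(theta[d])))
        text_sim = cosine(q_vec, doc_vecs[d])
        scored.append((d, lda_weight * topic_sim + (1 - lda_weight) * text_sim))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy corpus: two sports documents and two politics documents.
docs = [
    "football match goal team win".split(),
    "team coach football league season".split(),
    "election vote president party campaign".split(),
    "party parliament vote law election".split(),
]
theta, phi, vocab = fit_lda(docs, n_topics=2)
for doc_index, score in search("football team", docs, theta, phi, vocab):
    print(doc_index, round(score, 3))
```

Setting `lda_weight=0.0` reduces the ranking to plain TF-IDF cosine similarity, which gives a baseline for judging what the topic component adds. On a corpus this small the sampled topics are noisy, so the mixed scores are illustrative only.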