DEVELOPMENT OF TABULAR DATA COLLECTOR FROM WEB TO EXTRACT UNIVERSITY TEACHING AND PUBLICATION DATA

Many institutions publish tabular data (recurrent, semi-structured data) on the web. For example, the Bandung Institute of Technology (ITB), a university, publicly publishes its teaching and publication data. The purpose of data collection and data extraction is to support the...

Bibliographic Details
Main Author: DARMAWAN - NIM : 13513096 , AHMAD
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/20837
Institution: Institut Teknologi Bandung
id id-itb.:20837
spelling id-itb.:20837 2017-10-09T10:28:07Z
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Many institutions publish tabular data (recurrent, semi-structured data) on the web. For example, the Bandung Institute of Technology (ITB), a university, publicly publishes its teaching and publication data. Collecting and extracting such data supports the development of linked open data, data visualization, and similar applications. In this case, universities can use teaching and publication data to measure lecturer performance. However, the data is scattered across many places and is published in a form that is not machine-readable. For data collection, a crawler retrieves the URLs linked from a seed page that potentially lead to teaching and publication data. Data extraction techniques vary with the cases encountered on each web page. This work proposes three general extractor types as a solution: a template extractor, a table extractor, and a list extractor. In addition, a specific extractor is proposed for the particular problem of bibliographic extraction. These techniques are combined into a system according to how the data is presented. In the experiments, the system collected teaching data from 2013 to 2017 using the template extractor. It also collected publication data of ITB lecturers with an F1 score of 0.887, using the table extractor, list extractor, and bibliography extractor. The system can also be built from other combinations of extractors to extract teaching and publication data. The data collected from each web page is stored in a database in JSON format and can be used for linked open data development, data visualization, and similar applications.
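The table extractor and JSON storage mentioned in the abstract above could look roughly like the following. This is a minimal sketch using only the Python standard library, not the thesis's actual implementation: the class name `TableExtractor`, the helper `table_to_records`, and the sample HTML are all hypothetical, and it assumes a single flat table whose first row is the header (nested tables are not handled).

```python
import json
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <tr> in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_records(html):
    """Treat the first row as the header and map later rows onto it."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical sample page, not taken from the ITB site.
SAMPLE = """
<table>
  <tr><th>Year</th><th>Title</th></tr>
  <tr><td>2016</td><td>Example Paper A</td></tr>
  <tr><td>2017</td><td>Example  Paper B</td></tr>
</table>
"""

records = table_to_records(SAMPLE)
print(json.dumps(records, indent=2))  # JSON text ready to store per page
```

Each row becomes one dictionary keyed by the header cells, so the per-page result serializes directly to JSON. A list extractor could plausibly follow the same pattern with `<li>` elements in place of table rows.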
format Final Project
author DARMAWAN - NIM : 13513096 , AHMAD
title DEVELOPMENT OF TABULAR DATA COLLECTOR FROM WEB TO EXTRACT UNIVERSITY TEACHING AND PUBLICATION DATA
url https://digilib.itb.ac.id/gdl/view/20837