DEVELOPMENT OF TABULAR DATA COLLECTOR FROM WEB TO EXTRACT UNIVERSITY TEACHING AND PUBLICATION DATA

Many institutions publish tabular (recurrent, semi-structured) data on the web. For example, the Bandung Institute of Technology (ITB), as a university, publicly publishes teaching and publication data. The purpose of data collection and data extraction is supporting the...

Bibliographic Details
Main Author:	DARMAWAN - NIM : 13513096 , AHMAD
Format:	Final Project
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/20837
Institution:	Institut Teknologi Bandung

id	id-itb.:20837
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	Many institutions publish tabular (recurrent, semi-structured) data on the web. For example, the Bandung Institute of Technology (ITB), as a university, publicly publishes teaching and publication data. Collecting and extracting such data supports the development of linked open data, data visualization, and similar uses. In this case, universities can use teaching and publication data to measure lecturer performance. However, the data is scattered across many places and is presented in a form that is not machine-readable. Data collection uses a crawler to retrieve the URLs linked from a seed that potentially contain teaching and publication data. Data extraction techniques vary with the case encountered on each web page. This work proposes three general extractor types as a solution: a template extractor, a table extractor, and a list extractor. In addition, a specific extractor is proposed for the particular problem of bibliographic extraction. In this research, the techniques are combined into a system according to the data presented. In the experiment, the system collected teaching data from 2013 to 2017 using the template extractor. The system also collected publication data of ITB's lecturers with an F1 value of 0.887, using the table extractor, list extractor, and bibliography extractor. The system can also be constructed from other combinations of extractors to extract teaching and publication data. The data collected from each web page is stored in a database in JSON format. The data can be used for linked open data development, data visualization, and similar purposes.
format	Final Project
author	DARMAWAN - NIM : 13513096 , AHMAD
title	DEVELOPMENT OF TABULAR DATA COLLECTOR FROM WEB TO EXTRACT UNIVERSITY TEACHING AND PUBLICATION DATA
url	https://digilib.itb.ac.id/gdl/view/20837
_version_	1822919980167987200
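
The description above mentions a table extractor whose output is stored in JSON format, but the record does not show how it was implemented. The following is only a minimal sketch of that general idea, written with the Python standard library. The class name `TableExtractor`, the function `table_to_records`, the rule that the first row is the header, and the sample HTML are all assumptions made for illustration; none of them come from the thesis.

```python
import json
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <tr> in a page, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_records(html):
    """Assume the first row is the header and map each later row to a dict."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample page; the column names and values are placeholders.
SAMPLE = """
<table>
  <tr><th>Code</th><th>Course</th><th>Lecturer</th></tr>
  <tr><td>XX1001</td><td>Example Course A</td><td>Lecturer One</td></tr>
  <tr><td>XX1002</td><td>Example Course B</td><td>Lecturer Two</td></tr>
</table>
"""

records = table_to_records(SAMPLE)
print(json.dumps(records, indent=2))
```

A real page would likely need extra handling for nested tables, merged cells, and tables without a header row, which this sketch ignores.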

Similar Items