DEVELOPMENT OF TABULAR DATA COLLECTOR FROM WEB TO EXTRACT UNIVERSITY TEACHING AND PUBLICATION DATA
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/20837 |
Institution: | Institut Teknologi Bandung |
Summary: | Many institutions publish tabular (recurrent, semi-structured) data on the web. For example, the Bandung Institute of Technology (ITB), as a university, publicly publishes teaching and publication data. Collecting and extracting such data supports the development of linked open data, data visualization, and similar applications. In this case, universities can use teaching and publication data to measure lecturer performance. However, the data is scattered across many places and is presented in a form that is not machine-readable. Data collection uses a crawler to retrieve the URLs linked from a seed page that potentially contain teaching and publication data. Data extraction techniques vary with the cases encountered on each web page. This work proposes three general extractor types as a solution: a template extractor, a table extractor, and a list extractor. In addition, a specific extractor is proposed to handle the particular problem of bibliographic extraction. In this research, the techniques are combined into a system according to how the data is presented. In the experiments, the system collected teaching data from 2013 to 2017 using the template extractor. The system also collected publication data of ITB lecturers with an F1 score of 0.887, using the table extractor, list extractor, and bibliography extractor. The system can also be built from other combinations of extractors to extract teaching and publication data. The data collected from each web page is stored in a database in JSON format. The data can be used for linked open data development, data visualization, and similar purposes. |
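
The summary describes a table extractor whose output is stored as JSON but does not give its implementation. The following is only a minimal sketch of that general idea, assuming a well-formed HTML table whose first row is the header. The names `TableExtractor`, `table_to_records`, and `SAMPLE_HTML`, as well as the sample course rows, are hypothetical and are not taken from the original work or from any ITB page.

```python
import json
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <tr> found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row being parsed, or None
        self._cell = None   # text fragments of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (for example whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_records(html):
    """Treat the first row as the header and map each later row to a dict."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical sample page (assumption for illustration only).
SAMPLE_HTML = """
<table>
  <tr><th>Code</th><th>Course</th><th>Year</th></tr>
  <tr><td>XX1001</td><td>Example Course A</td><td>2013</td></tr>
  <tr><td>XX1002</td><td>Example Course B</td><td>2017</td></tr>
</table>
"""

records = table_to_records(SAMPLE_HTML)
print(json.dumps(records, indent=2))
```

Each row becomes one JSON object keyed by the header cells, which is one plausible way to make recurrent tabular data machine-readable. Nested tables, merged cells, and pages without a header row would need extra handling that this sketch does not attempt.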