FEATURES DEVELOPMENT AND FEATURE EXTRACTION METHODS FOR IDENTIFICATION OF SCIENTIFIC RELATION SCHEMES BASED ON RHETORICAL CITATION

This dissertation discusses the identification of relations between scientific papers based on rhetorical citation, obtained by analyzing the citation context contained in a citation sentence. This is known as the citation context-based approach, which is more detailed compar...

Bibliographic Details
Main Author: Sibaroni, Yuliant
Format: Dissertations
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/49322
Institution: Institut Teknologi Bandung
id id-itb.:49322
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description This dissertation addresses the identification of relations between scientific papers based on rhetorical citation, obtained by analyzing the citation context contained in a citation sentence. This is known as the citation context-based approach, and it is more detailed than the two earlier approaches, namely the content-based and citation analysis-based approaches. Those two approaches can only identify similarity relations between papers. At present, the only scheme of scientific paper relations explicitly developed from citation context is that of Wang et al., which produces the extend, criticize, and compare relations. The main feature Wang uses to identify these relations is quite simple, namely the cue phrase feature. The focus of this dissertation is to develop a feature extraction method and to produce a feature set that identifies Wang's paper relations better. Paper relations are identified by classifying each sentence using a supervised machine learning approach. The feature development process is carried out in stages, starting with the extend relation, then the criticize relation, and finally the compare relation. The results show that each type of paper relation has its own distinct features. For the extend relation, several important features were obtained, namely the phrase combination feature and the n-gram feature with top-N correlation. For the criticize relation, there are five groups of important features: the adaptation of the extend relation features, the combination of cue phrases with citation, the combination of cue phrases with previous citation, the combination of cue phrases, and the conjunction of some basic features. For the compare relation, three important groups of features were produced: the proportionWord feature, the probabilityWord feature, and the cuephraseWord feature. The features were developed by observing the patterns that appear in the sentences of each relation. Although the proposed features perform better than the baseline, some problems remain, such as the high number of false predictions and citation context sentences that are missed (co-reference), among others. The F-measure increases by 15-40% compared to the baseline feature.
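The abstract names three ingredients that a small example can make concrete: cue phrase features, n-gram features chosen by top-N correlation, and evaluation by F-measure. The sketch below is only an illustration under stated assumptions and is not the dissertation's implementation. The toy sentences, the cue phrase list, and all function names are hypothetical, "top-N correlation" is assumed here to mean ranking n-grams by absolute Pearson correlation with the class label, and the supervised classifier itself is left out.

```python
from math import sqrt

# Hypothetical toy data: (citation sentence, 1 if "extend" relation else 0).
SENTENCES = [
    ("we extend the method of [1] to larger corpora", 1),
    ("our work builds on the model in [2]", 1),
    ("we extend the parser proposed in [3]", 1),
    ("the approach in [4] fails on noisy text", 0),
    ("we compare our results with [5]", 0),
    ("unlike [6], the method ignores context", 0),
]

# Hypothetical cue phrases; the abstract does not list the actual ones.
CUE_PHRASES = ["extend", "builds on", "fails", "compare"]


def cue_phrase_features(sentence, cue_phrases=CUE_PHRASES):
    """Binary vector: 1 if the cue phrase occurs in the sentence, else 0."""
    text = sentence.lower()
    return [1 if cue in text else 0 for cue in cue_phrases]


def ngrams(sentence, n):
    """Set of whitespace-tokenized word n-grams in a sentence."""
    tokens = sentence.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    count = len(xs)
    mean_x, mean_y = sum(xs) / count, sum(ys) / count
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def top_n_ngrams(data, n=1, top=3):
    """Rank n-grams by |correlation| between their presence and the label."""
    labels = [label for _, label in data]
    grams_per_sentence = [ngrams(sentence, n) for sentence, _ in data]
    vocabulary = set().union(*grams_per_sentence)
    scored = []
    for gram in vocabulary:
        presence = [1 if gram in grams else 0 for grams in grams_per_sentence]
        scored.append((abs(pearson(presence, labels)), gram))
    # Highest correlation first; ties are broken alphabetically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [gram for _, gram in scored[:top]]


def f_measure(predicted, actual):
    """Binary F1: harmonic mean of precision and recall for class 1."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(cue_phrase_features("We extend the method of [1]"))
    print(top_n_ngrams(SENTENCES, n=2, top=2))
    print(f_measure([1, 1, 0, 0], [1, 0, 1, 0]))
```

In a fuller pipeline, the selected n-grams and the cue phrase vector would be concatenated into one feature vector per citation sentence and passed to a supervised classifier, and the F-measure would be computed on held-out sentences.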
format Dissertations
author Sibaroni, Yuliant
title FEATURES DEVELOPMENT AND FEATURE EXTRACTION METHODS FOR IDENTIFICATION OF SCIENTIFIC RELATION SCHEMES BASED ON RHETORICAL CITATION
url https://digilib.itb.ac.id/gdl/view/49322