A corpus based analysis of -kan and -i in Indonesian
Main Author: | Choi, Hannah Yun Jung |
---|---|
Other Authors: | Francis Bond |
Format: | Thesis-Master by Research |
Language: | English |
Published: | Nanyang Technological University, 2020 |
Subjects: | Humanities::Linguistics |
Online Access: | https://hdl.handle.net/10356/136955 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-136955 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-136955; 2020-10-28T08:29:19Z; A corpus based analysis of -kan and -i in Indonesian; Choi, Hannah Yun Jung; Francis Bond; School of Humanities; fcbond@ntu.edu.sg; Humanities::Linguistics; Master of Arts; 2020-02-07T02:05:24Z; 2019; Thesis-Master by Research; Choi, H. Y. J. (2019). A corpus based analysis of -kan and -i in Indonesian. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/136955; DOI 10.32657/10356/136955; en; This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).; application/pdf; Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Humanities::Linguistics |
description |
The importance of capturing patterns of shared verb behavior through verb classes was brought to attention by Fillmore (1970) in his seminal paper “The Grammar of Hitting and Breaking”. In it, he recognized that English verbs could be grouped into classes based on their semantic similarity as well as their shared grammatical behavior and argument realization. Specifically, he showed that hit and break verbs each belong to larger classes whose members share comparable patterns of behavior, such as participation in the causative alternation and the interpretations available to their passive participles (1970:125). Subsequent studies of English have confirmed and expanded on Fillmore’s findings (Dixon, 1992; Jackendoff, 1992; Levin & Rappaport Hovav, 1991), most notably Levin (1993) in her influential book “English Verb Classes and Alternations”. Beyond English, the idea that semantically related verb classes share syntactic behaviors has also been explored in other languages, such as Lhasa Tibetan (DeLancey, 1995), Kimaragang Dusun (Kroeger, 2010) and Indonesian (Voskuil, 1996). Most recently, Sheinfux et al. (2017) implemented this idea computationally for Hebrew. Their study proposed an analysis that explains argument structure phenomena in Hebrew by distinguishing between semantic and syntactic selection and stating the constraints at each level separately.
Indonesian (ISO 639-3: ind) is the national language of the multilingual Indonesian archipelago. This Austronesian language is spoken by more than 22 million people as a first language (Lewis, 2009). As an agglutinating language, Indonesian attaches a number of prefixes and suffixes to verbal roots (Sneddon et al., 2010). This thesis explores the use of these affixes, together with argument role information, as a basis for identifying verb sub-classes in Indonesian.
The data used in this thesis comes from the Indonesian sub-corpora of the Leipzig Corpora Collection (LCC) (Quasthoff et al., 2006), which contain over 15 million sentences from news and web articles. I searched the corpus for verbs containing the prefix meN- and the suffixes -kan and -i. I then assigned each verb to one of seven distinct groups according to its morphological behavior. I selected 50 verb roots from each group and extracted a total of 4800 sentences for further analysis. I annotated these sentences with semantic roles and arguments, using a list adapted from Sheinfux et al. (2017).
I found that the morphological information carried by affixes can be used to arrive at a coarse-grained sub-classification of Indonesian verbs that confirms the findings of existing research. I also show that a finer-grained classification can be achieved using semantic information from argument roles. |
author2 |
Francis Bond |
format |
Thesis-Master by Research |
author |
Choi, Hannah Yun Jung |
title |
A corpus based analysis of -kan and -i in Indonesian |
publisher |
Nanyang Technological University |
publishDate |
2020 |
url |
https://hdl.handle.net/10356/136955 |
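The corpus procedure summarized in the abstract has two steps: searching for verbs with the prefix meN- and the suffixes -kan and -i, then grouping verb roots by their morphological behavior. A small Python sketch can illustrate both steps. It is not the thesis's code. The function names, the toy lexicon and token list, the treatment of the bare meN- form as a third pattern, and the simplified meN- allomorphy rules (me-, mem-, men-, meng-, menge-, meny-) are all assumptions made for illustration. A real pipeline would rely on a proper morphological analyser and a full root lexicon.

The sketch works in three stages:
1. It over-generates candidate roots by undoing nasal assimilation.
2. It keeps only the candidates found in the lexicon.
3. It groups roots by the set of affix patterns they are attested with.

Three patterns allow at most seven non-empty combinations. Whether these correspond to the seven groups in the thesis is an assumption, not something the abstract states.

```python
from collections import defaultdict
from collections.abc import Iterable

VOWELS = set("aeiou")
# Hypothetical labels for the three affix patterns assumed in this sketch.
LABEL = {"": "meN-", "kan": "meN-...-kan", "i": "meN-...-i"}


def men_root_candidates(word: str) -> set[str]:
    """Candidate roots for a word beginning with meN- (simplified, over-generating).

    Only common allomorphs are modelled (me-, mem-, men-, meng-, menge-, meny-);
    non-roots are discarded later by a lexicon lookup.
    """
    cands: set[str] = set()
    if not word.startswith("me") or len(word) < 5:
        return cands
    rest = word[2:]  # material after "me"
    if rest[0] in "lrwymn":
        cands.add(rest)  # me- before sonorants: melihat -> lihat, menanti -> nanti
    if rest.startswith("ng"):
        after = rest[2:]
        if after[:1] in VOWELS:
            cands.update({after, "k" + after})  # mengambil -> ambil, mengirim -> kirim
        else:
            cands.add(after)  # menggali -> gali
        if rest.startswith("nge"):
            cands.add(rest[3:])  # menge- with monosyllables: mengebom -> bom
    elif rest.startswith("ny"):
        cands.add("s" + rest[2:])  # menyapu -> sapu
    elif rest.startswith("m"):
        after = rest[1:]
        # mem- + vowel hides a deleted p (memukul -> pukul); before b it does not (membeli -> beli)
        cands.add("p" + after if after[:1] in VOWELS else after)
    elif rest.startswith("n"):
        after = rest[1:]
        # men- + vowel hides a deleted t (menulis -> tulis); before c/d/j it does not (mencari -> cari)
        cands.add("t" + after if after[:1] in VOWELS else after)
    return cands


def analyze(word: str, lexicon: set[str]) -> set[tuple[str, str]]:
    """(root, suffix) readings of a meN- word whose root is in the lexicon; suffix is "kan", "i" or ""."""
    stems = [(word, "")]  # always try the unsuffixed reading (roots such as cari end in -i)
    for suf in ("kan", "i"):
        if word.endswith(suf):
            stems.append((word[: -len(suf)], suf))
    return {
        (root, suf)
        for stem, suf in stems
        for root in men_root_candidates(stem)
        if root in lexicon
    }


def group_roots(tokens: Iterable[str], lexicon: set[str]) -> dict[frozenset[str], list[str]]:
    """Group roots by the (hypothetical) set of affix patterns they are attested with."""
    patterns: dict[str, set[str]] = defaultdict(set)
    for tok in tokens:
        for root, suf in analyze(tok.lower(), lexicon):
            patterns[root].add(LABEL[suf])
    groups: dict[frozenset[str], list[str]] = defaultdict(list)
    for root, pats in patterns.items():
        groups[frozenset(pats)].append(root)
    return {pats: sorted(roots) for pats, roots in groups.items()}


if __name__ == "__main__":
    # Toy data invented for illustration only; not drawn from the LCC.
    lexicon = {"tulis", "cari", "dekat", "beli", "ambil"}
    tokens = ["menulis", "menuliskan", "mencari", "mencarikan",
              "mendekati", "membeli", "membelikan", "mengambil"]
    for pats, roots in group_roots(tokens, lexicon).items():
        print(sorted(pats), roots)
```

The stripping rules deliberately over-generate. For example, menulis also yields the non-root nulis, so the lexicon lookup does the real filtering. Roots that end in -i, such as cari, are still recovered because the unsuffixed reading is always tried alongside the -i reading.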