A corpus based analysis of -kan and -i in Indonesian

The importance of capturing patterns of shared verb behavior through verb classes was brought to attention by Fillmore (1970) in his seminal paper “The Grammar of Hitting and Breaking”. In that paper, he recognized that verbs in English could be grouped into classes based on their semantic similarity a...

Bibliographic Details
Main Author: Choi, Hannah Yun Jung
Other Authors: Francis Bond
Format: Thesis-Master by Research
Language: English
Published: Nanyang Technological University 2020
Subjects: Humanities::Linguistics
Online Access: https://hdl.handle.net/10356/136955
Institution: Nanyang Technological University
id sg-ntu-dr.10356-136955
record_format dspace
spelling sg-ntu-dr.10356-136955 2020-10-28T08:29:19Z A corpus based analysis of -kan and -i in Indonesian Choi, Hannah Yun Jung Francis Bond School of Humanities fcbond@ntu.edu.sg Humanities::Linguistics Master of Arts 2020-02-07T02:05:24Z 2020-02-07T02:05:24Z 2019 Thesis-Master by Research Choi, H. Y. J. (2019). A corpus based analysis of -kan and -i in Indonesian. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/136955 10.32657/10356/136955 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Humanities::Linguistics
spellingShingle Humanities::Linguistics
Choi, Hannah Yun Jung
A corpus based analysis of -kan and -i in Indonesian
description The importance of capturing patterns of shared verb behavior through verb classes was brought to attention by Fillmore (1970) in his seminal paper “The Grammar of Hitting and Breaking”. In that paper, he recognized that verbs in English could be grouped into classes based on their semantic similarity as well as their shared grammatical behavior and argument realization. Specifically, he showed that hit and break verbs each belong to larger classes whose members share comparable patterns of behavior, such as participation in the causative alternation and the interpretations available to their passive participles (1970:125). Later studies of English confirmed and expanded on Fillmore’s findings (Dixon, 1992; Jackendoff, 1992; Levin & Hovav, 1991), most notably Levin (1993) in her book “English Verb Classes and Alternations”. Beyond English, the idea that semantically related verb classes share syntactic behaviors has also been explored in other languages such as Lhasa Tibetan (DeLancey, 1995), Kimaragang Dusun (Kroeger, 2010) and Indonesian (Voskuil, 1996). Most recently, this idea has been implemented computationally for Hebrew by Sheinfux et al. (2017). Their study proposed an analysis that explains argument structure phenomena in Hebrew by distinguishing between semantic and syntactic selection and stating the constraints at each level separately. Indonesian (ISO 639-3: ind) is the national language of the multilingual Indonesian archipelago. This Austronesian language is spoken by more than 22 million people as a first language (Lewis, 2009). As an agglutinating language, Indonesian attaches several suffixes and prefixes to verbal roots (Sneddon et al., 2010). This thesis explores the use of these affixes, together with argument role information, as a basis for identifying verb sub-classes in Indonesian.
The data used in this thesis comes from the Indonesian sub-corpora found in the Leipzig Corpora Collection (LCC) (Quasthoff et al., 2006), which contains over 15 million sentences from news and web articles. I searched the corpus for verbs containing the prefix meN- and the suffixes -kan and -i. I then assigned each verb to one of seven distinct groups according to its morphological behavior. I selected 50 verb roots from each group and extracted a total of 4800 sentences for further analysis. I annotated these sentences with semantic roles and arguments based on a list adapted from Sheinfux et al. (2017). I found that the morphological information of affixes can be used to arrive at a coarse-grained sub-classification of verbs in Indonesian that confirms the findings of existing research. I also show that a more fine-grained classification can be achieved using semantic information from argument roles.
author2 Francis Bond
author_facet Francis Bond
Choi, Hannah Yun Jung
format Thesis-Master by Research
author Choi, Hannah Yun Jung
author_sort Choi, Hannah Yun Jung
title A corpus based analysis of -kan and -i in Indonesian
title_short A corpus based analysis of -kan and -i in Indonesian
title_full A corpus based analysis of -kan and -i in Indonesian
title_fullStr A corpus based analysis of -kan and -i in Indonesian
title_full_unstemmed A corpus based analysis of -kan and -i in Indonesian
title_sort corpus based analysis of -kan and -i in indonesian
publisher Nanyang Technological University
publishDate 2020
url https://hdl.handle.net/10356/136955
_version_ 1683494094069825536