A corpus based analysis of -kan and -i in Indonesian

Bibliographic Details
Main Author:	Choi, Hannah Yun Jung
Other Authors:	Francis Bond
Format:	Thesis-Master by Research
Language:	English
Published:	Nanyang Technological University 2020
Subjects:	Humanities::Linguistics
Online Access:	https://hdl.handle.net/10356/136955
Institution:	Nanyang Technological University
Degree:	Master of Arts
School:	School of Humanities
DOI:	10.32657/10356/136955
Citation:	Choi, H. Y. J. (2019). A corpus based analysis of -kan and -i in Indonesian. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/136955
License:	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).

Abstract

The importance of capturing patterns of shared verb behavior through verb classes was brought to attention by Fillmore (1970) in his seminal paper “The Grammar of Hitting and Breaking”. In that work, he recognized that English verbs can be grouped into classes based on their semantic similarity as well as their shared grammatical behavior and argument realization. Specifically, he showed that hit verbs and break verbs each belong to larger classes whose members share comparable patterns of behavior, such as participation in the causative alternation and the interpretations available to their passive participles (1970:125). Subsequent studies of English have confirmed and expanded on Fillmore’s findings (Dixon, 1992; Jackendoff, 1992; Levin & Rappaport Hovav, 1991), most notably Levin (1993) in her influential book “English Verb Classes and Alternations”. Beyond English, the idea that semantically related verb classes share syntactic behaviors has also been explored in other languages, such as Lhasa Tibetan (DeLancey, 1995), Kimaragang Dusun (Kroeger, 2010) and Indonesian (Voskuil, 1996). Most recently, Sheinfux et al. (2017) implemented this idea computationally for Hebrew, proposing an analysis that explains argument structure phenomena by distinguishing between semantic and syntactic selection and stating the constraints at each level separately.

Indonesian (ISO 639-3: ind) is the national language of the multilingual Indonesian archipelago. This Austronesian language is spoken by more than 22 million speakers as a first language (Lewis, 2009). As an agglutinating language, Indonesian attaches several prefixes and suffixes to verbal roots (Sneddon et al., 2010). This thesis explores the use of these affixes, together with argument role information, as a basis for identifying verb sub-classes in Indonesian.

The data used in this thesis come from the Indonesian sub-corpora of the Leipzig Corpora Collection (LCC) (Quasthoff et al., 2006), which contain over 15 million sentences from news and web articles. I searched the corpus for verbs containing the prefix meN- and the suffixes -kan and -i, then grouped these verbs into seven distinct groups according to their morphological behavior. I selected 50 verb roots from each group and extracted a total of 4800 sentences for further analysis, which I annotated with semantic roles and arguments based on a list adapted from Sheinfux et al. (2017). I found that the morphological information carried by affixes can be used to arrive at a coarse-grained sub-classification of Indonesian verbs that confirms the findings of existing research. I also show that a more fine-grained classification can be achieved using semantic information from argument roles.
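
The affix-based grouping step described above can be illustrated with a minimal Python sketch. It is not the procedure used in the thesis, and several parts are assumptions: it attaches meN- with simplified textbook nasal-assimilation rules (tulis to menulis, pukul to memukul, kirim to mengirim), ignores monosyllabic menge- roots and loanword exceptions, and hypothetically groups roots by which of the three forms meN-ROOT, meN-ROOT-kan and meN-ROOT-i are attested. Three yes/no features give at most 2^3 - 1 = 7 non-empty patterns, but whether the thesis's seven groups were defined this way is an assumption. The function names (`men_prefix`, `affix_pattern`, `group_roots`, `sample_roots`) are hypothetical, and the toy token list stands in for LCC sentences.

```python
import random
from collections import defaultdict

VOWELS = set("aeiou")


def men_prefix(root: str) -> str:
    """Attach the prefix meN- to a root with simplified nasal-assimilation rules.

    Assumption: textbook rules only; monosyllabic roots (menge-) and loanword
    exceptions that keep their initial p/t/k/s are not handled.
    """
    first = root[0]
    if first in "lmnrwy":              # includes ng- and ny- roots
        return "me" + root             # lihat -> melihat
    if first in VOWELS or first in "gh":
        return "meng" + root           # ambil -> mengambil
    if first == "k":
        return "meng" + root[1:]       # kirim -> mengirim (k drops)
    if first in "bfv":
        return "mem" + root            # beli -> membeli
    if first == "p":
        return "mem" + root[1:]        # pukul -> memukul (p drops)
    if first in "cdjz":
        return "men" + root            # cari -> mencari
    if first == "t":
        return "men" + root[1:]        # tulis -> menulis (t drops)
    if first == "s":
        return "meny" + root[1:]       # sapu -> menyapu (s drops)
    return "me" + root                 # fallback for letters not covered above


def affix_pattern(root: str, vocabulary: set) -> tuple:
    """Which of meN-ROOT, meN-ROOT-kan and meN-ROOT-i occur in the vocabulary."""
    base = men_prefix(root)
    return (base in vocabulary, base + "kan" in vocabulary, base + "i" in vocabulary)


def group_roots(roots, vocabulary: set) -> dict:
    """Group roots by their attested affix pattern (hypothetical criterion).

    Three yes/no features give at most 2**3 - 1 = 7 non-empty patterns.
    """
    groups = defaultdict(list)
    for root in roots:
        pattern = affix_pattern(root, vocabulary)
        if any(pattern):               # skip roots with no attested meN- form
            groups[pattern].append(root)
    return dict(groups)


def sample_roots(groups: dict, per_group: int = 50, seed: int = 0) -> dict:
    """Draw up to `per_group` roots from each group, reproducibly."""
    rng = random.Random(seed)
    return {p: rng.sample(rs, min(per_group, len(rs))) for p, rs in groups.items()}


# Toy usage: a handful of tokens standing in for corpus sentences.
tokens = ("dia menulis surat lalu menuliskan nama dan "
          "mengirimi teman serta mengirimkan paket").split()
groups = group_roots(["tulis", "kirim", "beli"], set(tokens))
print(groups)
# {(True, True, False): ['tulis'], (False, True, True): ['kirim']}
```

Generating surface forms from a list of candidate roots avoids having to undo nasal substitution, which is ambiguous when reading forms off the corpus: a form such as menulis does not by itself show whether the root began with t or n. Sentence extraction and semantic-role annotation are outside the scope of this sketch.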

Similar Items