Đánh giá tập nhãn và xác định lỗi tự động trong kho ngữ liệu đã gán nhãn (Evaluating tagsets and automatically detecting errors in tagged corpora)

Bibliographic details
Main author: Đỗ, Thị Thanh Tâm
Format: Theses and Dissertations
Language: other
Published: Đại học Quốc gia Hà Nội 2016
Online access: http://repository.vnu.edu.vn/handle/VNU_123/8263
Description
Summary: The first part evaluates the properties of tagsets and the possibility of converting between tagsets in Vietnamese. The main goal of this part is to determine which tagset is better and whether a small tagset can be converted into a large one and vice versa. The thesis pursues this goal using internal and external criteria together with statistics on lost ambiguous tokens. The internal criterion tests whether tokens are assigned POS tags accurately. The external criterion checks how much linguistic information is retained. In particular, the internal criterion relates to the notions of frame and purity. To investigate the retained information, we merged some tags according to a chosen classification factor. For each tagset, we obtained different parameters. As a result, classification based on syntax gives better results, but the number of ambiguous words is large. In addition, in Vietnamese it is hard to convert between tagsets.