Efficient Text Feature Extraction for Opinion Polarity Classification

Thesis (M.Sc. (Computer Science)), Prince of Songkla University, 2563 B.E. (2020)


Bibliographic Details
Main Author:	ณิชาภัทร ปิ่นโพธิ์
Other Authors:	นิวรรณ วัฒนกิจรุ่งโรจน์
Format:	Theses and Dissertations
Language:	Thai
Published:	Prince of Songkla University, 2023
Subjects:	Vector analysis
Online Access:	http://kb.psu.ac.th/psukb/handle/2016/18177
Institution:	Prince of Songkla University

id	th-psu.2016-18177
record_format	dspace
institution	Prince of Songkla University
building	Khunying Long Athakravi Sunthorn Learning Resources Center
continent	Asia
country	Thailand
content_provider	Khunying Long Athakravi Sunthorn Learning Resources Center
collection	PSU Knowledge Bank
language	Thai
topic	Vector analysis
description	Thesis (M.Sc. (Computer Science)), Prince of Songkla University, 2563 B.E. (2020)
author2	นิวรรณ วัฒนกิจรุ่งโรจน์
format	Theses and Dissertations
author	ณิชาภัทร ปิ่นโพธิ์
title	Efficient Text Feature Extraction for Opinion Polarity Classification (Thai: การสกัดคุณลักษณะข้อความที่มีประสิทธิภาพเพื่อการจำแนกขั้วความคิดเห็น)
publisher	Prince of Songkla University
publishDate	2023
url	http://kb.psu.ac.th/psukb/handle/2016/18177
spelling	th-psu.2016-18177 (2023-05-16T08:51:01Z)
Title:	Efficient Text Feature Extraction for Opinion Polarity Classification (Thai: การสกัดคุณลักษณะข้อความที่มีประสิทธิภาพเพื่อการจำแนกขั้วความคิดเห็น)
Authors:	ณิชาภัทร ปิ่นโพธิ์; นิวรรณ วัฒนกิจรุ่งโรจน์
Department:	Faculty of Science, Department of Computer Science
Subject:	Vector analysis
Degree:	Thesis (M.Sc. (Computer Science)), Prince of Songkla University, 2563 B.E. (2020)
Dates:	2023-05-16T08:51:01Z; 2020
Type:	Thesis
URI:	http://kb.psu.ac.th/psukb/handle/2016/18177
Language:	th
Rights:	Attribution-NonCommercial-NoDerivs 3.0 Thailand, http://creativecommons.org/licenses/by-nc-nd/3.0/th/
Format:	application/pdf
Publisher:	Prince of Songkla University

Abstract (English): Social media users can post text comments describing their opinions, and these texts can be analyzed to classify them as positive or negative. Before a classifier can be built, feature vectors representing the texts must first be prepared. Texts are typically represented as vectors of term frequencies or term weights, where each vector has as many dimensions as there are terms in the dictionary built from all words occurring across the texts. A large dictionary therefore yields high-dimensional text vectors, which lengthens the time needed to train and test classification models. This thesis proposes two low-dimensional text representations, V4D and V8D, built from sets of positive and negative words. Additional features are derived from negation words, which strongly affect the meaning of a text and hence its opinion classification. Four classification techniques were studied for classifying opinion texts: k-Nearest Neighbors, Naive Bayes, Artificial Neural Networks, and Support Vector Machine. In experiments on eight data sets from various domains, the proposed V4D and V8D vectors were compared with the traditional TF and TF-IDF vectors in terms of classification performance. The experimental results show that the proposed representations can improve opinion classification performance and are the most efficient in terms of storage space and processing time.

Abstract (Thai, translated): Today, social media users can freely express opinions by typing text on topics that interest them. These texts can be analyzed to classify the direction of each opinion as positive or negative, which first requires constructing a vector to represent each text. The common approach represents a text as a vector of term weights or term frequencies whose number of dimensions equals the number of terms in a dictionary containing every term that can occur across all texts considered. As the vocabulary grows, the dictionary grows, and the text vectors grow with it, so building and applying polarity classification models takes a long processing time. This thesis presents two forms of text feature extraction as vectors, V4D and V8D, both of low dimension. Their features are derived from the weights of the positive and negative words appearing in a text. Features derived from negation words, which are important to the meaning of a text and to polarity classification, are also considered. The proposed text vectors serve as input for building classification models, and four modeling methods were studied: k-Nearest Neighbors, Naive Bayes, Artificial Neural Networks, and Support Vector Machine. Experiments on eight opinion-text data sets from a variety of domains compared the proposed V4D and V8D vectors with the traditional TF and TF-IDF vectors as inputs to polarity classification models. The results show that the proposed text vectors increase polarity classification accuracy and give the best efficiency in terms of data storage space and processing time.
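The abstract contrasts traditional TF and TF-IDF vectors, whose length equals the dictionary size, with low-dimensional vectors built from positive, negative, and negation words. The Python sketch below illustrates that contrast under stated assumptions: the word lists `POSITIVE`, `NEGATIVE`, and `NEGATION`, the pre-tokenized example sentences, the `log(N / df)` IDF variant, and the four-feature layout of the hypothetical `lexicon_vector_4d` are all illustrative choices. The record does not define V4D or V8D, so this function is not the thesis's method, only one example of what a lexicon-based low-dimensional representation with negation handling can look like.

```python
import math
from collections import Counter

# Hypothetical word lists for illustration only; the thesis's lexicons are not given here.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "slow", "hate"}
NEGATION = {"not", "never", "no"}


def build_vocabulary(docs):
    """Sorted list of every distinct token across all (already tokenized) documents."""
    return sorted({tok for doc in docs for tok in doc})


def tf_vector(doc, vocab):
    """Term-frequency vector: one dimension per dictionary term."""
    counts = Counter(doc)
    return [counts[term] for term in vocab]


def tfidf_vector(doc, vocab, docs):
    """TF weighted by idf = log(N / df), one common IDF variant among several."""
    n = len(docs)
    counts = Counter(doc)
    vec = []
    for term in vocab:
        df = sum(1 for d in docs if term in d)  # always >= 1 because vocab comes from docs
        vec.append(counts[term] * math.log(n / df))
    return vec


def lexicon_vector_4d(doc):
    """Hypothetical 4-feature vector (NOT the thesis's V4D):
    [positive, negative, negated positive, negated negative] word counts.
    A polarity word directly after a negation word is counted as negated."""
    pos = neg = neg_pos = neg_neg = 0
    for i, tok in enumerate(doc):
        negated = i > 0 and doc[i - 1] in NEGATION
        if tok in POSITIVE:
            if negated:
                neg_pos += 1
            else:
                pos += 1
        elif tok in NEGATIVE:
            if negated:
                neg_neg += 1
            else:
                neg += 1
    return [pos, neg, neg_pos, neg_neg]


docs = [
    "the food is good".split(),
    "the service is not good".split(),
    "delivery was slow but staff were great".split(),
]
vocab = build_vocabulary(docs)

# Expected output:
# 13 [1, 0, 0, 0]
# 13 [0, 0, 1, 0]
# 13 [1, 1, 0, 0]
for doc in docs:
    print(len(tf_vector(doc, vocab)), lexicon_vector_4d(doc))
```

Even in this three-sentence toy corpus the TF vector already has 13 dimensions, and it keeps growing with every new word, while the lexicon-based vector stays at four numbers regardless of vocabulary size. That fixed, small size is the source of the storage and processing-time advantage the abstract reports; the negated-positive count also lets "not good" register differently from "good".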


Similar Items