Topic classification and association rule mining for Chinese Mathematics questions

This project mines patterns in mathematics questions used in China's College Entrance Exam. These questions contain both Chinese words and formulas, so three types of data mining are carried out: Sub-topic Classification, Association Rule Mining in Textual...

Bibliographic Details
Main Author: Tan, Huicheng.
Other Authors: Hui Siu Cheung
Format: Final Year Project
Language: English
Published: 2011
Subjects:
Online Access: http://hdl.handle.net/10356/44025
Institution: Nanyang Technological University
id sg-ntu-dr.10356-44025
record_format dspace
spelling sg-ntu-dr.10356-440252023-03-03T20:57:44Z Topic classification and association rule mining for Chinese Mathematics questions Tan, Huicheng. Hui Siu Cheung School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing This project mines patterns in mathematics questions used in China's College Entrance Exam. These questions contain both Chinese words and formulas, so three types of data mining are carried out: Sub-topic Classification, Association Rule Mining in Textual Data, and Association Rule Mining in Formulas. All three start from keyword (term) identification, which is conducted manually. After data preprocessing, all questions are transformed into the ARFF file format. For Sub-topic Classification, three widely used algorithms (Support Vector Machine, Decision Tree, and Random Forest) are compared in terms of classification performance. In the experiments, Random Forest outperformed the other two algorithms. Both types of association rule mining apply the FP-Growth algorithm, and user feedback is collected to evaluate the usefulness of the generated association rules. According to the feedback, 80% of the rules generated from textual data with min_support and min_confidence set to 0.115 and 0.8, respectively, are useful. For rules mined from formulas, 82.8% are useful with min_support and min_confidence set to 0.07 and 0.9, respectively. Bachelor of Engineering (Computer Science) 2011-05-19T06:24:33Z 2011-05-19T06:24:33Z 2011 2011 Final Year Project (FYP) http://hdl.handle.net/10356/44025 en Nanyang Technological University 87 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
spellingShingle DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Tan, Huicheng.
Topic classification and association rule mining for Chinese Mathematics questions
description This project mines patterns in mathematics questions used in China's College Entrance Exam. These questions contain both Chinese words and formulas, so three types of data mining are carried out: Sub-topic Classification, Association Rule Mining in Textual Data, and Association Rule Mining in Formulas. All three start from keyword (term) identification, which is conducted manually. After data preprocessing, all questions are transformed into the ARFF file format. For Sub-topic Classification, three widely used algorithms (Support Vector Machine, Decision Tree, and Random Forest) are compared in terms of classification performance. In the experiments, Random Forest outperformed the other two algorithms. Both types of association rule mining apply the FP-Growth algorithm, and user feedback is collected to evaluate the usefulness of the generated association rules. According to the feedback, 80% of the rules generated from textual data with min_support and min_confidence set to 0.115 and 0.8, respectively, are useful. For rules mined from formulas, 82.8% are useful with min_support and min_confidence set to 0.07 and 0.9, respectively.
author2 Hui Siu Cheung
author_facet Hui Siu Cheung
Tan, Huicheng.
format Final Year Project
author Tan, Huicheng.
author_sort Tan, Huicheng.
title Topic classification and association rule mining for Chinese Mathematics questions
title_short Topic classification and association rule mining for Chinese Mathematics questions
title_full Topic classification and association rule mining for Chinese Mathematics questions
title_fullStr Topic classification and association rule mining for Chinese Mathematics questions
title_full_unstemmed Topic classification and association rule mining for Chinese Mathematics questions
title_sort topic classification and association rule mining for chinese mathematics questions
publishDate 2011
url http://hdl.handle.net/10356/44025
_version_ 1759856828746825728
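
Illustrative note: the description reports association rules filtered by min_support and min_confidence. The sketch below shows what those two thresholds mean on a toy set of keyword lists. It is not the project's code: it enumerates itemsets by brute force rather than using FP-Growth, the keywords are hypothetical, and the function names support and mine_rules are invented for this example. FP-Growth would be expected to find the same frequent itemsets more efficiently.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def mine_rules(transactions, min_support, min_confidence):
    """Brute-force rule mining (an assumption for illustration, not FP-Growth)."""
    transactions = [frozenset(t) for t in transactions]
    vocabulary = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(vocabulary) + 1):
        for itemset in combinations(vocabulary, size):
            s = support(itemset, transactions)
            # Skip itemsets that never occur or are below the support threshold.
            if s == 0 or s < min_support:
                continue
            # Split the frequent itemset into antecedent -> consequent.
            for k in range(1, size):
                for antecedent in combinations(itemset, k):
                    conf = s / support(antecedent, transactions)
                    if conf >= min_confidence:
                        consequent = tuple(i for i in itemset if i not in antecedent)
                        rules.append((antecedent, consequent, s, conf))
    return rules


# Hypothetical keyword sets, one per question (not taken from the project's data).
questions = [
    {"function", "derivative", "maximum"},
    {"function", "derivative", "monotonic"},
    {"function", "derivative", "maximum"},
    {"sequence", "sum"},
    {"function", "domain"},
]

rules = mine_rules(questions, min_support=0.4, min_confidence=0.8)
for antecedent, consequent, s, conf in rules:
    print(antecedent, "->", consequent, f"support={s:.2f} confidence={conf:.2f}")
```

In this toy data, "derivative" implies "function" with confidence 1.0, while the reverse rule reaches only 0.75 and is dropped at min_confidence 0.8. Raising min_support shrinks the rule set in the same way.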