Topic classification and association rule mining for Chinese Mathematics questions

Bibliographic Details
Main Author: Tan, Huicheng.
Other Authors: Hui Siu Cheung
Format: Final Year Project
Language: English
Published: 2011
Subjects:
Online Access: http://hdl.handle.net/10356/44025
Institution: Nanyang Technological University
Description
Summary: This project mines patterns in mathematics questions from China's College Entrance Exam. The questions contain both Chinese text and formulas, so three data mining tasks are carried out: Sub-topic Classification, Association Rule Mining in Textual Data, and Association Rule Mining in Formulas. All three tasks start from keyword (term) identification, which is conducted manually. After data preprocessing, all questions are transformed into the ARFF file format. For Sub-topic Classification, three widely used algorithms (Support Vector Machine, Decision Tree, and Random Forest) are compared in terms of classification performance. In the experiments, Random Forest outperformed the other two algorithms. Both types of association rule mining apply the FP-Growth algorithm, and user feedback is collected to evaluate the usefulness of the generated rules. According to the feedback, 80% of the rules generated from textual data, with min_support and min_confidence set to 0.115 and 0.8 respectively, are useful. For rules mined from formulas, 82.8% are useful, with min_support and min_confidence set to 0.07 and 0.9 respectively.
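
The min_support and min_confidence thresholds mentioned in the summary can be illustrated with a minimal sketch. The keyword sets below are hypothetical toy data, not taken from the project, and the helper names (`support`, `confidence`, `mine_pair_rules`) are invented for illustration. The sketch counts one-to-one rules by brute force rather than using FP-Growth; it only shows how the two thresholds filter candidate rules.

```python
from itertools import permutations

# Hypothetical toy data: each question reduced to a set of keywords.
# These are NOT the project's questions or terms.
questions = [
    {"function", "derivative", "monotonic"},
    {"function", "derivative", "extremum"},
    {"function", "derivative", "monotonic", "extremum"},
    {"sequence", "arithmetic"},
    {"function", "monotonic"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def mine_pair_rules(transactions, min_support, min_confidence):
    """Brute-force search for rules a -> b between single keywords.

    A real miner such as FP-Growth avoids enumerating every pair;
    this loop is only meant to show what the thresholds do.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue  # the pair is too rare to be considered
        c = confidence({a}, {b}, transactions)
        if c >= min_confidence:
            rules.append((a, b, s, c))
    return rules


if __name__ == "__main__":
    for a, b, s, c in mine_pair_rules(questions, 0.4, 0.8):
        print(f"{a} -> {b}  support={s:.2f}  confidence={c:.2f}")
```

Raising min_support discards rules about rarely co-occurring keywords, while raising min_confidence discards rules whose consequent does not reliably follow the antecedent.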