Topic classification and association rule mining for Chinese Mathematics questions
This project mines patterns in mathematics questions used in China's College Entrance Exam. Because these questions contain both Chinese words and formulas, three types of data mining are carried out: Sub-topic Classification, Association Rule Mining in Textual...
Main Author: | Tan, Huicheng |
---|---|
Other Authors: | Hui Siu Cheung |
Format: | Final Year Project |
Language: | English |
Published: | 2011 |
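Subjects: | DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing |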
Online Access: | http://hdl.handle.net/10356/44025 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-44025 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-44025 2023-03-03T20:57:44Z Topic classification and association rule mining for Chinese Mathematics questions Tan, Huicheng. Hui Siu Cheung School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing Bachelor of Engineering (Computer Science) 2011-05-19T06:24:33Z 2011-05-19T06:24:33Z 2011 2011 Final Year Project (FYP) http://hdl.handle.net/10356/44025 en Nanyang Technological University 87 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing |
description |
This project mines patterns in mathematics questions used in China's College Entrance Exam. Because these questions contain both Chinese words and formulas, three types of data mining are carried out: Sub-topic Classification, Association Rule Mining in Textual Data and Association Rule Mining in Formulas. All three start from keyword (term) identification, which is conducted manually. After data preprocessing, all questions are transformed into the ARFF file format. For Sub-topic Classification, three widely used algorithms (Support Vector Machine, Decision Tree and Random Forest) are compared in terms of classification performance. In the experimental results, Random Forest outperformed the other two algorithms. Both types of association rule mining apply the FP-Growth algorithm, and user feedback is collected to evaluate the usefulness of the generated association rules. The feedback shows that 80% of the rules generated from textual data, with min_support and min_confidence set to 0.115 and 0.8 respectively, are useful. For rules mined from formulas, 82.8% are useful with min_support and min_confidence set to 0.07 and 0.9 respectively. |
author2 |
Hui Siu Cheung |
author_facet |
Hui Siu Cheung Tan, Huicheng. |
format |
Final Year Project |
author |
Tan, Huicheng. |
author_sort |
Tan, Huicheng. |
title |
Topic classification and association rule mining for Chinese Mathematics questions |
publishDate |
2011 |
url |
http://hdl.handle.net/10356/44025 |
_version_ |
1759856828746825728 |
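
The description above filters association rules by min_support and min_confidence. As a rough illustration of what those two thresholds mean, here is a minimal Python sketch. It is not the project's implementation: it enumerates itemsets by brute force rather than building an FP-tree as FP-Growth does, and the function name `mine_rules`, the English keywords, and the thresholds 0.4 and 0.8 are all made up for the example.

```python
from itertools import combinations


def mine_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples.

    Brute-force enumeration, NOT FP-Growth: it checks every candidate
    itemset directly, which is only practical for tiny examples.
    """
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*sets))

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(1 for t in sets if itemset <= t) / n

    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset)
            if s == 0 or s < min_support:
                continue  # itemset is not frequent enough
            # Split the frequent itemset into antecedent -> consequent.
            for k in range(1, size):
                for ante in combinations(combo, k):
                    antecedent = frozenset(ante)
                    confidence = s / support(antecedent)
                    if confidence >= min_confidence:
                        rules.append(
                            (antecedent, itemset - antecedent, s, confidence)
                        )
    return rules


# Hypothetical keyword sets, one per question (not data from the project).
questions = [
    {"function", "derivative", "monotonic"},
    {"function", "derivative", "extremum"},
    {"function", "derivative", "monotonic"},
    {"sequence", "arithmetic"},
    {"function", "domain"},
]

rules = mine_rules(questions, min_support=0.4, min_confidence=0.8)
for antecedent, consequent, s, c in rules:
    print(sorted(antecedent), "->", sorted(consequent),
          f"support={s:.2f} confidence={c:.2f}")
```

Raising min_confidence keeps only rules whose antecedent almost always implies the consequent, while raising min_support drops rules that rest on few questions. In this toy data "derivative" implies "function" in every question, but the reverse rule falls below the 0.8 confidence threshold.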