Topic classification and association rule mining for Chinese Mathematics questions

This project mines patterns in mathematics questions used in the China College Entrance Exam. These questions contain both Chinese words and formulas, so three types of data mining are carried out, including Sub-topic Classification, Association Rule Mining in Textual...

Bibliographic Details
Main Author:	Tan, Huicheng.
Other Authors:	Hui Siu Cheung
Format:	Final Year Project
Language:	English
Published:	2011
Subjects:	DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Online Access:	http://hdl.handle.net/10356/44025
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-44025
record_format	dspace
spelling	sg-ntu-dr.10356-44025 2023-03-03T20:57:44Z Topic classification and association rule mining for Chinese Mathematics questions Tan, Huicheng. Hui Siu Cheung School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing Bachelor of Engineering (Computer Science) 2011-05-19T06:24:33Z 2011-05-19T06:24:33Z 2011 2011 Final Year Project (FYP) http://hdl.handle.net/10356/44025 en Nanyang Technological University 87 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
description	This project mines patterns in mathematics questions used in the China College Entrance Exam. These questions contain both Chinese words and formulas, so three types of data mining are carried out: Sub-topic Classification, Association Rule Mining in Textual Data and Association Rule Mining in Formulas. All three start from keyword (term) identification, which is conducted manually. After data preprocessing, all questions are transformed into the ARFF file format. For Sub-topic Classification, three widely used algorithms (Support Vector Machine, Decision Tree and Random Forest) are compared in terms of classification performance. In the experiments, Random Forest outperformed the other two algorithms. Both types of association rule mining apply the FP-Growth algorithm, and user feedback is collected to evaluate the usefulness of the generated association rules. According to the feedback, 80% of the rules generated from textual data, with min_support and min_confidence set to 0.115 and 0.8 respectively, are useful. For rules mined from formulas, 82.8% are useful, with min_support and min_confidence set to 0.07 and 0.9 respectively.
author2	Hui Siu Cheung
author_facet	Hui Siu Cheung Tan, Huicheng.
format	Final Year Project
author	Tan, Huicheng.
author_sort	Tan, Huicheng.
title	Topic classification and association rule mining for Chinese Mathematics questions
publishDate	2011
url	http://hdl.handle.net/10356/44025
_version_	1759856828746825728
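
The description above reports rule quality in terms of min_support and min_confidence. As a rough illustration of what those two thresholds mean, the sketch below computes support and confidence by brute force over single-item rules. It is not FP-Growth and not the project's code: the function names and the keyword sets are hypothetical, and only the threshold values 0.115 and 0.8 are taken from the description.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / base


def pair_rules(transactions, min_support, min_confidence):
    """Brute-force single-item rules lhs -> rhs that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, transactions)
            c = confidence({lhs}, {rhs}, transactions)
            if s >= min_support and c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules


# Hypothetical keyword sets, one per question (not data from the project).
questions = [
    frozenset({"function", "derivative", "maximum"}),
    frozenset({"function", "derivative"}),
    frozenset({"function", "domain"}),
    frozenset({"sequence", "sum"}),
    frozenset({"function", "derivative", "monotonic"}),
]

# Thresholds quoted in the description for the textual-data rules.
rules = pair_rules(questions, min_support=0.115, min_confidence=0.8)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={c:.2f}")
```

In this toy data, "derivative -> function" passes both thresholds, while the reverse rule "function -> derivative" is dropped because its confidence falls below 0.8. FP-Growth would find the same frequent itemsets without enumerating every candidate pair.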
