Unsupervised data clustering for energy efficiency monitoring and analysis

Bibliographic Details
Main Author: Fu, Rong
Other Authors: Li Xiang
Format: Final Year Project
Language: English
Published: 2015
Online Access: http://hdl.handle.net/10356/62693
Institution: Nanyang Technological University
Description
Summary: For the first half of the Final Year Project, the main research focus is to improve K-Means clustering: to make the K-Means algorithm stable and efficient, and to determine the number of clusters K automatically. Under this research, I build a program that combines density-based clustering techniques with K-Means clustering, enabling stable selection of the K initial centroids. The drawback of this stable-selection algorithm is its O(n²) time complexity. In addition, two ways of automatically determining the number of clusters K are summarized in the Literature Review. A possible improvement to the second approach is also stated, but its validation and application are left for future research.

For the second half of the Final Year Project, my main research focus is to explore the possibility of using data clustering techniques, especially K-Means clustering, for energy efficiency monitoring and analysis. Under this research, two different approaches (Whole Batch Feature Extraction and Window Feature Extraction) are investigated. In addition, I build a system that lets the user select features for K-Means and set the training/testing split percentage, outputs the training and testing accuracies, and saves the cluster results to an Excel file. Currently, the system uses K-Means to learn energy consumption patterns offline and builds a model for each energy consumption pattern; each model is a cluster represented by its cluster center. In the future, the models from the offline training could be used to classify online streaming data, identifying the consumption pattern class of each window as soon as that window of data is ready. The proposed method gives models with training and testing accuracies of up to 78% and reveals some interesting findings relevant to our case-study data.
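The summary describes three steps: seeding K-Means with density-based initial centroids, clustering windows of consumption data, and later assigning each new window to the nearest learned cluster center. It does not give the density measure, window length or features used, so the following is only a minimal Python sketch under hypothetical choices: the density of a point is the number of neighbours within a radius `eps`, each seed after the first maximises density × distance to the nearest seed already chosen, and each non-overlapping window is summarised by its mean, population standard deviation, minimum and maximum. The function names (`density_seeds`, `window_features`, `kmeans`, `nearest`) are illustrative and not taken from the project.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(p: Sequence[float], centers: List[Vector]) -> int:
    """Index of the closest cluster center (nearest-centroid assignment)."""
    return min(range(len(centers)), key=lambda c: dist(p, centers[c]))


def window_features(series: Sequence[float], size: int) -> List[Vector]:
    """Hypothetical window features: [mean, population std, min, max] per
    non-overlapping window; a trailing partial window is dropped."""
    feats = []
    for start in range(0, len(series) - size + 1, size):
        w = series[start:start + size]
        mean = sum(w) / size
        std = math.sqrt(sum((x - mean) ** 2 for x in w) / size)
        feats.append([mean, std, min(w), max(w)])
    return feats


def density_seeds(points: List[Vector], k: int, eps: float) -> List[Vector]:
    """Deterministic K-Means seeding (one possible density-based scheme, assumed here).

    density[i] = number of other points within eps of point i. Computing it
    compares every pair of points, which is the O(n^2) cost.
    """
    n = len(points)
    density = [sum(1 for j in range(n) if j != i and dist(points[i], points[j]) <= eps)
               for i in range(n)]
    # First seed: the densest point; max() keeps the first of any ties, so runs are repeatable.
    chosen = [max(range(n), key=lambda i: density[i])]
    while len(chosen) < k:
        # Next seed: a point that is both dense and far from every seed chosen so far.
        candidates = [i for i in range(n) if i not in chosen]
        chosen.append(max(candidates,
                          key=lambda i: density[i] * min(dist(points[i], points[c]) for c in chosen)))
    return [list(points[i]) for i in chosen]


def kmeans(points: List[Vector], centers: List[Vector],
           max_iter: int = 100) -> Tuple[List[Vector], List[int]]:
    """Standard Lloyd iterations starting from the given initial centers."""
    for _ in range(max_iter):
        labels = [nearest(p, centers) for p in points]
        updated = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous center.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return centers, [nearest(p, centers) for p in points]


if __name__ == "__main__":
    # Made-up readings: three low-load windows, then three high-load windows.
    readings = [1.0, 1.2, 0.9, 1.1] * 3 + [5.0, 5.3, 4.8, 5.1] * 3
    X = window_features(readings, size=4)
    centers, labels = kmeans(X, density_seeds(X, k=2, eps=0.5))
    print(labels)  # [0, 0, 0, 1, 1, 1]
    # Online use: a newly completed window is assigned to the nearest learned pattern.
    print(nearest(window_features([1.0, 1.1, 1.0, 0.9], size=4)[0], centers))  # 0
```

Because the seeding has no random component, repeated runs return the same initial centroids, which illustrates the kind of stability a density-based initialisation can provide; the pairwise neighbour count is where the quadratic cost comes from. With real meter data, features on different scales would usually be standardised before clustering.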