Egocentric activities of daily living recognition application using Android platform

Bibliographic Details
Main Author: Canlas, Reich Rechner D.
Format: text
Language: English
Published: Animo Repository, 2018
Subjects:
Online Access: https://animorepository.dlsu.edu.ph/etd_masteral/5632
Institution: De La Salle University
Description
Abstract: Human action recognition (HAR) systems determine the action a person is performing using a variety of algorithms. Recent applications in the domain have leveraged the compactness of smartphones as well as their sensing and processing capabilities. These studies have primarily depended on motion inputs captured by either a camera or an array of sensors; rarely in the literature have both camera and mechanical sensor signals been used simultaneously in an HAR system. Taking all of this into account, this study aims to develop an HAR application that uses both first-person-perspective camera and sensor inputs on a device running the Android platform, improving upon existing egocentric HAR systems in terms of efficiency, portability, and accuracy. Four input streams were considered: camera, accelerometer, gyroscope, and magnetometer. Each stream was fed into a combination of one- and two-dimensional Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) Recurrent Neural Networks. All of these streams and networks operated in parallel, and their individual classifications were fed into a fully-connected late-fusion network. The accuracy and other metrics of the system were evaluated on a selection of actions from both a reference dataset and a new dataset generated from video-sensor data pairs. Results showed that each of the networks was effective at recognizing the actions considered within the scope of this study, and even more so when their outputs were fused.
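
The architecture described in the abstract (a CNN + LSTM branch per input stream, with the per-stream classifications concatenated and passed through a fully-connected late-fusion layer) can be sketched roughly as follows. This is a minimal illustration only: the use of PyTorch, the layer sizes, the window lengths, and the class count are assumptions for demonstration, not details taken from the thesis.

import torch
import torch.nn as nn


class SensorBranch(nn.Module):
    """1-D CNN + LSTM branch for a tri-axial sensor stream
    (accelerometer, gyroscope, or magnetometer)."""

    def __init__(self, in_channels=3, hidden=64, num_classes=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):                      # x: (batch, channels, time)
        feats = self.conv(x).transpose(1, 2)   # -> (batch, time, features)
        _, (h, _) = self.lstm(feats)
        return self.head(h[-1])                # per-stream class scores


class CameraBranch(nn.Module):
    """2-D CNN applied per frame, followed by an LSTM over the frame sequence."""

    def __init__(self, hidden=128, num_classes=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(4),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):                      # x: (batch, time, 3, H, W)
        b, t = x.shape[:2]
        feats = self.conv(x.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.lstm(feats)
        return self.head(h[-1])


class LateFusionHAR(nn.Module):
    """Runs the four branches in parallel and fuses their classifications
    with a fully-connected layer, as the abstract describes."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.camera = CameraBranch(num_classes=num_classes)
        self.accel = SensorBranch(num_classes=num_classes)
        self.gyro = SensorBranch(num_classes=num_classes)
        self.mag = SensorBranch(num_classes=num_classes)
        self.fusion = nn.Linear(4 * num_classes, num_classes)

    def forward(self, frames, accel, gyro, mag):
        outs = [self.camera(frames), self.accel(accel),
                self.gyro(gyro), self.mag(mag)]
        return self.fusion(torch.cat(outs, dim=1))


if __name__ == "__main__":
    model = LateFusionHAR(num_classes=10)
    frames = torch.randn(2, 8, 3, 64, 64)   # 8 video frames per window (assumed)
    accel = torch.randn(2, 3, 100)          # 100 samples per sensor window (assumed)
    gyro = torch.randn(2, 3, 100)
    mag = torch.randn(2, 3, 100)
    print(model(frames, accel, gyro, mag).shape)  # -> torch.Size([2, 10])

In practice, each branch would be trained (or pre-trained) on its own stream and the fusion layer trained on the concatenated branch outputs; the toy tensors above only demonstrate the expected input shapes for one sliding window of synchronized video and sensor data.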