Gaze estimation using residual neural network

Bibliographic Details
Main Author: Wong, En Teng
Other Authors: Lee Bu Sung, Francis
Format: Final Year Project
Language: English
Published: 2018
Online Access: http://hdl.handle.net/10356/76160
Institution: Nanyang Technological University
Description
Summary: The eye is a vital source of visual information about emotion, focus and cognitive processes, and eye tracking has proved to be an important tool for researchers in multiple fields. However, the appearance of the eye is sensitive to a large number of variables, such as lighting conditions, head pose, viewing angle, and the openness and size of the eye. With the emergence of deep learning, many researchers have adopted it as an approach to gaze estimation. This paper explored the use of an 18-layer Residual Neural Network (ResNet-18) to predict eye gaze using a massive public dataset called GazeCapture. ResNet is a family of models developed by Microsoft Research Asia; a ResNet-based entry won the annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2015 with a top-5 error rate of about 3.5%. GazeCapture is a large-scale eye tracking dataset collected through crowd-sourcing using Amazon Mechanical Turk. It offers scalability and a high degree of variation, which sets it apart from other large public datasets. This paper also analysed the improvements made by preprocessing steps such as a) removing incorrect data, b) methods of normalisation, c) extracting features such as Euler angles for head pose and d) using face grids. The experiments showed that ResNet-18 achieved lower errors than iTracker, which uses AlexNet as part of its model architecture. Applying histogram normalisation and removing incorrect data also helped to reduce the errors. Introducing Euler angles, however, did not reduce the errors, owing to their narrow distribution.
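
Two of the preprocessing steps named in the summary can be illustrated with a minimal sketch. It is not the project's code. It assumes that "histogram normalisation" means classic histogram equalisation of an 8-bit grayscale image, and that a face grid is a square binary mask marking where the face bounding box falls within the camera frame (25 x 25 in the GazeCapture paper). The function names, the list-of-lists image representation and the rounding choices are hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of integer intensities


def equalize_histogram(image: Image, levels: int = 256) -> Image:
    """Remap intensities so their cumulative distribution is roughly linear.

    Assumption: this stands in for the "histogram normalisation" step.
    """
    pixels = [p for row in image for p in row]
    total = len(pixels)

    # Count how often each intensity occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of intensities.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    if cdf_min == total:
        # Constant image: there is nothing to spread out.
        return [row[:] for row in image]

    # Lookup table mapping each old intensity to its equalised value.
    scale = levels - 1
    lut = [max(0, round((c - cdf_min) * scale / (total - cdf_min))) for c in cdf]
    return [[lut[p] for p in row] for row in image]


def face_grid(frame_w: int, frame_h: int,
              box: Tuple[int, int, int, int],
              grid_size: int = 25) -> List[List[int]]:
    """Binary mask with 1s in the grid cells covered by the face box.

    box is (x, y, width, height) in frame pixels (hypothetical convention).
    """
    x, y, w, h = box
    # Scale the box corners from pixel coordinates to grid cells.
    x0 = max(0, math.floor(x * grid_size / frame_w))
    y0 = max(0, math.floor(y * grid_size / frame_h))
    x1 = min(grid_size, math.ceil((x + w) * grid_size / frame_w))
    y1 = min(grid_size, math.ceil((y + h) * grid_size / frame_h))
    return [[1 if (x0 <= c < x1 and y0 <= r < y1) else 0
             for c in range(grid_size)]
            for r in range(grid_size)]
```

For example, `face_grid(640, 480, (200, 100, 240, 240))` returns a 25 x 25 mask that could be flattened and passed to a network alongside the eye and face crops. In this sketch a cell counts as covered if the box overlaps it at all, which is one of several reasonable conventions.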