Reconstruct 3D information from a single image

Bibliographic Details
Main Author: Phua, Chuan Leong.
Other Authors: He Ying
Format: Final Year Project
Language: English
Published: 2012
Online Access: http://hdl.handle.net/10356/48467
Institution: Nanyang Technological University
Description
Summary: Computer vision systems today may be advanced, but 3D reconstruction from a single two-dimensional image is still considered an extremely challenging task. In contrast, humans can easily reconstruct 3D information from a single two-dimensional image, because they draw on various visual cues in the image and relate these cues to one another to visualize the 3D scene. The objective of this project was therefore to create a program with a similar ability to reconstruct 3D information from a single two-dimensional image. Images can contain many different scenes and objects captured at various angles and orientations, so the adopted algorithm made only the general assumption that the environment is made up of a number of small planes. No other explicit assumptions were made about the scene structure, so that the algorithm could capture as much detail of the 3D environment as possible. The algorithm first applied superpixel segmentation to divide the image into small homogeneous patches. A machine learning model, the Markov Random Field (MRF), was then used to infer a set of plane parameters that capture both the 3D orientation and the 3D location of each superpixel patch. The MRF, which was trained via supervised learning, models the relationships between different parts of the image, determines image occlusions and captures various monocular cues used by humans. The adopted algorithm produced relatively visually pleasing VRML output at a reasonable speed. However, there was still room for improvement in overall output quality and speed, so an option was explored that lets the user tune the program for either faster computation or higher-quality output.
Ease of use and user-friendliness were also taken into consideration during development, so that the target audience need not be computer savvy.
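
To illustrate how a set of plane parameters can determine 3D location, the following is a minimal sketch and not the project's actual code. It assumes a pinhole camera at the origin with the principal point at the image centre, and a plane written as normal . X = offset. The summary does not state the parameterisation used, so the function names, the focal length and the plane form are all hypothetical.

```python
import math

def pixel_ray(u, v, width, height, focal_px):
    """Unit viewing ray through pixel (u, v).

    Assumes a pinhole camera with the principal point at the image
    centre and a focal length given in pixels (both are assumptions).
    """
    x = u - width / 2.0
    y = v - height / 2.0
    z = focal_px
    norm = math.sqrt(x * x + y * y + z * z)
    return (x / norm, y / norm, z / norm)

def point_on_plane(ray, normal, offset):
    """Intersect a ray from the camera origin with the plane
    normal . X = offset.

    Returns the 3D point, or None when the ray is parallel to the
    plane or the intersection lies behind the camera.
    """
    denom = sum(r * n for r, n in zip(ray, normal))
    if abs(denom) < 1e-9:
        return None  # ray never meets the plane
    t = offset / denom  # distance along the ray to the plane
    if t <= 0:
        return None  # plane is behind the camera
    return tuple(t * r for r in ray)

# Hypothetical superpixel plane: a wall facing the camera, 5 units away.
wall_normal = (0.0, 0.0, 1.0)
wall_offset = 5.0

ray = pixel_ray(320, 240, 640, 480, focal_px=500.0)
print(point_on_plane(ray, wall_normal, wall_offset))
```

Under these assumptions, every pixel inside a superpixel shares that superpixel's plane, so one inferred plane per patch is enough to place all of its pixels in 3D.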