Perceptual and content based analysis for coding multimedia information
This report covers the RGM project “Perceptual and Content Based Analysis for Coding Multimedia Information” (22 October 2002 – 30 June 2006). Chapter 1 deals with the topic of content-based image retrieval (CBIR), where three new approaches are proposed: 1) a framework of knowledge-driven CBIR; 2)...
Main Author: | Xue, Ping. |
---|---|
Other Authors: | School of Electrical and Electronic Engineering |
Format: | Research Report |
Language: | English |
Published: | 2008 |
Subjects: | DRNTU::Engineering::Computer science and engineering::Data::Coding and information theory |
Online Access: | http://hdl.handle.net/10356/14519 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-14519 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-14519 2023-03-04T03:20:03Z Perceptual and content based analysis for coding multimedia information Xue, Ping. School of Electrical and Electronic Engineering DRNTU::Engineering::Computer science and engineering::Data::Coding and information theory This report covers the RGM project “Perceptual and Content Based Analysis for Coding Multimedia Information” (22 October 2002 – 30 June 2006). Chapter 1 addresses content-based image retrieval (CBIR) and proposes three new approaches: 1) a framework for knowledge-driven CBIR; 2) a criterion to maximize the retrieval performance of Kernel-based Biased Discriminant Analysis (KBDA); and 3) a novel saliency-weighted region-based image retrieval (SW-RBIR) algorithm. Together these lead to a new image retrieval strategy in which a suitable CBIR algorithm is selected based on the image characteristics of foreground objects and backgrounds. Chapter 2 extends the work to video analysis. Two multi-resolution video representation schemes are proposed, one based on Kernel-based Principal Component Analysis (KPCA) and one on Mean Shift Analysis (MSA). Both yield efficient multi-resolution video representations simply by tuning their internal parameters. Motion vectors are extracted from the MPEG video stream at low computational cost and then processed. This approach is especially suitable for Skycam-based applications, where the image resolution is usually low. Chapter 3 treats model selection for kernel methods. A unified model selection criterion for both bi-class and multi-class Support Vector Machines (SVMs) is proposed; it is based on gradient descent, conceptually simple, and easy to implement. The criterion is then extended to Kernel-based Linear Discriminant Analysis (KLDA). A generalized radius-margin bound is developed for multi-class SVMs to efficiently perform both model selection and feature selection. Chapter 4 discusses video repeat identification and video structure analysis. Effective solutions for identifying short video repeats in video collections or streams are developed for video structure analysis, important event mining, and commercial detection and skipping. Chapter 5 focuses on image segmentation based on disjoint set union. A new watershed algorithm is proposed to address issues such as over-segmentation and memory overflow in some existing methods. Chapter 6 explores the characteristics of the human visual system (HVS) and proposes an improved scheme for estimating just-noticeable distortion (JND). In general, any information below the JND threshold can be ignored. Applying the JND gauge to image and video compression can improve coding efficiency, because bits can be allocated according to the JND thresholds in different areas. In other words, coded images and videos show improved perceptual quality compared with traditional coding methods. ATTENTION: The Singapore Copyright Act applies to the use of this document. Nanyang Technological University Library 2008-11-26T08:25:09Z 2008-11-26T08:25:09Z 2006 2006 Research Report http://hdl.handle.net/10356/14519 en 221 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Data::Coding and information theory |
spellingShingle |
DRNTU::Engineering::Computer science and engineering::Data::Coding and information theory Xue, Ping. Perceptual and content based analysis for coding multimedia information |
description |
This report covers the RGM project “Perceptual and Content Based Analysis for Coding Multimedia Information” (22 October 2002 – 30 June 2006).
Chapter 1 addresses content-based image retrieval (CBIR) and proposes three new approaches: 1) a framework for knowledge-driven CBIR; 2) a criterion to maximize the retrieval performance of Kernel-based Biased Discriminant Analysis (KBDA); and 3) a novel saliency-weighted region-based image retrieval (SW-RBIR) algorithm. Together these lead to a new image retrieval strategy in which a suitable CBIR algorithm is selected based on the image characteristics of foreground objects and backgrounds.
Chapter 2 extends the work to video analysis. Two multi-resolution video representation schemes are proposed, one based on Kernel-based Principal Component Analysis (KPCA) and one on Mean Shift Analysis (MSA). Both yield efficient multi-resolution video representations simply by tuning their internal parameters. Motion vectors are extracted from the MPEG video stream at low computational cost and then processed. This approach is especially suitable for Skycam-based applications, where the image resolution is usually low.
Chapter 3 treats model selection for kernel methods. A unified model selection criterion for both bi-class and multi-class Support Vector Machines (SVMs) is proposed; it is based on gradient descent, conceptually simple, and easy to implement. The criterion is then extended to Kernel-based Linear Discriminant Analysis (KLDA). A generalized radius-margin bound is developed for multi-class SVMs to efficiently perform both model selection and feature selection.
Chapter 4 discusses video repeat identification and video structure analysis. Effective solutions for identifying short video repeats in video collections or streams are developed for video structure analysis, important event mining, and commercial detection and skipping.
Chapter 5 focuses on image segmentation based on disjoint set union. A new watershed algorithm is proposed to address issues such as over-segmentation and memory overflow in some existing methods.
Chapter 6 explores the characteristics of the human visual system (HVS) and proposes an improved scheme for estimating just-noticeable distortion (JND). In general, any information below the JND threshold can be ignored. Applying the JND gauge to image and video compression can improve coding efficiency, because bits can be allocated according to the JND thresholds in different areas. In other words, coded images and videos show improved perceptual quality compared with traditional coding methods. |
author2 |
School of Electrical and Electronic Engineering |
author_facet |
School of Electrical and Electronic Engineering Xue, Ping. |
format |
Research Report |
author |
Xue, Ping. |
author_sort |
Xue, Ping. |
title |
Perceptual and content based analysis for coding multimedia information |
title_short |
Perceptual and content based analysis for coding multimedia information |
title_full |
Perceptual and content based analysis for coding multimedia information |
title_fullStr |
Perceptual and content based analysis for coding multimedia information |
title_full_unstemmed |
Perceptual and content based analysis for coding multimedia information |
title_sort |
perceptual and content based analysis for coding multimedia information |
publishDate |
2008 |
url |
http://hdl.handle.net/10356/14519 |
_version_ |
1759853803703631872 |