OVFoodSeg : Elevating open-vocabulary food image segmentation via image-informed textual representation

In the realm of food computing, segmenting ingredients from images poses substantial challenges due to the large intra-class variance among the same ingredients, the emergence of new ingredients, and the high annotation costs associated with large food segmentation datasets. Existing approaches pri...

Bibliographic Details
Main Authors:	WU, Xiongwei; YU, Sicheng; LIM, Ee-Peng; NGO, Chong-wah
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Food image segmentation; Text embeddings; Vision language model; Image segmentation; Visualization; Computer vision; Adaptation models; Machine learning; Artificial Intelligence and Robotics; Computer Sciences
Online Access:	https://ink.library.smu.edu.sg/sis_research/9861
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-10861
record_format	dspace
spelling	sg-smu-ink.sis_research-108612024-12-24T02:24:02Z OVFoodSeg : Elevating open-vocabulary food image segmentation via image-informed textual representation WU, Xiongwei YU, Sicheng LIM, Ee-Peng NGO, Chong-wah In the realm of food computing, segmenting ingredients from images poses substantial challenges due to the large intra-class variance among the same ingredients, the emergence of new ingredients, and the high annotation costs associated with large food segmentation datasets. Existing approaches primarily utilize a closed-vocabulary and static text embeddings setting. These methods often fall short in effectively handling the ingredients, particularly new and diverse ones. In response to these limitations, we introduce OVFoodSeg, a framework that adopts an open-vocabulary setting and enhances text embeddings with visual context. By integrating vision-language models (VLMs), our approach enriches text embedding with image-specific information through two innovative modules, e.g., an image-to-text learner FoodLearner and an Image-Informed Text Encoder. The training process of OVFoodSeg is divided into two stages: the pre-training of FoodLearner and the subsequent learning phase for segmentation. The pre-training phase equips FoodLearner with the capability to align visual information with corresponding textual representations that are specifically related to food, while the second phase adapts both the FoodLearner and the Image-Informed Text Encoder for the segmentation task. By addressing the deficiencies of previous models, OVFoodSeg demonstrates a significant improvement, achieving a 4.9% increase in mean Intersection over Union (mIoU) on the FoodSeg103 dataset, setting a new milestone for food image segmentation.
2024-01-01T08:00:00Z text https://ink.library.smu.edu.sg/sis_research/9861 info:doi/10.1109/CVPR52733.2024.00397 Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Food image segmentation Text embeddings Vision language model Image segmentation Visualization Computer vision Adaptation models Machine learning Artificial Intelligence and Robotics Computer Sciences
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Food image segmentation Text embeddings Vision language model Image segmentation Visualization Computer vision Adaptation models Machine learning Artificial Intelligence and Robotics Computer Sciences
spellingShingle	Food image segmentation Text embeddings Vision language model Image segmentation Visualization Computer vision Adaptation models Machine learning Artificial Intelligence and Robotics Computer Sciences WU, Xiongwei YU, Sicheng LIM, Ee-Peng NGO, Chong-wah OVFoodSeg : Elevating open-vocabulary food image segmentation via image-informed textual representation
description	In the realm of food computing, segmenting ingredients from images poses substantial challenges due to the large intra-class variance among the same ingredients, the emergence of new ingredients, and the high annotation costs associated with large food segmentation datasets. Existing approaches primarily utilize a closed-vocabulary and static text embeddings setting. These methods often fall short in effectively handling the ingredients, particularly new and diverse ones. In response to these limitations, we introduce OVFoodSeg, a framework that adopts an open-vocabulary setting and enhances text embeddings with visual context. By integrating vision-language models (VLMs), our approach enriches text embedding with image-specific information through two innovative modules, e.g., an image-to-text learner FoodLearner and an Image-Informed Text Encoder. The training process of OVFoodSeg is divided into two stages: the pre-training of FoodLearner and the subsequent learning phase for segmentation. The pre-training phase equips FoodLearner with the capability to align visual information with corresponding textual representations that are specifically related to food, while the second phase adapts both the FoodLearner and the Image-Informed Text Encoder for the segmentation task. By addressing the deficiencies of previous models, OVFoodSeg demonstrates a significant improvement, achieving a 4.9% increase in mean Intersection over Union (mIoU) on the FoodSeg103 dataset, setting a new milestone for food image segmentation.
format	text
author	WU, Xiongwei YU, Sicheng LIM, Ee-Peng NGO, Chong-wah
author_facet	WU, Xiongwei YU, Sicheng LIM, Ee-Peng NGO, Chong-wah
author_sort	WU, Xiongwei
title	OVFoodSeg : Elevating open-vocabulary food image segmentation via image-informed textual representation
title_short	OVFoodSeg : Elevating open-vocabulary food image segmentation via image-informed textual representation
title_full	OVFoodSeg : Elevating open-vocabulary food image segmentation via image-informed textual representation
title_fullStr	OVFoodSeg : Elevating open-vocabulary food image segmentation via image-informed textual representation
title_full_unstemmed	OVFoodSeg : Elevating open-vocabulary food image segmentation via image-informed textual representation
title_sort	ovfoodseg : elevating open-vocabulary food image segmentation via image-informed textual representation
publisher	Institutional Knowledge at Singapore Management University
publishDate	2024
url	https://ink.library.smu.edu.sg/sis_research/9861
_version_	1821237254381633536

Similar Items