Instruction-guided image editing empowered by large language models

This final year project is mainly focused on developing a compositional framework which enables a user to edit user-provided photos using natural language instructions. The proposed approach avoids a resource-demanding training process by leveraging the impressive reasoning ability of large languag...

Bibliographic Details
Main Author: Wang, Yiying
Other Authors: Hanwang Zhang
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects: Computer and Information Science
Online Access: https://hdl.handle.net/10356/175157
Institution: Nanyang Technological University
id sg-ntu-dr.10356-175157
record_format dspace
spelling sg-ntu-dr.10356-175157 2024-04-26T15:41:12Z Instruction-guided image editing empowered by large language models Wang, Yiying Hanwang Zhang School of Computer Science and Engineering hanwangzhang@ntu.edu.sg Computer and Information Science This final year project is mainly focused on developing a compositional framework which enables a user to edit user-provided photos using natural language instructions. The proposed approach avoids a resource-demanding training process by leveraging the impressive reasoning ability of large language models (LLM) as well as off-the-shelf visual models which have demonstrated remarkable zero-shot performance in diverse scenarios. Meanwhile, as the framework is highly modularized, the functionalities of the framework are expected to be further extended in the future along with the advancement of cutting-edge computer vision models. The experiment results have proven that the framework is able to produce delightful outcomes. Furthermore, a web demo is created for providing a straightforward and user-friendly graphical interface, enhancing the framework’s interactivity. Bachelor's degree 2024-04-22T06:40:34Z 2024-04-22T06:40:34Z 2024 Final Year Project (FYP) Wang, Y. (2024). Instruction-guided image editing empowered by large language models. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175157 https://hdl.handle.net/10356/175157 en application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
spellingShingle Computer and Information Science
Wang, Yiying
Instruction-guided image editing empowered by large language models
description This final year project is mainly focused on developing a compositional framework which enables a user to edit user-provided photos using natural language instructions. The proposed approach avoids a resource-demanding training process by leveraging the impressive reasoning ability of large language models (LLM) as well as off-the-shelf visual models which have demonstrated remarkable zero-shot performance in diverse scenarios. Meanwhile, as the framework is highly modularized, the functionalities of the framework are expected to be further extended in the future along with the advancement of cutting-edge computer vision models. The experiment results have proven that the framework is able to produce delightful outcomes. Furthermore, a web demo is created for providing a straightforward and user-friendly graphical interface, enhancing the framework’s interactivity.
author2 Hanwang Zhang
author_facet Hanwang Zhang
Wang, Yiying
format Final Year Project
author Wang, Yiying
author_sort Wang, Yiying
title Instruction-guided image editing empowered by large language models
title_short Instruction-guided image editing empowered by large language models
title_full Instruction-guided image editing empowered by large language models
title_fullStr Instruction-guided image editing empowered by large language models
title_full_unstemmed Instruction-guided image editing empowered by large language models
title_sort instruction-guided image editing empowered by large language models
publisher Nanyang Technological University
publishDate 2024
url https://hdl.handle.net/10356/175157
_version_ 1800916340724203520