Instruction-guided image editing empowered by large language models
This final year project focuses on developing a compositional framework that enables a user to edit user-provided photos using natural language instructions. The proposed approach avoids a resource-demanding training process by leveraging the impressive reasoning ability of large languag...
Main Author: | Wang, Yiying |
---|---|
Other Authors: | Hanwang Zhang |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | Computer and Information Science |
Online Access: | https://hdl.handle.net/10356/175157 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-175157 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-175157 2024-04-26T15:41:12Z Instruction-guided image editing empowered by large language models Wang, Yiying Hanwang Zhang School of Computer Science and Engineering hanwangzhang@ntu.edu.sg Computer and Information Science This final year project focuses on developing a compositional framework that enables a user to edit user-provided photos using natural language instructions. The proposed approach avoids a resource-demanding training process by leveraging the impressive reasoning ability of large language models (LLMs) as well as off-the-shelf visual models that have demonstrated remarkable zero-shot performance in diverse scenarios. Because the framework is highly modularized, its functionality is expected to be extended in the future as computer vision models advance. The experimental results show that the framework is able to produce satisfying outcomes. Furthermore, a web demo provides a straightforward and user-friendly graphical interface, enhancing the framework’s interactivity. Bachelor's degree 2024-04-22T06:40:34Z 2024-04-22T06:40:34Z 2024 Final Year Project (FYP) Wang, Y. (2024). Instruction-guided image editing empowered by large language models. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175157 https://hdl.handle.net/10356/175157 en application/pdf Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Computer and Information Science |
spellingShingle |
Computer and Information Science Wang, Yiying Instruction-guided image editing empowered by large language models |
description |
This final year project focuses on developing a compositional framework that
enables a user to edit user-provided photos using natural language instructions.
The proposed approach avoids a resource-demanding training process by leveraging the
impressive reasoning ability of large language models (LLMs) as well as off-the-shelf
visual models that have demonstrated remarkable zero-shot performance in diverse
scenarios. Because the framework is highly modularized, its functionality is
expected to be extended in the future as computer vision models advance. The
experimental results show that the framework is able to produce satisfying
outcomes. Furthermore, a web demo provides a straightforward and user-friendly
graphical interface, enhancing the framework’s interactivity. |
author2 |
Hanwang Zhang |
author_facet |
Hanwang Zhang Wang, Yiying |
format |
Final Year Project |
author |
Wang, Yiying |
author_sort |
Wang, Yiying |
title |
Instruction-guided image editing empowered by large language models |
title_short |
Instruction-guided image editing empowered by large language models |
title_full |
Instruction-guided image editing empowered by large language models |
title_fullStr |
Instruction-guided image editing empowered by large language models |
title_full_unstemmed |
Instruction-guided image editing empowered by large language models |
title_sort |
instruction-guided image editing empowered by large language models |
publisher |
Nanyang Technological University |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/175157 |
_version_ |
1800916340724203520 |