Instruction-guided image editing empowered by large language models
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/175157 |
Institution: | Nanyang Technological University |
Summary: | This final year project develops a compositional framework
that enables a user to edit user-provided photos using natural language instructions.
The proposed approach avoids a resource-demanding training process by leveraging the
impressive reasoning ability of large language models (LLMs) as well as off-the-shelf
visual models, which have demonstrated remarkable zero-shot performance in diverse
scenarios. Because the framework is highly modularized, its functionalities
are expected to be extended in the future along with the
advancement of cutting-edge computer vision models. The experimental results
show that the framework is able to produce pleasing outcomes. Furthermore, a web
demo provides a straightforward and user-friendly graphical interface,
enhancing the framework’s interactivity. |
---|