Instruction-guided image editing empowered by large language models

Bibliographic Details
Main Author: Wang, Yiying
Other Authors: Hanwang Zhang
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects:
Online Access: https://hdl.handle.net/10356/175157
Institution: Nanyang Technological University
Description
Summary: This final year project develops a compositional framework that enables a user to edit user-provided photos using natural language instructions. The proposed approach avoids a resource-demanding training process by leveraging the strong reasoning ability of large language models (LLMs) together with off-the-shelf visual models that have demonstrated remarkable zero-shot performance in diverse scenarios. Because the framework is highly modularized, its functionality is expected to be extended in the future as computer vision models advance. The experimental results indicate that the framework produces pleasing outcomes. Furthermore, a web demo provides a straightforward and user-friendly graphical interface, enhancing the framework’s interactivity.