LLM-based column lineage for relational databases

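This research paper explores a novel approach to deriving column lineage in relational databases using large language models (LLMs). Column lineage, or column-level lineage, tracks the flow of data for each column across tables, from ingestion to visualization. Traditional methods for determining column lineage rely heavily on SQL parsers, which are often rigid and inflexible. Consequently, existing tools for column lineage are difficult to generalize and expensive to maintain. This project seeks to overcome these limitations by investigating the potential of using LLMs as an alternative to conventional SQL parsers.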

Bibliographic Details
Main Author: Tan, Yu Ling
Other Authors: Long Cheng
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects:
LLM
Online Access: https://hdl.handle.net/10356/181146
Institution: Nanyang Technological University
id sg-ntu-dr.10356-181146
record_format dspace
spelling sg-ntu-dr.10356-181146
2024-11-18T00:19:23Z
LLM-based column lineage for relational databases
Tan, Yu Ling
Long Cheng
College of Computing and Data Science
c.long@ntu.edu.sg
Computer and Information Science
LLM
Database
Column lineage
Bachelor's degree
2024
Final Year Project (FYP)
Tan, Y. L. (2024). LLM-based column lineage for relational databases. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181146
en
SCSE23-0652
application/pdf
Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
LLM
Database
Column lineage
description This research paper explores a novel approach to deriving column lineage in relational databases using large language models (LLMs). Column lineage, or column-level lineage, tracks the flow of data for each column across tables, from ingestion to visualization. Traditional methods for determining column lineage rely heavily on SQL parsers, which are often rigid and inflexible. Consequently, existing tools for column lineage are difficult to generalize and expensive to maintain. This project seeks to overcome these limitations by investigating the potential of using LLMs as an alternative to conventional SQL parsers.
author2 Long Cheng
author_facet Long Cheng
Tan, Yu Ling
format Final Year Project
author Tan, Yu Ling
author_sort Tan, Yu Ling
title LLM-based column lineage for relational databases
publisher Nanyang Technological University
publishDate 2024
url https://hdl.handle.net/10356/181146