LLM-based column lineage for relational databases

Bibliographic Details
Main Author: Tan, Yu Ling
Other Authors: Long Cheng
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects:
LLM
Online Access: https://hdl.handle.net/10356/181146
Institution: Nanyang Technological University
Description
Summary: This research paper explores a novel approach to deriving column lineage for relational databases using large language models (LLMs). Column lineage, or column-level lineage, tracks the flow of data for each column across tables, from ingestion to visualization. Traditional methods for determining column lineage rely heavily on SQL parsers, which are often rigid. Consequently, existing column lineage tools are difficult to generalize and expensive to maintain. This project seeks to overcome these limitations by investigating the potential of LLMs as an alternative to conventional SQL parsers.
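
To make the notion of column lineage concrete, the sketch below shows one way its output could be represented, whether it is produced by a SQL parser or by an LLM. It is an illustration only and is not taken from the paper: the table names, column names, SQL statements, and the `root_sources` helper are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical two-step pipeline (assumed for illustration, not from the paper):
#   CREATE TABLE staging.orders_clean AS
#     SELECT order_id, order_date, amount * fx_rate AS amount_usd
#     FROM raw.orders;
#   CREATE TABLE mart.daily_revenue AS
#     SELECT order_date, SUM(amount_usd) AS revenue
#     FROM staging.orders_clean GROUP BY order_date;
#
# Column lineage maps each derived column to the columns it reads directly.
LINEAGE: Dict[str, List[str]] = {
    "staging.orders_clean.order_id": ["raw.orders.order_id"],
    "staging.orders_clean.order_date": ["raw.orders.order_date"],
    "staging.orders_clean.amount_usd": ["raw.orders.amount", "raw.orders.fx_rate"],
    "mart.daily_revenue.order_date": ["staging.orders_clean.order_date"],
    "mart.daily_revenue.revenue": ["staging.orders_clean.amount_usd"],
}


def root_sources(column: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Return the ingestion-level columns that `column` ultimately depends on."""
    roots: Set[str] = set()
    visited: Set[str] = set()
    stack = [column]
    while stack:
        current = stack.pop()
        if current in visited:
            continue  # guard against accidental cycles in the mapping
        visited.add(current)
        parents = lineage.get(current, [])
        if not parents:
            roots.add(current)  # no recorded upstream: treat as a source column
        else:
            stack.extend(parents)
    return roots


if __name__ == "__main__":
    print(sorted(root_sources("mart.daily_revenue.revenue", LINEAGE)))
```

Tracing `mart.daily_revenue.revenue` through this mapping leads back to `raw.orders.amount` and `raw.orders.fx_rate`. A lineage tool's job is to derive a mapping like `LINEAGE` from the SQL itself.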