Query cost estimation in DBMS with deep learning

Bibliographic Details
Main Author: Acharya, Atul
Other Authors: Luo Siqiang
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Online Access: https://hdl.handle.net/10356/166095
Institution: Nanyang Technological University
Description
Summary: Cost and cardinality estimation is considered the Achilles' heel of modern query optimizers. Poor cardinality estimates lead to poor cost estimates, which cause sub-optimal query execution plans to be selected and degrade the performance of query optimizers. With the recent rise of ML for DB, the database community has explored learned methods for cost and cardinality estimation. However, none of the methods to date achieves the prediction speeds required by modern database systems. In this project, we introduce a novel algorithm (TreeGBM) that uses Gradient Boosting Trees to solve both cost estimation and cardinality estimation on numeric JOB workloads based on the IMDB dataset. We conducted multiple experiments to improve prediction scores and inference times. Our experiments showed that TreeGBM was ∼120 times faster than state-of-the-art learned methods while maintaining good prediction scores. We also outline possible improvements to our method that could further improve prediction scores and inference times. Future work can extend the algorithm with a new predicate embedding algorithm that does not incur much latency and with prefix tries to encode string values.
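
The summary names prefix tries as a possible way to encode string values but gives no details. The following is only a minimal sketch of one way such an encoding could work, not the project's design: the class name `PrefixTrie`, its methods, and the sample strings are all hypothetical. It assigns each stored string an integer id in lexicographic order, so that every prefix corresponds to one contiguous id range that a numeric model could consume.

```python
# Hypothetical illustration only; not taken from the TreeGBM project.
END = ""  # sentinel key marking the end of a stored string (no character equals "")


class PrefixTrie:
    """Maps strings to integer ids so that strings sharing a prefix get contiguous ids."""

    def __init__(self, values):
        self.root = {}
        for value in set(values):
            node = self.root
            for ch in value:
                node = node.setdefault(ch, {})
            node[END] = None
        self.ids = {}     # stored string -> integer id
        self.ranges = {}  # prefix -> (lowest id, highest id) among strings below it
        self._number(self.root, "")

    def _number(self, node, prefix):
        # Depth-first walk in sorted child order yields lexicographic ids.
        lo = len(self.ids)
        if END in node:
            self.ids[prefix] = len(self.ids)
        for ch in sorted(key for key in node if key != END):
            self._number(node[ch], prefix + ch)
        self.ranges[prefix] = (lo, len(self.ids) - 1)

    def encode(self, value):
        """Return the id of a stored string, or -1 if it was never inserted."""
        return self.ids.get(value, -1)

    def prefix_range(self, prefix):
        """Return the (low, high) id range covering the prefix, or None if absent."""
        return self.ranges.get(prefix)


trie = PrefixTrie(["star", "start", "stone", "sun"])
print(trie.encode("stone"))      # 2
print(trie.prefix_range("st"))   # (0, 2)
```

Under this assumed scheme, a string equality predicate becomes a single integer and a prefix predicate becomes a numeric range, which is the kind of input a tree-based model handles directly.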