Advancements in green AI: a pathway to sustainability
In this paper, model compression and optimization techniques are surveyed and evaluated on benchmarks of energy efficiency, memory footprint, and accuracy for a task key to online safety: phishing detection. The three primary categories of compression explored are (1) quantization, (2) distillation, and (3) pruning. The quantization techniques explored are QLoRA and LLM.int8(), both designed for compressing LLMs, as well as Quantization-Aware Training with asymmetric quantization at inference. The distillation techniques explored are (1) knowledge distillation, (2) hint distillation for FitNets, and (3) relational knowledge distillation, all of which are used to train transformer architectures smaller than the base BERT transformer. For pruning, L1 and L2 magnitude pruning and head pruning are evaluated. The results show that major gains in both carbon footprint and memory footprint come from applying QLoRA with FP4 quantization and an FP16 compute type, with near-zero accuracy degradation. This model achieved an accuracy of 98.60%, a carbon footprint of 0.0016 kg of CO2 over 20,000 samples, and a time per inference of 0.0059 seconds, making it fast, efficient, and of high quality, especially compared with a baseline of 98.58% accuracy, 0.0095 kg of CO2 over 20,000 samples, and 0.016 seconds per inference; the most optimal model is thus 10 times faster and emits nearly 6 times less carbon over 20,000 samples.
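To make the headline configuration concrete, the sketch below shows how a QLoRA-style setup with FP4 weight quantization and an FP16 compute type can be expressed with the Hugging Face transformers, bitsandbytes, and PEFT libraries. It is illustrative only and is not the author's code: the bert-base-uncased backbone, the binary phishing-detection head, and the LoRA rank and target modules are assumptions made for the example.

```python
# Illustrative sketch (assumed setup, not the report's code): a 4-bit FP4-quantized
# BERT classifier with FP16 compute, plus LoRA adapters in the QLoRA style.
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

MODEL_NAME = "bert-base-uncased"  # assumed backbone; the report compares against a BERT baseline

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="fp4",             # FP4 weight quantization, as in the abstract
    bnb_4bit_compute_dtype=torch.float16,  # FP16 compute type
)

model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=2,                          # phishing vs. legitimate
    quantization_config=bnb_config,
    device_map="auto",
)

# QLoRA keeps the quantized base weights frozen and trains small low-rank adapters.
lora_config = LoraConfig(
    r=8,                                   # assumed rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query", "value"],     # common choice for BERT-style attention
    task_type="SEQ_CLS",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The distillation variants listed above all build on the standard soft-target objective; a minimal version of that loss, with illustrative temperature and mixing weight rather than the report's values, looks like this:

```python
# Minimal knowledge-distillation loss: temperature-softened KL term between teacher
# and student logits plus cross-entropy on the hard labels. T and alpha are
# illustrative defaults, not values taken from the report.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                            # rescale so gradients match the unscaled loss
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```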
Saved in: DR-NTU (NTU Library), Nanyang Technological University
Main Author: Palanca Sebastian Gonzalo Miguel IV Puyat
Other Authors: Dusit Niyato
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2024
Subjects: Computer and Information Science; Artificial intelligence; Model compression
Online Access: https://hdl.handle.net/10356/181771
Institution: Nanyang Technological University
Record details:
Record ID: sg-ntu-dr.10356-181771
Date deposited: 2024-12-18
Author: Palanca Sebastian Gonzalo Miguel IV Puyat
Other contributors: Dusit Niyato (DNIYATO@ntu.edu.sg); Ong Yoon Sew
School: College of Computing and Data Science
Degree: Bachelor's degree
Project code: SCSE23-0820
File format: application/pdf
Citation: Palanca Sebastian Gonzalo Miguel IV Puyat (2024). Advancements in green AI: a pathway to sustainability. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181771