Advancements in green AI: a pathway to sustainability


Bibliographic Details
Main Author: Palanca Sebastian Gonzalo Miguel IV Puyat
Other Authors: Dusit Niyato
Format: Final Year Project
Language:English
Published: Nanyang Technological University 2024
Online Access:https://hdl.handle.net/10356/181771
Institution: Nanyang Technological University
Description
Summary: In this paper, model compression and optimization techniques are surveyed and evaluated on benchmarks of energy efficiency, memory footprint, and accuracy on phishing detection, a task key to online safety. The three primary categories of compression explored are (1) quantization, (2) distillation, and (3) pruning. The quantization techniques explored are QLoRA and LLM.int8(), techniques designed for compressing LLMs, as well as Quantization-Aware Training with asymmetric quantization at inference. The distillation techniques explored are (1) Knowledge Distillation, (2) Hint Distillation for FitNets, and (3) Relational Knowledge Distillation, all of which are used to train transformer architectures smaller than the base BERT transformer. For pruning, L1 and L2 magnitude pruning and head pruning are evaluated. The results showed that major gains in both carbon footprint and memory footprint are made by applying QLoRA with FP4 storage and an FP16 compute type, with near-zero accuracy degradation. The resulting model showed great promise with an accuracy of 98.60%, a carbon footprint of 0.0016 kg of CO2 for 20,000 samples, and a time per inference of 0.0059 seconds, making it fast, efficient, and of high quality, especially compared to the baseline's 98.58% accuracy, 0.0095 kg of CO2 for 20,000 samples, and 0.016 seconds per inference; the most optimal model is thus reported as 10 times faster with nearly 6 times less carbon emitted over 20,000 samples.
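To illustrate the asymmetric quantization scheme mentioned above, the following is a minimal sketch of zero-point (asymmetric) quantization to 8-bit integers with dequantization back to floats. It is an illustrative example only, not the paper's implementation; the function names and bit width are assumptions.

```python
# Hypothetical sketch of asymmetric (zero-point) quantization, the general
# scheme applied at inference after Quantization-Aware Training.
def asymmetric_quantize(values, num_bits=8):
    """Map floats onto unsigned ints in [0, 2**num_bits - 1] using a
    scale and zero-point derived from the observed min/max range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0   # guard against a zero range
    zero_point = round(qmin - lo / scale)      # integer offset for asymmetry
    q = [max(qmin, min(qmax, round(v / scale + zero_point))) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from quantized ints."""
    return [(qi - zero_point) * scale for qi in q]
```

Unlike symmetric quantization, the zero-point lets an asymmetric range such as [-1.0, 2.0] use the full integer grid, so the round-trip error stays within one quantization step.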
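The Knowledge Distillation technique evaluated above trains a small student to match a larger teacher's softened outputs. Below is an illustrative sketch of the soft-target loss in Hinton et al.'s formulation; the temperature value and function names are assumptions, not the paper's configuration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's and student's softened
    distributions, scaled by T^2 to keep gradient magnitudes comparable."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl
```

In practice this soft-target term is combined with the ordinary cross-entropy on the true labels; Hint Distillation and Relational Knowledge Distillation add further terms on intermediate features and pairwise relations, respectively.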
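The L1 magnitude pruning evaluated above removes the weights with the smallest absolute values. A minimal unstructured sketch, assuming a flat weight list and a target sparsity fraction (not the paper's actual procedure):

```python
# Illustrative unstructured L1 magnitude pruning: zero out the fraction of
# weights with the smallest absolute value, keeping the rest unchanged.
def l1_magnitude_prune(weights, sparsity=0.5):
    k = int(len(weights) * sparsity)  # number of weights to zero out
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

L2 magnitude pruning ranks groups of weights by their Euclidean norm instead of absolute value, and head pruning applies the same idea at the granularity of whole attention heads.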