AptMamba: enhancing vision Mamba efficiency by adaptively pruning tokens
Saved in:

Main Author: | Song, Shangye
---|---
Other Authors: | Jiang Xudong
Format: | Thesis-Master by Coursework
Language: | English
Published: | Nanyang Technological University, 2025
Subjects: | Computer and Information Science; Deep learning
Online Access: | https://hdl.handle.net/10356/182039
Institution: | Nanyang Technological University
id | sg-ntu-dr.10356-182039
record_format | dspace
spelling |
AptMamba: enhancing vision Mamba efficiency by adaptively pruning tokens
Author: Song, Shangye
Supervisor: Jiang Xudong (EXDJiang@ntu.edu.sg), School of Electrical and Electronic Engineering
Subjects: Computer and Information Science; Deep learning
Degree: Master's degree (Thesis-Master by Coursework)
Dates: deposited 2025-01-06; issued 2024; last indexed 2025-01-10
Citation: Song, S. (2024). AptMamba: enhancing vision Mamba efficiency by adaptively pruning tokens. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/182039 |
institution | Nanyang Technological University
building | NTU Library
continent | Asia
country | Singapore
content_provider | NTU Library
collection | DR-NTU
language | English
topic | Computer and Information Science; Deep learning
description |
Mamba has recently emerged as a promising architecture for handling long sequences in vision tasks. However, pruning methods designed specifically for vision Mamba remain unexplored. To address this gap, we introduce token pruning for vision Mamba to enhance efficiency, which faces several challenges. First, without an attention mechanism, selecting tokens for pruning is less straightforward than in vision transformers. Second, Mamba's sequential processing requires re-adjusting the token sequence after pruning to keep the computation valid. To address these challenges, we propose to Adaptively Prune Tokens for Mamba (AptMamba). Specifically, we design an Adaptive Token Prediction (ATP) module to adaptively assess token importance, generating binary keep/prune decisions through Gumbel-Softmax. Pruned tokens are set to zero to minimize interference, and a multi-stage, progressive pruning strategy is applied. To account for Mamba's sequential scanning, we reorder tokens after pruning. Moreover, we present an auxiliary Pruning Reconstruction Decoder (PR-Decoder) during training to adaptively reconstruct pruned tokens, enhancing the representation capacity of the pruned Mamba without introducing extra inference cost. Additionally, we develop a Multi-Stage Pruning Loss (MSP-Loss) to adaptively ensure that pruned tokens contain minimal information, reducing the impact of token pruning. Extensive experiments show that our method reduces computation by about 11.8%-37.5% with 66% progressive token pruning, while keeping the accuracy drop within 1.1%-3.5%. |
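Two mechanical steps in the abstract are concrete enough to illustrate: sampling hard keep/prune decisions with Gumbel-Softmax, and zeroing pruned tokens before reordering the kept ones to the front so a sequential scan sees a contiguous run. The following is a minimal NumPy sketch under stated assumptions, not the thesis's implementation: the function names, the two-column keep/prune logit layout, and the front-packing reorder are all illustrative choices.

```python
import numpy as np

def gumbel_softmax_hard(logits, tau=1.0, rng=None):
    """Straight-through-style Gumbel-Softmax sampling (hard decisions only).

    logits: (N, 2) per-token scores, column 0 = "keep", column 1 = "prune"
    (an assumed layout). Returns a boolean keep-mask of length N.
    """
    rng = rng or np.random.default_rng(0)
    # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1)
    g = -np.log(-np.log(rng.uniform(1e-9, 1.0, logits.shape)))
    y = (logits + g) / tau
    # softmax over the keep/prune axis (the soft values would carry
    # gradients in a real framework; here we only take the hard argmax)
    y = np.exp(y - y.max(axis=-1, keepdims=True))
    y /= y.sum(axis=-1, keepdims=True)
    return y.argmax(axis=-1) == 0  # True where "keep" wins

def prune_and_reorder(tokens, keep_mask):
    """Zero out pruned tokens, then pack kept tokens to the front of the
    sequence (preserving their original order) so a sequential scan
    processes them contiguously."""
    pruned = np.where(keep_mask[:, None], tokens, 0.0)
    # argsort on the inverted mask: kept (False) sorts before pruned (True);
    # kind="stable" preserves the original token order within each group
    order = np.argsort(~keep_mask, kind="stable")
    return pruned[order]

# toy usage: 6 tokens with 4-dim features
tokens = np.arange(24, dtype=float).reshape(6, 4)
keep = np.array([True, False, True, True, False, True])
out = prune_and_reorder(tokens, keep)
# kept tokens 0, 2, 3, 5 move to the front; two all-zero rows trail
```

In a trainable version the argmax would be replaced by a straight-through estimator so gradients flow through the soft probabilities, which is the usual role of Gumbel-Softmax in differentiable token selection.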
author | Song, Shangye
author2 | Jiang Xudong
format | Thesis-Master by Coursework
title | AptMamba: enhancing vision Mamba efficiency by adaptively pruning tokens
publisher | Nanyang Technological University
publishDate | 2025
url | https://hdl.handle.net/10356/182039
_version_ | 1821237151532056576