InceptionNeXt: When Inception meets ConvNeXt
Inspired by the long-range modeling ability of ViTs, large-kernel convolutions have recently been widely studied and adopted to enlarge the receptive field and improve model performance, as in the remarkable work ConvNeXt, which employs 7×7 depthwise convolution. Although such a depthwise operator consumes only a few FLOPs, it largely harms model efficiency on powerful computing devices due to high memory access costs. For example, ConvNeXt-T has FLOPs similar to ResNet-50 but achieves only ∼60% of its throughput when trained on A100 GPUs with full precision. Although reducing the kernel size of ConvNeXt can improve speed, it results in significant performance degradation, which poses a challenging problem: how to speed up large-kernel-based CNN models while preserving their performance. To tackle this issue, inspired by Inceptions, we propose to decompose large-kernel depthwise convolution into four parallel branches along the channel dimension, i.e., a small square kernel, two orthogonal band kernels, and an identity mapping. With this new Inception depthwise convolution, we build a series of networks, namely InceptionNeXt, which not only enjoy high throughput but also maintain competitive performance. For instance, InceptionNeXt-T achieves 1.6× higher training throughput than ConvNeXt-T, as well as a 0.2% top-1 accuracy improvement on ImageNet-1K. We anticipate InceptionNeXt can serve as an economical baseline for future architecture design to reduce carbon footprint.
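The decomposition the abstract describes can be made concrete with a short sketch. Below is a minimal PyTorch sketch of such an Inception depthwise convolution, splitting the channels into four groups for a small square kernel, two orthogonal band kernels, and an identity mapping. The class name `InceptionDWConv2d`, the kernel sizes (3×3 square, 11-length bands), and the 1/8-per-branch channel ratio are illustrative assumptions, not values stated in this record.

```python
import torch
import torch.nn as nn

class InceptionDWConv2d(nn.Module):
    """Sketch of the Inception depthwise convolution from the abstract:
    input channels are split into four groups that pass through a small
    square depthwise kernel, two orthogonal band depthwise kernels, and
    an identity mapping, then are concatenated back together.
    Kernel sizes and the split ratio are assumptions for illustration."""

    def __init__(self, channels, square_kernel=3, band_kernel=11, branch_ratio=0.125):
        super().__init__()
        gc = int(channels * branch_ratio)  # channels per convolution branch
        self.split_sizes = (channels - 3 * gc, gc, gc, gc)
        # small square kernel branch (e.g., 3x3 depthwise)
        self.dwconv_hw = nn.Conv2d(gc, gc, square_kernel,
                                   padding=square_kernel // 2, groups=gc)
        # horizontal band kernel branch (1 x k depthwise)
        self.dwconv_w = nn.Conv2d(gc, gc, (1, band_kernel),
                                  padding=(0, band_kernel // 2), groups=gc)
        # vertical band kernel branch (k x 1 depthwise)
        self.dwconv_h = nn.Conv2d(gc, gc, (band_kernel, 1),
                                  padding=(band_kernel // 2, 0), groups=gc)

    def forward(self, x):
        # split along the channel dimension; the first chunk is the identity branch
        x_id, x_hw, x_w, x_h = torch.split(x, self.split_sizes, dim=1)
        return torch.cat(
            (x_id, self.dwconv_hw(x_hw), self.dwconv_w(x_w), self.dwconv_h(x_h)),
            dim=1,
        )

# quick shape check: spatial size and channel count are preserved
y = InceptionDWConv2d(64)(torch.randn(1, 64, 56, 56))
assert y.shape == (1, 64, 56, 56)
```

Because only a fraction of the channels go through each convolution (and the band kernels touch far fewer elements than a full 7×7 window), this layout trades a small amount of receptive-field coverage for much lower memory access cost, which is the throughput argument the abstract makes.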
| Field | Value |
|---|---|
| Main Authors | YU, Weihao; ZHOU, Pan; YAN, Shuicheng; WANG, Xinchao |
| Format | text (application/pdf) |
| Language | English |
| Published | Institutional Knowledge at Singapore Management University, 2024 (2024-06-01) |
| Subjects | Graphics and Human Computer Interfaces |
| Collection | Research Collection School Of Computing and Information Systems, InK@SMU, SMU Libraries |
| License | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
| Online Access | https://ink.library.smu.edu.sg/sis_research/8981 https://ink.library.smu.edu.sg/context/sis_research/article/9984/viewcontent/2024_CVPR_InceptionNext.pdf |
| Institution | Singapore Management University |
| Record ID | sg-smu-ink.sis_research-9984 |