InceptionNeXt: When Inception meets ConvNeXt

Inspired by the long-range modeling ability of ViTs, large-kernel convolutions have recently been widely studied and adopted to enlarge the receptive field and improve model performance, as in the remarkable work ConvNeXt, which employs 7×7 depthwise convolution. Although such a depthwise operator consumes only a few FLOPs, it largely harms model efficiency on powerful computing devices due to high memory access costs. For example, ConvNeXt-T has FLOPs similar to ResNet-50 but reaches only ∼60% of its throughput when trained on A100 GPUs with full precision. Although reducing the kernel size of ConvNeXt can improve speed, it results in significant performance degradation, which poses a challenging problem: how to speed up large-kernel-based CNN models while preserving their performance. To tackle this issue, inspired by Inception, we propose to decompose large-kernel depthwise convolution into four parallel branches along the channel dimension, i.e., a small square kernel, two orthogonal band kernels, and an identity mapping. With this new Inception depthwise convolution, we build a series of networks, namely InceptionNeXt, which not only enjoy high throughput but also maintain competitive performance. For instance, InceptionNeXt-T achieves 1.6× higher training throughput than ConvNeXt-T and a 0.2% top-1 accuracy improvement on ImageNet-1K. We anticipate InceptionNeXt can serve as an economical baseline for future architecture design to reduce carbon footprint.
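
To make the decomposition concrete, below is a minimal PyTorch sketch of an Inception depthwise convolution as described in the abstract. The module name InceptionDWConv2d, the 3×3 square kernel, the 1×11 and 11×1 band kernels, and the 1/8 channel ratio per convolution branch are illustrative assumptions, not a verbatim copy of the authors' released code.

import torch
import torch.nn as nn

class InceptionDWConv2d(nn.Module):
    """Depthwise convolution decomposed into four parallel branches along
    the channel dimension: a small square kernel, two orthogonal band
    kernels, and an identity mapping (sketch; kernel sizes and the
    branch ratio below are assumed, not confirmed settings)."""

    def __init__(self, channels, square_kernel=3, band_kernel=11, branch_ratio=0.125):
        super().__init__()
        gc = int(channels * branch_ratio)  # channels routed to each conv branch
        # Small square kernel branch (depthwise: groups == channel count)
        self.dwconv_hw = nn.Conv2d(gc, gc, square_kernel,
                                   padding=square_kernel // 2, groups=gc)
        # Horizontal band kernel branch (1 x k)
        self.dwconv_w = nn.Conv2d(gc, gc, (1, band_kernel),
                                  padding=(0, band_kernel // 2), groups=gc)
        # Vertical band kernel branch (k x 1)
        self.dwconv_h = nn.Conv2d(gc, gc, (band_kernel, 1),
                                  padding=(band_kernel // 2, 0), groups=gc)
        # Remaining channels pass through untouched (identity branch)
        self.split_sizes = (channels - 3 * gc, gc, gc, gc)

    def forward(self, x):
        x_id, x_hw, x_w, x_h = torch.split(x, self.split_sizes, dim=1)
        return torch.cat(
            (x_id, self.dwconv_hw(x_hw), self.dwconv_w(x_w), self.dwconv_h(x_h)),
            dim=1,
        )

# Usage: a drop-in replacement for a 7x7 depthwise conv in a ConvNeXt-style block
x = torch.randn(2, 64, 56, 56)
y = InceptionDWConv2d(64)(x)
assert y.shape == x.shape

Because most channels take the identity path and the band kernels are one-dimensional, each output element requires far fewer memory accesses than a full large square depthwise kernel, which is where the throughput gain described in the abstract comes from.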


Bibliographic Details
Main Authors: YU, Weihao, ZHOU, Pan, YAN, Shuicheng, WANG, Xinchao
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2024
Collection: Research Collection School Of Computing and Information Systems
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Subjects: Graphics and Human Computer Interfaces
Online Access:https://ink.library.smu.edu.sg/sis_research/8981
https://ink.library.smu.edu.sg/context/sis_research/article/9984/viewcontent/2024_CVPR_InceptionNext.pdf
Institution: Singapore Management University