4-bit Shampoo for memory-efficient network training
Second-order optimizers, which maintain a matrix termed a preconditioner, are superior to first-order optimizers in both theory and practice. The states forming the preconditioner and its inverse root restrict the maximum size of models trained by second-order optimizers. To address this, compressing 3...
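The general idea behind low-bit optimizer states can be illustrated with a minimal sketch: quantize a preconditioner-like matrix to 4-bit codes plus per-block scales, and dequantize it back when needed. The block-wise absmax mapping, the block size, and the function names below are assumptions made for this illustration, not the paper's exact algorithm (the full method is in the linked PDF).

```python
import numpy as np

def quantize_4bit(mat, block_size=64):
    """Quantize a matrix to 4-bit signed codes with block-wise absmax scaling.

    Illustrative sketch only; the paper's actual quantization scheme may
    differ. For simplicity each code is stored in an int8 here; a real
    implementation would pack two 4-bit codes per byte.
    """
    flat = mat.astype(np.float32).ravel()
    pad = (-len(flat)) % block_size          # pad so blocks divide evenly
    flat = np.pad(flat, (0, pad))
    blocks = flat.reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True)
    scales[scales == 0] = 1.0                # avoid division by zero
    # Map each value linearly onto the 15 signed levels in [-7, 7].
    codes = np.clip(np.round(blocks / scales * 7), -7, 7).astype(np.int8)
    return codes, scales, mat.shape, pad

def dequantize_4bit(codes, scales, shape, pad):
    """Recover an approximate float32 matrix from the 4-bit codes."""
    flat = ((codes.astype(np.float32) / 7) * scales).ravel()
    if pad:
        flat = flat[:-pad]
    return flat.reshape(shape)

# Example: a 512x512 float32 state (1 MiB) shrinks to 4-bit codes
# plus one float32 scale per 64-value block.
P = np.random.randn(512, 512).astype(np.float32)
codes, scales, shape, pad = quantize_4bit(P)
P_hat = dequantize_4bit(codes, scales, shape, pad)
print("max abs error:", np.abs(P - P_hat).max())
```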
| Main Authors: | WANG, Sike; ZHOU, Pan; LI, Jia; HUANG, Hua |
|---|---|
| Format: | text |
| Language: | English |
| Published: | Institutional Knowledge at Singapore Management University, 2024 |
| Subjects: | |
| Online Access: | https://ink.library.smu.edu.sg/sis_research/9731 https://ink.library.smu.edu.sg/context/sis_research/article/10731/viewcontent/4_bit.pdf |
Similar Items
- ScaleLong: Towards more stable training of diffusion model via scaling network long skip connection
  by: HUANG, Zhongzhan, et al.
  Published: (2023)
- Sequential recommendation with user memory networks
  by: CHEN, Xu, et al.
  Published: (2018)
- Quantization-aware interval bound propagation for training certifiably robust quantized neural networks
  by: LECHNER, Mathias, et al.
  Published: (2023)
- Win: Weight-decay-integrated Nesterov acceleration for faster network training
  by: ZHOU, Pan, et al.
  Published: (2024)
- MemLock: Memory usage guided fuzzing
  by: WEN, Cheng, et al.
  Published: (2020)