arxiv:2306.09782

Full Parameter Fine-tuning for Large Language Models with Limited Resources

Published on Jun 16, 2023 · Featured in Daily Papers on Jun 19, 2023
Authors:
Kai Lv et al.

Abstract

Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP) but demand massive GPU resources for training. Lowering the threshold for LLM training would encourage greater participation from researchers, benefiting both academia and society. While existing approaches have focused on parameter-efficient fine-tuning, which tunes or adds a small number of parameters, few have addressed the challenge of tuning the full parameters of LLMs with limited resources. In this work, we propose a new optimizer, LOw-Memory Optimization (LOMO), which fuses the gradient computation and the parameter update in one step to reduce memory usage. By integrating LOMO with existing memory-saving techniques, we reduce memory usage to 10.8% of that of the standard approach (the DeepSpeed solution). Consequently, our approach enables full-parameter fine-tuning of a 65B model on a single machine with 8 RTX 3090 GPUs, each with 24GB of memory.
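To make the phrase "fuses the gradient computation and the parameter update in one step" concrete, here is a toy sketch in plain Python. It is not the authors' implementation: the scalar-chain model, the plain SGD rule, and the function names `standard_sgd_step` and `fused_sgd_step` are all assumptions chosen for illustration. The sketch only shows why updating each parameter as soon as its gradient is available means that gradients for all layers never have to be held at the same time.

```python
# Toy model (an assumption for illustration): a chain of scalar layers,
# h_{i+1} = w_i * h_i, trained with loss = 0.5 * (output - target)^2.

def forward(weights, x):
    """Return the activations [h_0, ..., h_n] for input x."""
    acts = [x]
    for w in weights:
        acts.append(w * acts[-1])
    return acts


def standard_sgd_step(weights, x, target, lr):
    """Run the full backward pass first, then update: all gradients are stored."""
    acts = forward(weights, x)
    upstream = acts[-1] - target            # dLoss/d(output)
    grads = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        grads[i] = upstream * acts[i]       # dLoss/dw_i
        upstream = upstream * weights[i]    # dLoss/dh_i
    new_weights = [w - lr * g for w, g in zip(weights, grads)]
    return new_weights, len(grads)          # number of gradients held at once


def fused_sgd_step(weights, x, target, lr):
    """Update each weight as soon as its gradient exists, then discard the gradient."""
    acts = forward(weights, x)
    upstream = acts[-1] - target
    new_weights = list(weights)
    peak_live_grads = 0
    for i in reversed(range(len(weights))):
        grad = upstream * acts[i]           # gradient for layer i only
        upstream = upstream * weights[i]    # propagate with the pre-update weight
        new_weights[i] = weights[i] - lr * grad
        peak_live_grads = max(peak_live_grads, 1)  # only one gradient is ever live
    return new_weights, peak_live_grads


weights = [0.5, -1.2, 0.8, 1.5]
std_w, std_peak = standard_sgd_step(weights, x=2.0, target=1.0, lr=0.01)
fused_w, fused_peak = fused_sgd_step(weights, x=2.0, target=1.0, lr=0.01)
```

Under these assumptions both functions produce the same updated weights, but the standard version holds one gradient per layer at once while the fused version holds a single gradient at a time. In a real model each "gradient" is a full tensor, so the saving scales with the number of parameters.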

Community

Very exciting! Is there any plan for LOMO to be integrated into the HF transformers Trainer so users can begin taking advantage of the memory improvements?

Paper author

Thanks for your interest. And sure! I've raised an issue requesting the integration here: https://github.com/huggingface/transformers/issues/29649.

Seems really interesting. I was surprised to see it's 9 months old. I would have hoped it had caught on a bit more.

Paper author

Thanks. We're trying to make it more accessible, to help democratize LLMs :).
