# Load adapters with 🤗 PEFT [[load-adapters-with-peft]]

[[open-in-colab]]

[Parameter-Efficient Fine Tuning (PEFT)](https://huggingface.co/blog/peft) methods freeze the pretrained model parameters during fine-tuning and add a small number of trainable parameters (the adapters) on top of them. The adapters are trained to learn task-specific information. This approach is memory-efficient and uses comparatively little compute, while producing results comparable to a fully fine-tuned model.

Adapters trained with PEFT are also usually much smaller than the full model, making them convenient to share, store, and load.
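To see how small the trainable portion actually is, here is a minimal sketch, assuming the `facebook/opt-350m` checkpoint used later in this guide, that wraps a base model with 🤗 PEFT's `get_peft_model` helper and prints how few parameters remain trainable:

```py
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Freeze the pretrained weights and attach a small trainable LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
peft_model = get_peft_model(base_model, LoraConfig(task_type="CAUSAL_LM"))

# Prints the trainable (adapter) parameter count next to the frozen total;
# for this setup it is well under 1% of all parameters
peft_model.print_trainable_parameters()
```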
The adapter weights for an OPTForCausalLM model stored on the Hub are only about 6MB, compared to the full model weights, which can be up to 700MB.
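If you want to check these sizes for yourself, one option is to query the Hub's file metadata with `huggingface_hub` (a dependency of 🤗 Transformers, so it is assumed to be installed) for the adapter repository used in this guide:

```py
from huggingface_hub import HfApi

# List the files in the adapter repository with their sizes in bytes;
# the adapter weights are only a few MB, versus ~700MB for the base model
info = HfApi().model_info("ybelkada/opt-350m-lora", files_metadata=True)
for sibling in info.siblings:
    print(sibling.rfilename, sibling.size)
```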
๐Ÿค— PEFT ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ๋Œ€ํ•ด ์ž์„ธํžˆ ์•Œ์•„๋ณด๋ ค๋ฉด [๋ฌธ์„œ](https://huggingface.co/docs/peft/index)๋ฅผ ํ™•์ธํ•˜์„ธ์š”. ## ์„ค์ • [[setup]] ๐Ÿค— PEFT๋ฅผ ์„ค์น˜ํ•˜์—ฌ ์‹œ์ž‘ํ•˜์„ธ์š”: ```bash pip install peft ``` ์ƒˆ๋กœ์šด ๊ธฐ๋Šฅ์„ ์‚ฌ์šฉํ•ด๋ณด๊ณ  ์‹ถ๋‹ค๋ฉด, ๋‹ค์Œ ์†Œ์Šค์—์„œ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์„ค์น˜ํ•˜๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค: ```bash pip install git+https://github.com/huggingface/peft.git ``` ## ์ง€์›๋˜๋Š” PEFT ๋ชจ๋ธ [[supported-peft-models]] ๐Ÿค— Transformers๋Š” ๊ธฐ๋ณธ์ ์œผ๋กœ ์ผ๋ถ€ PEFT ๋ฐฉ๋ฒ•์„ ์ง€์›ํ•˜๋ฉฐ, ๋กœ์ปฌ์ด๋‚˜ Hub์— ์ €์žฅ๋œ ์–ด๋Œ‘ํ„ฐ ๊ฐ€์ค‘์น˜๋ฅผ ๊ฐ€์ ธ์˜ค๊ณ  ๋ช‡ ์ค„์˜ ์ฝ”๋“œ๋งŒ์œผ๋กœ ์‰ฝ๊ฒŒ ์‹คํ–‰ํ•˜๊ฑฐ๋‚˜ ํ›ˆ๋ จํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‹ค์Œ ๋ฐฉ๋ฒ•์„ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค: - [Low Rank Adapters](https://huggingface.co/docs/peft/conceptual_guides/lora) - [IA3](https://huggingface.co/docs/peft/conceptual_guides/ia3) - [AdaLoRA](https://arxiv.org/abs/2303.10512) ๐Ÿค— PEFT์™€ ๊ด€๋ จ๋œ ๋‹ค๋ฅธ ๋ฐฉ๋ฒ•(์˜ˆ: ํ”„๋กฌํ”„ํŠธ ํ›ˆ๋ จ ๋˜๋Š” ํ”„๋กฌํ”„ํŠธ ํŠœ๋‹) ๋˜๋Š” ์ผ๋ฐ˜์ ์ธ ๐Ÿค— PEFT ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ๋Œ€ํ•ด ์ž์„ธํžˆ ์•Œ์•„๋ณด๋ ค๋ฉด [๋ฌธ์„œ](https://huggingface.co/docs/peft/index)๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”. ## PEFT ์–ด๋Œ‘ํ„ฐ ๊ฐ€์ ธ์˜ค๊ธฐ [[load-a-peft-adapter]] ๐Ÿค— Transformers์—์„œ PEFT ์–ด๋Œ‘ํ„ฐ ๋ชจ๋ธ์„ ๊ฐ€์ ธ์˜ค๊ณ  ์‚ฌ์šฉํ•˜๋ ค๋ฉด Hub ์ €์žฅ์†Œ๋‚˜ ๋กœ์ปฌ ๋””๋ ‰ํ„ฐ๋ฆฌ์— `adapter_config.json` ํŒŒ์ผ๊ณผ ์–ด๋Œ‘ํ„ฐ ๊ฐ€์ค‘์น˜๊ฐ€ ํฌํ•จ๋˜์–ด ์žˆ๋Š”์ง€ ํ™•์ธํ•˜์‹ญ์‹œ์˜ค. ๊ทธ๋Ÿฐ ๋‹ค์Œ `AutoModelFor` ํด๋ž˜์Šค๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ PEFT ์–ด๋Œ‘ํ„ฐ ๋ชจ๋ธ์„ ๊ฐ€์ ธ์˜ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด ์ธ๊ณผ ๊ด€๊ณ„ ์–ธ์–ด ๋ชจ๋ธ์šฉ PEFT ์–ด๋Œ‘ํ„ฐ ๋ชจ๋ธ์„ ๊ฐ€์ ธ์˜ค๋ ค๋ฉด ๋‹ค์Œ ๋‹จ๊ณ„๋ฅผ ๋”ฐ๋ฅด์‹ญ์‹œ์˜ค: 1. PEFT ๋ชจ๋ธ ID๋ฅผ ์ง€์ •ํ•˜์‹ญ์‹œ์˜ค. 2. [`AutoModelForCausalLM`] ํด๋ž˜์Šค์— ์ „๋‹ฌํ•˜์‹ญ์‹œ์˜ค. ```py from transformers import AutoModelForCausalLM, AutoTokenizer peft_model_id = "ybelkada/opt-350m-lora" model = AutoModelForCausalLM.from_pretrained(peft_model_id) ``` `AutoModelFor` ํด๋ž˜์Šค๋‚˜ ๊ธฐ๋ณธ ๋ชจ๋ธ ํด๋ž˜์Šค(์˜ˆ: `OPTForCausalLM` ๋˜๋Š” `LlamaForCausalLM`) ์ค‘ ํ•˜๋‚˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ PEFT ์–ด๋Œ‘ํ„ฐ๋ฅผ ๊ฐ€์ ธ์˜ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. `load_adapter` ๋ฉ”์†Œ๋“œ๋ฅผ ํ˜ธ์ถœํ•˜์—ฌ PEFT ์–ด๋Œ‘ํ„ฐ๋ฅผ ๊ฐ€์ ธ์˜ฌ ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ```py from transformers import AutoModelForCausalLM, AutoTokenizer model_id = "facebook/opt-350m" peft_model_id = "ybelkada/opt-350m-lora" model = AutoModelForCausalLM.from_pretrained(model_id) model.load_adapter(peft_model_id) ``` ## 8๋น„ํŠธ ๋˜๋Š” 4๋น„ํŠธ๋กœ ๊ฐ€์ ธ์˜ค๊ธฐ [[load-in-8bit-or-4bit]] `bitsandbytes` ํ†ตํ•ฉ์€ 8๋น„ํŠธ์™€ 4๋น„ํŠธ ์ •๋ฐ€๋„ ๋ฐ์ดํ„ฐ ์œ ํ˜•์„ ์ง€์›ํ•˜๋ฏ€๋กœ ํฐ ๋ชจ๋ธ์„ ๊ฐ€์ ธ์˜ฌ ๋•Œ ์œ ์šฉํ•˜๋ฉด์„œ ๋ฉ”๋ชจ๋ฆฌ๋„ ์ ˆ์•ฝํ•ฉ๋‹ˆ๋‹ค. ๋ชจ๋ธ์„ ํ•˜๋“œ์›จ์–ด์— ํšจ๊ณผ์ ์œผ๋กœ ๋ถ„๋ฐฐํ•˜๋ ค๋ฉด [`~PreTrainedModel.from_pretrained`]์— `load_in_8bit` ๋˜๋Š” `load_in_4bit` ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์ถ”๊ฐ€ํ•˜๊ณ  `device_map="auto"`๋ฅผ ์„ค์ •ํ•˜์„ธ์š”: ```py from transformers import AutoModelForCausalLM, AutoTokenizer peft_model_id = "ybelkada/opt-350m-lora" model = AutoModelForCausalLM.from_pretrained(peft_model_id, device_map="auto", load_in_8bit=True) ``` ## ์ƒˆ ์–ด๋Œ‘ํ„ฐ ์ถ”๊ฐ€ [[add-a-new-adapter]] ์ƒˆ ์–ด๋Œ‘ํ„ฐ๊ฐ€ ํ˜„์žฌ ์–ด๋Œ‘ํ„ฐ์™€ ๋™์ผํ•œ ์œ ํ˜•์ธ ๊ฒฝ์šฐ์— ํ•œํ•ด ๊ธฐ์กด ์–ด๋Œ‘ํ„ฐ๊ฐ€ ์žˆ๋Š” ๋ชจ๋ธ์— ์ƒˆ ์–ด๋Œ‘ํ„ฐ๋ฅผ ์ถ”๊ฐ€ํ•˜๋ ค๋ฉด [`~peft.PeftModel.add_adapter`]๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. 
For example, if you have an existing LoRA adapter attached to a model:

```py
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig

model_id = "facebook/opt-350m"
model = AutoModelForCausalLM.from_pretrained(model_id)

# tokenizer and inputs for the generation calls below
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Hello", return_tensors="pt")

lora_config = LoraConfig(
    target_modules=["q_proj", "k_proj"],
    init_lora_weights=False
)

model.add_adapter(lora_config, adapter_name="adapter_1")
```

To add a new adapter:

```py
# attach a new adapter with the same config
model.add_adapter(lora_config, adapter_name="adapter_2")
```

Now you can use [`~peft.PeftModel.set_adapter`] to set which adapter to use:

```py
# use adapter_1
model.set_adapter("adapter_1")
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))

# use adapter_2
model.set_adapter("adapter_2")
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## Enable and disable adapters [[enable-and-disable-adapters]]

Once you've added an adapter to a model, you can enable or disable the adapter module. To enable the adapter module:

```py
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftConfig

model_id = "facebook/opt-350m"
adapter_model_id = "ybelkada/opt-350m-lora"
tokenizer = AutoTokenizer.from_pretrained(model_id)
text = "Hello"
inputs = tokenizer(text, return_tensors="pt")

model = AutoModelForCausalLM.from_pretrained(model_id)
peft_config = PeftConfig.from_pretrained(adapter_model_id)

# to initialize with random weights
peft_config.init_lora_weights = False

model.add_adapter(peft_config)
model.enable_adapters()
output = model.generate(**inputs)
```

To disable the adapter module:

```py
model.disable_adapters()
output = model.generate(**inputs)
```

## Train a PEFT adapter [[train-a-peft-adapter]]

PEFT adapters are supported by the [`Trainer`] class, so you can train an adapter for your specific use case by adding just a few lines of code. If you aren't familiar with fine-tuning a model with [`Trainer`], take a look at the [Fine-tune a pretrained model](training) tutorial. For example, to train a LoRA adapter:

1. Define your adapter configuration with the task type and hyperparameters (see [`~peft.LoraConfig`] for more details about the hyperparameters).

```py
from peft import LoraConfig

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)
```

2. Add the adapter to the model.

```py
model.add_adapter(peft_config)
```

3. Now you can pass the model to [`Trainer`]!

```py
trainer = Trainer(model=model, ...)
trainer.train()
```

To save your trained adapter and load it back:

```py
model.save_pretrained(save_dir)
model = AutoModelForCausalLM.from_pretrained(save_dir)
```
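As a final sanity check, you can run generation with the reloaded adapter model. This is a minimal sketch: `model` and `save_dir` come from the step above, and it assumes the adapter was trained on top of the `facebook/opt-350m` checkpoint used throughout this guide:

```py
from transformers import AutoTokenizer

# Load a tokenizer for the same base checkpoint the adapter was trained on
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

# Generate with the reloaded adapter model
inputs = tokenizer("Hello", return_tensors="pt")
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```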