---
library_name: transformers
license: apache-2.0
base_model: meta-llama/Meta-Llama-3-8B-Instruct
datasets:
  - Saxo/total_ko_train_set_1_with_wiki_with_orca
language:
  - ko
  - en
pipeline_tag: text-generation
---

# Model Card for Model ID

AI ์™€ ๋น…๋ฐ์ดํ„ฐ ๋ถ„์„ ์ „๋ฌธ ๊ธฐ์—…์ธ Linkbricks์˜ ๋ฐ์ดํ„ฐ์‚ฌ์ด์–ธํ‹ฐ์ŠคํŠธ์ธ ์ง€์œค์„ฑ ๋ฐ•์‚ฌ(Saxo)๊ฐ€ meta-llama/Meta-Llama-3-8B๋ฅผ ๋ฒ ์ด์Šค๋ชจ๋ธ๋กœ GCP์ƒ์˜ H100-80G 8๊ฐœ๋ฅผ ํ†ตํ•ด SFT-DPO ํ›ˆ๋ จ์„ ํ•œ(8000 Tokens) ํ•œ๊ธ€ ๊ธฐ๋ฐ˜ ๋ชจ๋ธ. ํ† ํฌ๋‚˜์ด์ €๋Š” ๋ผ๋งˆ3๋ž‘ ๋™์ผํ•˜๋ฉฐ ํ•œ๊ธ€ VOCA ํ™•์žฅ์€ ํ•˜์ง€ ์•Š์€ ๋ฒ„์ „ ์ž…๋‹ˆ๋‹ค.

Dr. Yunsung Ji (Saxo), a data scientist at Linkbricks, a company specializing in AI and big data analytics, trained the meta-llama/Meta-Llama-3-8B base model on 8 H100-60Gs on GCP for 4 hours of instructional training (8000 Tokens). Accelerate, Deepspeed Zero-3 libraries were used.

www.linkbricks.com, www.linkbricks.vc

## Configuration, including bitsandbytes


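```python
import torch
from transformers import BitsAndBytesConfig

# torch_dtype is not defined in the original snippet; using bf16 when the
# GPU supports it (fp16 otherwise) is an assumption here.
torch_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                   # store linear-layer weights in 4 bits
    bnb_4bit_use_double_quant=False,     # keep the quantization constants unquantized
    bnb_4bit_quant_type="nf4",           # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch_dtype,  # dtype used for matmuls at runtime
)
```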
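Because `bnb_4bit_use_double_quant=False`, the per-block scaling constants are stored unquantized. The following back-of-the-envelope sketch estimates what that costs. It assumes a blocksize of 64 with one fp32 constant per block (and, for double quantization, 8-bit constants with one fp32 constant per 256 of them), plus a hypothetical round count of 8.0e9 parameters that ignores any layers left unquantized. These are assumptions, not figures from this model card.

```python
# Rough weight-memory estimate for 4-bit NF4 storage.
# Assumptions (not from the model card): blocksize 64 with one fp32 absmax
# constant per block; with double quantization, those constants are stored
# in 8 bits with one fp32 constant per 256 of them.
BLOCKSIZE = 64
SECOND_BLOCKSIZE = 256


def nf4_bits_per_param(double_quant: bool) -> float:
    """Average storage bits per quantized weight, including scaling constants."""
    if not double_quant:
        return 4 + 32 / BLOCKSIZE
    return 4 + 8 / BLOCKSIZE + 32 / (BLOCKSIZE * SECOND_BLOCKSIZE)


def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GiB (no activations or optimizer state)."""
    return num_params * bits_per_param / 8 / 2**30


if __name__ == "__main__":
    n = 8.0e9  # hypothetical round parameter count; ignores unquantized layers
    print(f"bf16:                 {weight_memory_gib(n, 16):.2f} GiB")
    print(f"nf4, no double quant: {weight_memory_gib(n, nf4_bits_per_param(False)):.2f} GiB")
    print(f"nf4, double quant:    {weight_memory_gib(n, nf4_bits_per_param(True)):.2f} GiB")
```

Under these assumptions, double quantization would save roughly 0.37 bits per quantized weight (4.5 vs. about 4.13 bits).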

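```python
import torch
from transformers import TrainingArguments

# project_name and run_name_str are used but not defined in the original
# snippet; the values below are hypothetical placeholders.
project_name = "output"   # hypothetical output directory
run_name_str = "sft-run"  # hypothetical W&B run name

args = TrainingArguments(
    output_dir=project_name,
    run_name=run_name_str,
    overwrite_output_dir=True,
    num_train_epochs=20,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,  # alternative: 1
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",  # alternative: "adamw_8bit"
    logging_steps=10,
    save_steps=100,  # ignored while save_strategy="epoch"
    save_strategy="epoch",
    learning_rate=2e-4,
    weight_decay=0.01,
    max_grad_norm=1,  # alternative: 0.3
    max_steps=-1,  # -1: train for num_train_epochs rather than a fixed step count
    warmup_ratio=0.1,
    group_by_length=False,
    # Use bf16 where the GPU supports it, otherwise fall back to fp16.
    fp16=not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_bf16_supported(),
    lr_scheduler_type="cosine",  # alternative: "constant"
    disable_tqdm=False,
    report_to="wandb",
    push_to_hub=False,
)
```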
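With `per_device_train_batch_size=1` and `gradient_accumulation_steps=4` on the 8 GPUs described above, and assuming plain data parallelism, each optimizer step sees 1 × 4 × 8 = 32 sequences. The sketch below computes that figure and reproduces the shape of `lr_scheduler_type="cosine"` with `warmup_ratio=0.1`: linear warmup to the peak learning rate, then half-cosine decay to zero, as in transformers' `get_cosine_schedule_with_warmup` with its default `num_cycles=0.5`. It is a standalone illustration, not the library code, and the total step count used in the demo is hypothetical.

```python
import math

# Values taken from the TrainingArguments above; NUM_GPUS comes from the
# "8 H100" description and assumes plain data parallelism.
PER_DEVICE_BATCH = 1
GRAD_ACCUM_STEPS = 4
NUM_GPUS = 8
PEAK_LR = 2e-4
WARMUP_RATIO = 0.1


def effective_batch_size(per_device: int, accum: int, world_size: int) -> int:
    """Number of sequences contributing to one optimizer step."""
    return per_device * accum * world_size


def cosine_lr_with_warmup(step: int, total_steps: int, peak_lr: float,
                          warmup_ratio: float) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to 0."""
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_GPUS))
    total = 1000  # hypothetical total optimizer steps, for illustration only
    for s in (0, 50, 100, 550, 1000):
        print(s, cosine_lr_with_warmup(s, total, PEAK_LR, WARMUP_RATIO))
```

The learning rate peaks at the end of warmup (step 100 of 1000 here), reaches half the peak midway through the decay, and ends at zero.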