The model was trained on a diverse dataset comprising approximately 800 million tokens, including the Standard Korean Dictionary, KULLM training data from Korea University, dissertation abstracts from master's and doctoral theses, and Korean language samples from AI Hub.
This model was fine-tuned from the "meta-llama/Llama-2-13b-chat-hf" base model using PEFT (Parameter-Efficient Fine-Tuning) with LoRA (Low-Rank Adaptation).
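A minimal sketch for loading the LoRA adapter on top of the base model with PEFT is shown below. The adapter repository id is a hypothetical placeholder and should be replaced with this model's actual id; the dtype and device settings are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-13b-chat-hf"
adapter_id = "your-org/llama2-13b-ko-lora"  # hypothetical placeholder for this repository's id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,  # assumed precision for inference
    device_map="auto",
)

# Attach the LoRA adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```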
Hardware: Trained on two NVIDIA A100 80GB GPUs.
Training Factors: This model was fine-tuned with PEFT LoRA using the Hugging Face SFTTrainer with FSDP enabled. Key parameters included LoRA r = 8, LoRA alpha = 16, 2 training epochs, a per-device batch size of 1, and gradient accumulation of 32 steps.
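As a rough sketch of how these factors map onto TRL's SFTTrainer, the snippet below wires the reported hyperparameters into a LoRA configuration and trainer. The output path, dropout value, bf16 setting, and placeholder dataset are assumptions (the exact training script and data formatting were not published), and the SFTTrainer signature varies slightly across TRL versions.

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

# Placeholder dataset; the Korean training corpus described above is not bundled here.
train_dataset = Dataset.from_dict({"text": ["### 질문: ...\n### 답변: ..."]})

peft_config = LoraConfig(
    r=8,                 # LoRA r reported for this model
    lora_alpha=16,       # LoRA alpha reported for this model
    lora_dropout=0.05,   # assumed; not stated in this card
    bias="none",
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="./llama2-13b-ko-lora",   # hypothetical output path
    num_train_epochs=2,                  # 2 epochs, as reported
    per_device_train_batch_size=1,       # batch size 1, as reported
    gradient_accumulation_steps=32,      # gradient accumulation 32, as reported
    bf16=True,                           # assumed precision for A100 GPUs
)

trainer = SFTTrainer(
    model="meta-llama/Llama-2-13b-chat-hf",
    args=training_args,
    train_dataset=train_dataset,
    peft_config=peft_config,
    dataset_text_field="text",
)
trainer.train()
```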
When further fine-tuning this model, consider the parameters used during the original training, such as the LoRA r and LoRA alpha values, to ensure compatibility and optimal performance.
The training leveraged FSDP (Fully Sharded Data Parallel) through the Hugging Face SFTTrainer for efficient memory usage and accelerated training.
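One way to enable FSDP through the Trainer arguments is sketched below; these flags are illustrative, assuming transformers' built-in FSDP integration, since the exact FSDP configuration used for this model was not published.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama2-13b-ko-lora",        # hypothetical output path
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,
    bf16=True,                                # assumed precision for A100 GPUs
    fsdp="full_shard auto_wrap",              # shard params, grads, and optimizer state across GPUs
    fsdp_config={
        # Wrap each Llama decoder layer as its own FSDP unit.
        "transformer_layer_cls_to_wrap": ["LlamaDecoderLayer"],
    },
)
```

A script using these arguments would typically be launched on both GPUs with `torchrun --nproc_per_node=2 train.py` or `accelerate launch` (script name here is illustrative).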