
Llama-2-ko-OpenOrca-gugugo-13B

This model was trained as a proof of concept (PoC). It is part of an experiment to check whether model performance improves when fine-tuning on a large dataset of about 1 million samples.

[Note] Many people and customers still believe that more data is always better, so I tested this directly with experimental data. In fine-tuning, data quality matters much more than sheer volume, and the keyword distribution within the dataset is also important!

For example, searching the kkullm dataset for process and comparison keywords shows that each accounts for only about 1% of the entire dataset.
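As a rough illustration of measuring keyword distribution, the sketch below computes the fraction of samples that contain a keyword, using case-insensitive substring matching. The card does not say how the search was done, so `keyword_share` and the toy samples are hypothetical; a Korean dataset such as kkullm would need Korean keywords.

```python
from typing import Iterable


def keyword_share(samples: Iterable[str], keyword: str) -> float:
    """Fraction of samples whose text contains `keyword` (case-insensitive substring match)."""
    needle = keyword.lower()
    total = 0
    hits = 0
    for text in samples:
        total += 1
        if needle in text.lower():
            hits += 1
    # Avoid division by zero on an empty dataset.
    return hits / total if total else 0.0


# Hypothetical toy samples; real ones would be the instruction/output text of a dataset.
samples = [
    "Explain the process of photosynthesis.",
    "Compare Python and Go for web servers.",
    "Write a haiku about autumn.",
    "Summarize this article.",
]

for kw in ("process", "compare"):
    print(kw, f"{keyword_share(samples, kw):.0%}")
```

A share near 1% for a keyword, as reported above for kkullm, means only about one sample in a hundred exercises that kind of instruction, however large the dataset is.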

Model Details

Datasets

Trained on 1 million samples from the dataset. Training ran on two AWS g5.12xlarge instances (8 NVIDIA A10G GPUs in total).

Hyperparameters

These hyperparameters are heuristic values, provided for reference only:

learning_rate = 3e-5
lr_scheduler = "constant_with_warmup"
batch_size = 1
gradient_accumulation_steps = 8
lora_alpha = 16
lora_r = 16
lora_dropout = 0.1
lora_target_modules = ["gate_proj", "down_proj", "up_proj", "q_proj", "k_proj", "o_proj", "v_proj"]
use_flash_attention_2 = True
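The listed values imply a few derived numbers worth knowing when reproducing the run. The sketch below computes them under assumptions the card does not state: plain data parallelism across all 8 GPUs, and Llama-2-13B layer dimensions (hidden size 5120, MLP size 13824, 40 layers, no grouped-query attention). `WARMUP_STEPS` is a hypothetical value because no warmup length is given; `lr_at` follows the usual constant-with-warmup shape (linear ramp from 0, then flat).

```python
# Derived training quantities. Assumptions (not in the model card):
# data parallelism over all 8 GPUs and Llama-2-13B layer dimensions.

NUM_GPUS = 8               # 2 x g5.12xlarge, 4 A10G GPUs each
PER_DEVICE_BATCH = 1       # batch_size
GRAD_ACCUM = 8             # gradient_accumulation_steps
NUM_SAMPLES = 1_000_000    # training samples

BASE_LR = 3e-5             # learning_rate
WARMUP_STEPS = 100         # hypothetical: the card does not give a warmup length

LORA_R = 16
LORA_ALPHA = 16

HIDDEN = 5120              # assumed Llama-2-13B hidden size
INTERMEDIATE = 13824       # assumed Llama-2-13B MLP size
NUM_LAYERS = 40            # assumed Llama-2-13B depth

# Samples contributing to one optimizer update.
effective_batch = NUM_GPUS * PER_DEVICE_BATCH * GRAD_ACCUM

# Optimizer updates for one pass over the data (ceiling division).
steps_per_epoch = -(-NUM_SAMPLES // effective_batch)

# Standard LoRA scales the low-rank update B @ A by alpha / r.
lora_scaling = LORA_ALPHA / LORA_R


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """A LoRA adapter adds A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


# (d_in, d_out) of each target module in one decoder layer.
target_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

per_layer = sum(lora_params(d_in, d_out, LORA_R) for d_in, d_out in target_shapes.values())
trainable = per_layer * NUM_LAYERS


def lr_at(step: int) -> float:
    """constant_with_warmup: linear ramp from 0 to BASE_LR, then constant."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    return BASE_LR


print(effective_batch, steps_per_epoch, lora_scaling, trainable)
# prints: 64 15625 1.0 62586880
```

Under these assumptions each optimizer update sees 64 samples, one epoch over 1 million samples is 15,625 updates, `lora_alpha = lora_r` gives a scaling of 1.0, and the adapters add roughly 62.6M trainable parameters.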

License

  • Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License, under the LLAMA 2 COMMUNITY LICENSE AGREEMENT

This model was created as a personal experiment, unrelated to the organization I work for.

Model size: 13.2B params (Safetensors, tensor type FP16)
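A quick estimate of what 13.2B parameters stored in FP16 occupy, counting weights only. The 24 GB figure is the nominal memory of one NVIDIA A10G; activations, gradients, optimizer state, and LoRA adapters are deliberately ignored, so this is a floor for loading the weights rather than a training-memory estimate.

```python
PARAMS = 13.2e9          # "13.2B params"
BYTES_PER_PARAM = 2      # FP16 stores each value in 2 bytes
A10G_NOMINAL_GB = 24     # nominal memory of one NVIDIA A10G

weight_bytes = PARAMS * BYTES_PER_PARAM
weight_gb = weight_bytes / 1e9       # decimal gigabytes
weight_gib = weight_bytes / 2**30    # binary gibibytes

# Weights only; everything else needed for training is ignored.
fits_on_one_gpu = weight_gb <= A10G_NOMINAL_GB

print(f"{weight_gb:.1f} GB ({weight_gib:.1f} GiB), fits on one A10G: {fits_on_one_gpu}")
# prints: 26.4 GB (24.6 GiB), fits on one A10G: False
```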

Dataset used to train daekeun-ml/Llama-2-ko-OpenOrca-gugugo-13B