---
license: apache-2.0
base_model: davidkim205/nox-solar-10.7b-v4
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: nhn_dpo_v3_nox-solar-10.7b-v4_DPO
  results: []
---

# nhn_dpo_v3_nox-solar-10.7b-v4_DPO

### Our Team
* Youjin Chung
* Jingyeom Kim

## Model

### Base Model
* [davidkim205/nox-solar-10.7b-v4](https://huggingface.co/davidkim205/nox-solar-10.7b-v4)

### Hardware and Software
* Hardware: A100 × 8 for training our model
* Software: DeepSpeed library & Hugging Face TRL Trainer

### Dataset
* DPO dataset
  * In-house DPO dataset (built using AI-Hub data)
  * Translations of English preference datasets such as OpenOrca DPO (ENERGY-DRINK-LOVE/translate_share_gpt_dedup_llama_SFT_1024, translated with our own model)

### Training Method
* [DPO](https://arxiv.org/abs/2305.18290) (an illustrative training sketch is provided after the benchmark table)

## Benchmark

**[Ko LM Eval Harness](https://github.com/Beomi/ko-lm-evaluation-harness)**

### 0-shot (macro F1)

| kobest_boolq | kobest_copa | kobest_hellaswag | kobest_sentineg |
| -----------: | ----------: | ---------------: | --------------: |
|     0.931613 |    0.740751 |         0.468602 |        0.488465 |
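
### DPO Training Sketch (illustrative)

The sketch below shows how a DPO run with the Hugging Face TRL `DPOTrainer` and DeepSpeed is typically wired together. It is not the exact training configuration of this model: the dataset name, hyperparameters, and DeepSpeed config file are placeholders, and argument names differ across TRL versions (newer releases move `beta` and the length settings into `DPOConfig`).

```python
# Minimal DPO sketch with TRL; hyperparameters, dataset name, and the
# DeepSpeed config path are assumptions, not this model's actual settings.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "davidkim205/nox-solar-10.7b-v4"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)          # policy to be optimized
ref_model = AutoModelForCausalLM.from_pretrained(base)      # frozen reference policy

# DPO expects a preference dataset with "prompt", "chosen", and "rejected"
# columns; "my-org/my-dpo-dataset" is a placeholder for the in-house data.
train_dataset = load_dataset("my-org/my-dpo-dataset", split="train")

training_args = TrainingArguments(
    output_dir="nox-solar-10.7b-dpo",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-7,
    num_train_epochs=1,
    bf16=True,
    deepspeed="ds_config_zero3.json",  # DeepSpeed config file (assumed name)
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    beta=0.1,                 # strength of the KL penalty from the DPO paper
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_length=1024,
    max_prompt_length=512,
)
trainer.train()
```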
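
### Inference Example (illustrative)

A minimal sketch of loading the model with `transformers` for generation. The repository id and the prompt format below are assumptions; adjust them to the actual model repository and its expected chat template.

```python
# Minimal inference sketch; the repo id and prompt template are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/nhn_dpo_v3_nox-solar-10.7b-v4_DPO"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Assumed instruction-style prompt format.
prompt = "### User:\nSummarize the DPO training method in one sentence.\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```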