---
license: cc-by-nc-sa-4.0
---

# yiko-12b-mu

## Experiment Objectives

  1. Does training on a combined Korean + multilingual dataset improve performance on Korean benchmarks?
  2. Does full-parameter depth-up-scaled training (expansion method: Llama-Pro) yield the best Korean benchmark performance?

## Methods

  1. Train on a CJK + English + Glot mixture, sampling each source at an equal data-size ratio (see the data-mixing sketch below).
  2. Expand the model with additional layers (Llama-Pro-style depth-up scaling) and train all parameters (see the layer-expansion sketch below).
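
The snippet below is a minimal sketch of how an equal-ratio mixture like method 1 could be assembled with Hugging Face `datasets`. The dataset identifiers and the seed are placeholders for illustration, not the corpora actually used for this model.

```python
# Hypothetical sketch of the equal-ratio data mix (method 1).
# Dataset names below are placeholders, not the actual training corpora.
from datasets import load_dataset, interleave_datasets

# One streaming split per language group (placeholder identifiers).
korean   = load_dataset("placeholder/korean_corpus",   split="train", streaming=True)
chinese  = load_dataset("placeholder/chinese_corpus",  split="train", streaming=True)
japanese = load_dataset("placeholder/japanese_corpus", split="train", streaming=True)
english  = load_dataset("placeholder/english_corpus",  split="train", streaming=True)
glot     = load_dataset("placeholder/glot_corpus",     split="train", streaming=True)

sources = [korean, chinese, japanese, english, glot]

# Equal sampling probability per source approximates "same ratio of data size".
mixed = interleave_datasets(
    sources,
    probabilities=[1.0 / len(sources)] * len(sources),
    seed=42,
    stopping_strategy="all_exhausted",
)
```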
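
The following is a minimal sketch of Llama-Pro-style layer expansion followed by full-parameter training, assuming a Llama-architecture base checkpoint loaded with `transformers`. The base model name, expansion interval, and zero-initialization details are assumptions for illustration and may differ from the actual recipe.

```python
# Hypothetical sketch of layer expansion (method 2): interleave copies of existing
# decoder blocks to deepen the model, then train all parameters.
import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("placeholder/base-llm")  # placeholder checkpoint
layers = base.model.layers                                           # decoder block list

expand_every = 4  # assumed: insert one duplicated block after every 4 original blocks
new_layers = []
for i, layer in enumerate(layers):
    new_layers.append(layer)
    if (i + 1) % expand_every == 0:
        # Duplicate the block; Llama-Pro-style recipes typically zero-init the
        # duplicated block's output projections so it starts as an identity mapping.
        dup = copy.deepcopy(layer)
        nn.init.zeros_(dup.self_attn.o_proj.weight)
        nn.init.zeros_(dup.mlp.down_proj.weight)
        new_layers.append(dup)

# Re-index the attention layer indices used by the KV cache
# (needed in recent transformers versions).
for idx, blk in enumerate(new_layers):
    blk.self_attn.layer_idx = idx

base.model.layers = nn.ModuleList(new_layers)
base.config.num_hidden_layers = len(new_layers)

# Full-parameter training: keep every weight trainable (no freezing of original blocks).
for p in base.parameters():
    p.requires_grad = True
```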