daekeun-ml committed
Commit
655c748
1 Parent(s): 4a8383d

Update README.md

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -15,6 +15,12 @@ datasets:
 
 This model was trained for PoC purposes. It is part of an experiment to check whether model performance improves when fine-tuned on a large dataset of about 1 million samples.
 
+ [Note] Many people/customers still hold the mistaken belief that 'more data is always better,' so I demonstrated the opposite directly with experimental data.
+ In fine-tuning, data quality is far more important than simply preparing a lot of data, and the keyword distribution within the dataset matters as well!
+
+ For example, searching the kkullm dataset for process and comparison keywords shows that each accounts for only about 1% of the entire dataset.
+
+
 ### Model Details
 - Base Model: [beomi/llama-2-koen-13b](https://huggingface.co/beomi/llama-2-koen-13b)
 
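
As a minimal sketch of the keyword-distribution check mentioned in the note above, the snippet below counts what fraction of a Hugging Face dataset's samples contain given keywords. The dataset id (`nlpai-lab/kullm-v2`), the `instruction` field name, and the Korean keyword strings are assumptions for illustration, not confirmed details of the author's experiment.

```python
# Hedged sketch: estimate what fraction of a dataset mentions given keywords.
# Assumptions (not confirmed by the README): the dataset id "nlpai-lab/kullm-v2",
# its "instruction" field, and the keyword strings below are placeholders.
from datasets import load_dataset

ds = load_dataset("nlpai-lab/kullm-v2", split="train")

# Hypothetical Korean equivalents of the "process" / "comparison" keywords.
keywords = ["프로세스", "비교"]

for kw in keywords:
    # Crude substring match over the instruction text of each sample.
    hits = sum(kw in ex["instruction"] for ex in ds)
    print(f"{kw}: {hits} / {len(ds)} samples ({hits / len(ds):.1%})")
```

A ratio near 1% for a keyword, as the note describes, would indicate that the corresponding task type is thinly represented in the corpus regardless of its overall size.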