daekeun-ml committed
Commit 655c748
1 Parent(s): 4a8383d
Update README.md
README.md CHANGED
@@ -15,6 +15,12 @@ datasets:
 
 This model was trained for PoC purposes. It is part of an experiment to check whether model performance improves when fine-tuned on a large dataset of about 1 million samples.
 
+[Note] Many people and customers still have the misconception that 'more data is always better,' so I demonstrated the point directly with experimental data.
+In fine-tuning, data quality is far more important than simply preparing a lot of data, and the keyword distribution within the dataset also matters!
+
+For example, when searching the kkullm dataset for 'process' and 'comparison' keywords, each accounts for only about 1% of the entire dataset.
+
+
 ### Model Details
 - Base Model: [beomi/llama-2-koen-13b](https://huggingface.co/beomi/llama-2-koen-13b)
 
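For reference, a minimal sketch (not part of the commit) of how the keyword-distribution check mentioned above could be reproduced with the `datasets` library. The dataset id `nlpai-lab/kullm-v2`, the field names, and the Korean search terms below are assumptions for illustration, not taken from the original README.

```python
# Minimal sketch: estimate how often given keywords appear in an instruction dataset.
# Assumptions: dataset id "nlpai-lab/kullm-v2" and its "instruction"/"output" fields;
# the Korean strings below stand in for the "process" and "comparison" keywords.
from datasets import load_dataset

dataset = load_dataset("nlpai-lab/kullm-v2", split="train")
total = len(dataset)

keywords = {
    "process": "프로세스",  # assumed search term
    "comparison": "비교",    # assumed search term
}

for name, term in keywords.items():
    # Count samples whose instruction or output text contains the keyword.
    hits = sum(
        1
        for sample in dataset
        if term in (sample.get("instruction") or "") or term in (sample.get("output") or "")
    )
    print(f"{name}: {hits}/{total} samples ({hits / total:.1%})")
```

A check like this makes keyword imbalance visible before fine-tuning, which is the point the note above is making about dataset composition.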