beomi committed on
Commit 9c9032e
1 Parent(s): 7f619a1

Fixed typo

Files changed (1)
  1. README.md +1 -14
README.md CHANGED
@@ -37,20 +37,7 @@ SOLAR-KOEN-10.8B is an auto-regressive language model that leverages an optimize
 
  | |Training Data|Parameters|Content Length|GQA|Tokens|Learning Rate|
  |---|---|---|---|---|---|---|
- |SOLAR-KOEN-10.8B|*A curated mix of Korean+English Corpora*|10.8B|4k|O|>15B*|5e<sup>-5</sup>|
-
- **Training Corpus**
-
- The model was trained using selected datasets from AIHub and Modu Corpus. Detailed information about the training datasets is available below:
-
- - AI Hub: [corpus/AI_HUB](./corpus/AI_HUB)
- - Only the `Training` segment of the data was used.
- - The `Validation` and `Test` segments were deliberately excluded.
- - Modu Corpus: [corpus/MODU_CORPUS](./corpus/MODU_CORPUS)
-
- The final JSONL dataset used to train this model is approximately 61GB in size.
-
- Total token count: Approximately 15 billion tokens (*using the expanded tokenizer. With the original SOLAR tokenizer, >60 billion tokens.)
+ |SOLAR-KOEN-10.8B|*A curated mix of Korean+English Corpora*|10.8B|4k|O|>60B*|5e<sup>-5</sup>|
 
  **Vocab Expansion**
 