Chat2Find committed on
Commit fc0882b · verified · 1 Parent(s): ebab79e

Added clarification between Qwen token count and word count

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -43,7 +43,7 @@ Chat2Find-CPT is a specialized version of the Qwen 3.5 4B model, enhanced via **
  ### Dataset
  The model underwent true Continued Pre-Training on a massive 1.38 GB unstructured text corpus. The data was densely packed into:
- - **Size:** 270,000 packed sequences of 2048 tokens each (**550 Million total tokens**).
+ - **Size:** 270,000 packed sequences of 2048 tokens each (**550 Million total Qwen tokens / approx. 255 Million words**).
  - **Epochs:** 1 Epoch (Standard pre-training practice to prevent overfitting).
  - **Content:** Sri Lankan News & Media, Cultural Context, and domain-specific raw web data.
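
The figures in the changed line can be sanity-checked with a little arithmetic. The sketch below is not part of the commit; it assumes every packed sequence holds exactly 2048 tokens with no padding, and the names `total_tokens` and `tokens_per_word` are hypothetical helpers introduced only for illustration.

```python
# Figures quoted in the README line changed by this commit.
NUM_SEQUENCES = 270_000      # packed sequences
SEQ_LEN = 2048               # tokens per sequence
APPROX_WORDS = 255_000_000   # "approx. 255 Million words"


def total_tokens(num_sequences: int, seq_len: int) -> int:
    """Token count, assuming each sequence is fully packed (no padding)."""
    return num_sequences * seq_len


def tokens_per_word(tokens: int, words: int) -> float:
    """Average number of tokenizer tokens per whitespace-delimited word."""
    return tokens / words


if __name__ == "__main__":
    tokens = total_tokens(NUM_SEQUENCES, SEQ_LEN)
    print(f"{tokens:,} tokens")  # 552,960,000 tokens
    print(f"{tokens_per_word(tokens, APPROX_WORDS):.2f} tokens per word")  # 2.17
```

Under these assumptions the product is 552,960,000, which the README rounds to 550 million, and the implied ratio is roughly 2.2 Qwen tokens per word. That ratio is only as accurate as the README's approximate word count.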