Chat2Find/Chat2Find-CPT

@@ -43,7 +43,7 @@ Chat2Find-CPT is a specialized version of the Qwen 3.5 4B model, enhanced via **
 ### Dataset
 The model underwent true Continued Pre-Training on a massive 1.38 GB unstructured text corpus. The data was densely packed into:
-- **Size:** 270,000 packed sequences of 2048 tokens each (**550 Million total tokens**).
 - **Epochs:** 1 Epoch (Standard pre-training practice to prevent overfitting).
 - **Content:** Sri Lankan News & Media, Cultural Context, and domain-specific raw web data.

 ### Dataset
 The model underwent true Continued Pre-Training on a massive 1.38 GB unstructured text corpus. The data was densely packed into:
+- **Size:** 270,000 packed sequences of 2048 tokens each (**550 Million total Qwen tokens / approx. 255 Million words**).
 - **Epochs:** 1 Epoch (Standard pre-training practice to prevent overfitting).
 - **Content:** Sri Lankan News & Media, Cultural Context, and domain-specific raw web data.
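
The size figures in this hunk can be sanity-checked with a few lines of arithmetic. The sketch below is only a rough check, not part of the training pipeline: the function name `corpus_stats` is hypothetical, and it assumes "1.38 GB" means decimal gigabytes, which the text does not state.

```python
# Figures quoted in the Dataset section of the diff.
NUM_SEQUENCES = 270_000       # packed sequences
SEQ_LEN = 2048                # tokens per packed sequence
APPROX_WORDS = 255_000_000    # approximate word count quoted in the new line
CORPUS_BYTES = 1.38e9         # assumption: 1.38 GB read as decimal gigabytes


def corpus_stats(num_sequences, seq_len, approx_words, corpus_bytes):
    """Derive the total token count and two rough density ratios (hypothetical helper)."""
    total_tokens = num_sequences * seq_len
    return {
        "total_tokens": total_tokens,
        "tokens_per_word": total_tokens / approx_words,
        "bytes_per_token": corpus_bytes / total_tokens,
    }


stats = corpus_stats(NUM_SEQUENCES, SEQ_LEN, APPROX_WORDS, CORPUS_BYTES)
print(f"total tokens:    {stats['total_tokens']:,}")
print(f"tokens per word: {stats['tokens_per_word']:.2f}")
print(f"bytes per token: {stats['bytes_per_token']:.2f}")
```

The product of 270,000 sequences and 2048 tokens is 552,960,000, which matches the rounded "550 Million" figure, and the quoted word count implies a little over two tokens per word under these assumptions.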