Removed tildes to prevent markdown strikethrough rendering issues
README.md
CHANGED
@@ -43,7 +43,7 @@ Chat2Find-CPT is a specialized version of the Qwen 3.5 4B model, enhanced via **
 
 ### Dataset
 The model underwent true Continued Pre-Training on a massive 1.38 GB unstructured text corpus. The data was densely packed into:
-- **Size:**
+- **Size:** 270,000 packed sequences of 2048 tokens each (**550 Million total tokens**).
 - **Epochs:** 1 Epoch (Standard pre-training practice to prevent overfitting).
 - **Content:** Sri Lankan News & Media, Cultural Context, and domain-specific raw web data.
 
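
The figures in the updated **Size** line can be sanity-checked with a few lines of arithmetic. The sketch below is not part of the commit. It multiplies the stated sequence count by the stated sequence length, then divides the stated corpus size by the result. It assumes that "GB" means 10^9 bytes and that every packed sequence is completely full; neither is confirmed by the README, and the constant and function names are illustrative.

```python
# Sanity check of the dataset figures quoted in the README's Size line.
# Assumptions (not stated in the README): 1 GB = 10**9 bytes, and all
# 270,000 packed sequences contain exactly 2048 tokens with no padding.

SEQUENCES = 270_000          # packed sequences, from the README
TOKENS_PER_SEQUENCE = 2048   # tokens per sequence, from the README
CORPUS_BYTES = 1.38e9        # 1.38 GB corpus, decimal gigabytes assumed


def total_tokens(sequences: int, tokens_per_sequence: int) -> int:
    """Return the token count of a fully packed dataset."""
    return sequences * tokens_per_sequence


tokens = total_tokens(SEQUENCES, TOKENS_PER_SEQUENCE)
bytes_per_token = CORPUS_BYTES / tokens

print(f"{tokens:,} tokens")                      # 552,960,000 tokens
print(f"{bytes_per_token:.2f} bytes per token")  # 2.50 bytes per token
```

Under these assumptions the product is 552,960,000 tokens, which agrees with the rounded "550 Million" in the README, and the corpus averages about 2.5 bytes of raw text per token.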