---
library_name: transformers
datasets:
- HuggingFaceTB/smollm-corpus
---

# Doge-tokenizer
Tokenizer for training models on [smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus), with support for R1-style reasoning fine-tuning.
This tokenizer was trained on 2M samples from:
 - FineWeb-Edu: 70%
 - Cosmopedia v2: 20%
 - Python-Edu: 5%
 - FineMath: 5%
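
To make the mix above concrete, here is a minimal sketch that converts the percentages into approximate per-source sample counts. It assumes "2M" means exactly 2,000,000 samples and that the percentages apply to sample counts rather than tokens; neither is confirmed here, and the helper name `samples_per_source` is hypothetical.

```python
# Sampling mix as stated in this card (percent of samples per source).
MIX_PERCENT = {
    "FineWeb-Edu": 70,
    "Cosmopedia v2": 20,
    "Python-Edu": 5,
    "FineMath": 5,
}

# Assumption: "2M samples" is read as exactly 2,000,000.
TOTAL_SAMPLES = 2_000_000


def samples_per_source(total, mix_percent):
    """Split `total` samples across sources according to integer percentages."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("mix percentages must sum to 100")
    # Integer arithmetic avoids floating-point rounding in the counts.
    return {name: total * pct // 100 for name, pct in mix_percent.items()}


counts = samples_per_source(TOTAL_SAMPLES, MIX_PERCENT)
for name, n in counts.items():
    print(f"{name}: {n:,}")
```

Under these assumptions, FineWeb-Edu contributes about 1.4M samples, Cosmopedia v2 about 400K, and Python-Edu and FineMath about 100K each.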