🚧"raw" pretrained smol_llama checkpoints - WIP 🚧
BEEspoke Data
community
AI & ML interests
'an LLM is only as good as the dataset it was trained on' - Sun Tzu
Organization Card
🐝📊💁
Collections 6
smol_llama 220M fine-tunes we did
BEE-spoke-data/smol_llama-220M-openhermes • Text Generation • Updated • 1.56k • 4
BEE-spoke-data/smol_llama-220M-open_instruct • Text Generation • Updated • 265 • 1
BEE-spoke-data/beecoder-220M-python • Text Generation • Updated • 13 • 2
BEE-spoke-data/zephyr-220m-sft-full • Text Generation • Updated • 1.07k • 1
spaces 1
models 44
BEE-spoke-data/MiniTokenizer-20480 • Updated
BEE-spoke-data/BeeTokenizer • Updated • 1
BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu • Text Generation • Updated • 41 • 1
BEE-spoke-data/Mistral-7B-v0.3-stepbasin-books-20k • Text Generation • Updated • 9
BEE-spoke-data/Qwen2-1.5B-stepbasin-books • Text Generation • Updated • 6 • 1
BEE-spoke-data/llama3-t5-tokenizer • Updated
BEE-spoke-data/mega-small-embed-synthSTS-16384-v1 • Sentence Similarity • Updated • 2 • 4
BEE-spoke-data/bert-plus-L8-v1.0-syntheticSTS-4k • Sentence Similarity • Updated • 6 • 4
BEE-spoke-data/smol_llama-220M-GQA • Text Generation • Updated • 3.97k • 11
BEE-spoke-data/Jamba-900M-doc-writer • Text Generation • Updated • 116 • 2
datasets 57
BEE-spoke-data/the-stack-smol-xs-scored-and-annotated-python • Viewer • Updated • 100
BEE-spoke-data/upvoteweb-posts • Viewer • Updated • 45.9M • 19
BEE-spoke-data/napierone-pdf-raw • Viewer • Updated • 18.5k • 16
BEE-spoke-data/fineweb-1000_64k • Viewer • Updated • 2k • 158
BEE-spoke-data/govdocs1-image • Viewer • Updated • 199k • 28
BEE-spoke-data/sarcasm-scrolls • Viewer • Updated • 8.76k • 212 • 1
BEE-spoke-data/fineweb-edu-10BT-mincols • Viewer • Updated • 9.67M • 51 • 1
BEE-spoke-data/the-stack-smol-xl-readable • Viewer • Updated • 424k • 30 • 1
BEE-spoke-data/SaunaWeb-50k • Viewer • Updated • 50k • 5
BEE-spoke-data/UltraTextbooks-2.1-fw_mix • Viewer • Updated • 7.27M • 21 • 2