Small Language Model Trained with Textbook Quality Data - How Far Can It Go?
Ken Tsui
kenhktsui
AI & ML interests
LLM data curation/filtering for pretraining data, small language models, alignment (particularly hallucination), information retrieval
Models (5)
kenhktsui/llm-data-textbook-quality-fasttext-classifer-v2 (Text Classification)
kenhktsui/llm-data-textbook-quality-classifer-v1 (Text Classification)
kenhktsui/llm-data-textbook-quality-fasttext-classifer-v1 (Text Classification)
kenhktsui/nano-phi-115M-v0.1 (Text Generation)
kenhktsui/nano-phi-115M-control-v0.1 (Text Generation)
Datasets (20)
kenhktsui/open-license-corpus_quality_score_v1
kenhktsui/falcon_subset_quality_score_v1
kenhktsui/fineweb-100k_en-med_quality_score_v1
kenhktsui/TM-DATA_quality_score_v1
kenhktsui/openwebtext_quality_score_v1
kenhktsui/simple_wikipedia_LM_quality_score_v1
kenhktsui/refinedweb-3m_quality_score_v1
kenhktsui/minipile_quality_score_v1
kenhktsui/open-react-retrieval-multi-neg-result-new-kw
kenhktsui/open-toolformer-retrieval-multi-neg-result-new-kw