Small Language Model Trained with Textbook Quality Data - How Far Can It Go?
Ken Tsui
kenhktsui
AI & ML interests
LLM data curation and filtering for pretraining data, small language models, alignment (particularly hallucination), information retrieval
Models (4)
kenhktsui/llm-data-textbook-quality-fasttext-classifer-v1 (Text Classification)
kenhktsui/llm-data-textbook-quality-classifer-v1 (Text Classification)
kenhktsui/nano-phi-115M-v0.1 (Text Generation)
kenhktsui/nano-phi-115M-control-v0.1 (Text Generation)
Datasets (20)
kenhktsui/open-license-corpus_quality_score_v1
kenhktsui/falcon_subset_quality_score_v1
kenhktsui/fineweb-100k_en-med_quality_score_v1
kenhktsui/TM-DATA_quality_score_v1
kenhktsui/openwebtext_quality_score_v1
kenhktsui/simple_wikipedia_LM_quality_score_v1
kenhktsui/refinedweb-3m_quality_score_v1
kenhktsui/minipile_quality_score_v1
kenhktsui/open-react-retrieval-multi-neg-result-new-kw
kenhktsui/open-toolformer-retrieval-multi-neg-result-new-kw