Small Language Model Trained with Textbook Quality Data - How Far Can It Go?
Ken Tsui
kenhktsui
AI & ML interests
LLM data curation and filtering for pretraining, small language models, alignment (particularly hallucination), and information retrieval
Collections: 2
Papers: 1
Models: 5
kenhktsui/llm-data-textbook-quality-fasttext-classifier-v2 • Text Classification • 5
kenhktsui/llm-data-textbook-quality-fasttext-classifier-v1 • Text Classification • 2
kenhktsui/llm-data-textbook-quality-classifier-v1 • Text Classification • 36 • 7
kenhktsui/nano-phi-115M-v0.1 • Text Generation • 3.11k • 4
kenhktsui/nano-phi-115M-control-v0.1 • Text Generation • 409 • 1
Datasets: 23
kenhktsui/cosmopedia_quality_score_v1v2
kenhktsui/falcon_subset_quality_score_v2
kenhktsui/cosmopedia_quality_score_v2 • 29
kenhktsui/open-license-corpus_quality_score_v1
kenhktsui/falcon_subset_quality_score_v1 • 2
kenhktsui/fineweb-100k_en-med_quality_score_v1 • 32
kenhktsui/TM-DATA_quality_score_v1
kenhktsui/openwebtext_quality_score_v1
kenhktsui/simple_wikipedia_LM_quality_score_v1 • 47
kenhktsui/refinedweb-3m_quality_score_v1