
Which checkpoint is this model based on? (1T to 3T tokens of training data)

#1 by jasonden - opened

https://github.com/jzhang38/TinyLlama

Base models:

| Date | HF Checkpoint | Tokens | Step | Commonsense Avg |
|------------|--------------------------------------------|--------|-------|-----------------|
| 2023-09-01 | Pythia-1.0B | 300B | 143k | 48.30 |
| 2023-09-04 | TinyLlama-1.1B-intermediate-step-50k-105b | 105B | 50k | 46.11 |
| 2023-09-16 | TinyLlama-1.1B-intermediate-step-240k-503b | 503B | 240k | 48.28 |
| 2023-10-01 | TinyLlama-1.1B-intermediate-step-480k-1T | 1T | 480k | 50.22 |
| 2023-11-04 | TinyLlama-1.1B-intermediate-step-715k-1.5T | 1.5T | 715k | 51.28 |
| 2023-11-20 | TinyLlama-1.1B-intermediate-step-955k-2T | 2T | 955k | 51.64 |
| 2023-12-11 | TinyLlama-1.1B-intermediate-step-1195k-2.5T | 2.5T | 1195k | 53.86 |
| 2023-12-28 | TinyLlama-1.1B-intermediate-step-1431k-3T | 3T | 1431k | 52.99 |
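For reference, any of these intermediate base checkpoints can be loaded directly from the Hub with transformers. A minimal sketch, assuming the repo ids follow the `TinyLlama/<checkpoint-name>` pattern used in the table above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repo id: "TinyLlama/" + the checkpoint name from the table above.
model_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Quick sanity check that the base model generates text.
inputs = tokenizer("TinyLlama is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```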

Chat v1 is based on the checkpoint trained on 3T tokens (TinyLlama-1.1B-intermediate-step-1431k-3T in the table above).
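Since this repo ships GGUF files, here is a minimal sketch of running the chat model locally with llama-cpp-python; the filename is a placeholder for whichever quantization you downloaded, and newer llama-cpp-python versions pick up the chat template from the GGUF metadata:

```python
from llama_cpp import Llama

# Placeholder path: substitute the GGUF file you downloaded from this repo.
llm = Llama(model_path="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf", n_ctx=2048)

# create_chat_completion formats the messages with the model's chat template.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what TinyLlama is in one sentence."}],
    max_tokens=64,
)
print(response["choices"][0]["message"]["content"])
```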
