jasonkrone/02-06-25-llama8b-3dot1-sft-v0dot4-lower-lr-1node Text Generation • Updated 24 days ago • 61
jasonkrone/02-04-25-llama8b-3dot1-sft-v0dot3-og-lr-1node Text Generation • Updated 25 days ago • 397
jasonkrone/pythia-1-dot-4b-deduped-111b-toks-mc-finetune-hpo-lr-with-mmlu Text Generation • Updated Jan 11 • 36
jasonkrone/pythia-1-dot-4b-deduped-69b-toks-mc-finetune-hpo-lr-with-mmlu Text Generation • Updated Jan 11 • 33
jasonkrone/pythia-1-dot-4b-deduped-27b-toks-try2-mc-finetune-hpo-lr-with-mmlu Text Generation • Updated Jan 11 • 33
jasonkrone/pythia-1-dot-4b-deduped-111b-toks-mc-finetune-with-mmlu Text Generation • Updated Jan 3 • 45
jasonkrone/mmlu_with_mmlu_pro_train_and_concat_dev_val_for_dev_hpo Viewer • Updated Nov 15, 2024 • 21.1k • 59
📀 Dataset comparison models Collection 1.8B models trained on 350BT to compare different pretraining datasets • 8 items • Updated Jun 12, 2024 • 37