---
license: llama2
---
# Toy LLaMA-39M
This is a tiny LLaMA model pretrained on Recag/Rp_C4_55, a small subset of C4, with `seq_len=512`.

- Model architecture
{ "hidden_size": 512, "intermediate_size": 2048, "max_position_embeddings": 2048, "num_attention_heads": 8, "num_hidden_layers": 2, "num_key_value_heads": 8 } - Load model and tokenizer:
- Load model and tokenizer:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Cheng98/llama-39m")
tokenizer = AutoTokenizer.from_pretrained("Cheng98/llama-39m")
```
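Once loaded, the checkpoint works with the standard `generate` API. A quick usage sketch (the prompt is arbitrary, and as a 39M-parameter toy model the completions will not be particularly coherent):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the checkpoint and greedily generate a short continuation of an arbitrary prompt.
model = AutoModelForCausalLM.from_pretrained("Cheng98/llama-39m")
tokenizer = AutoTokenizer.from_pretrained("Cheng98/llama-39m")

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```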
# "train" split is created from the last 95% samples of original "train" subset raw_datasets["validation"] = load_dataset("Recag/Rp_C4_55", split="train[5%:]")
Evaluation (`seq_len=512`):

| Dataset | Eval loss | Perplexity | Accuracy | block_size |
|---|---|---|---|---|
| Recag/Rp_C4_55 | 3.63 | 37.78 | 0.3561 | 512 |
| Wikitext2 | 4.58 | 97.48 | 0.2719 | 512 |

Evaluation command (Wikitext2):
```bash
# Evaluation command
python run_clm.py \
    --model_name_or_path Cheng98/llama-39m \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --block_size 512 \
    --do_eval \
    --output_dir ./results
```

Evaluation on Recag/Rp_C4_55 (`seq_len=512`):
```python
# "validation" split is created from the first 5% samples of original "train" subset
raw_datasets["validation"] = load_dataset("Recag/Rp_C4_55", split="train[:5%]")
```

Results:
{ "eval_accuracy": 0.3561766818954313, "eval_loss": 3.6318140029907227, "eval_runtime": 190.8411, "eval_samples": 19413, "eval_samples_per_second": 101.723, "eval_steps_per_second": 1.593, "perplexity": 37.7812898658763 }Evaluation on Wikitext2 (
```json
{
  "eval_accuracy": 0.2718795201225219,
  "eval_loss": 4.579628944396973,
  "eval_runtime": 3.939,
  "eval_samples": 575,
  "eval_samples_per_second": 145.976,
  "eval_steps_per_second": 0.762,
  "perplexity": 97.47821765687856
}
```
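The reported perplexity is simply the exponential of the evaluation loss (as computed by run_clm.py), so it can be checked against the numbers above:

```python
# Recompute perplexity = exp(eval_loss) from the eval losses reported above.
import math

for name, eval_loss in [("Recag/Rp_C4_55", 3.6318140029907227),
                        ("Wikitext2", 4.579628944396973)]:
    print(f"{name}: perplexity = {math.exp(eval_loss):.2f}")
# Recag/Rp_C4_55: perplexity = 37.78
# Wikitext2: perplexity = 97.48
```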