---
license: llama2
---

# Toy LLaMA-39M

- This is a tiny LLaMA model pretrained on [Recag/Rp_C4_55](https://huggingface.co/datasets/Recag/Rp_C4_55), a small subset of C4, with `seq_len=512`.

- Model architecture:

  ```json
  {
    "hidden_size": 512,
    "intermediate_size": 2048,
    "max_position_embeddings": 2048,
    "num_attention_heads": 8,
    "num_hidden_layers": 2,
    "num_key_value_heads": 8
  }
  ```

- Load model and tokenizer:

  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM

  model = AutoModelForCausalLM.from_pretrained("Cheng98/llama-39m")
  tokenizer = AutoTokenizer.from_pretrained("Cheng98/llama-39m")
  ```

- Training script: [huggingface/transformers/examples/pytorch/language-modeling/run_clm.py](https://github.com/huggingface/transformers/blob/e9476832942a19cf99354776ef112babc83c139a/examples/pytorch/language-modeling/run_clm.py)

  ```python
  # the "train" split is created from the last 95% of samples in the original "train" subset
  raw_datasets["train"] = load_dataset("Recag/Rp_C4_55", split="train[5%:]")
  ```

- Evaluation (`seq_len=512`):

  | Dataset        | Eval loss | Perplexity | Accuracy | block_size |
  |----------------|-----------|------------|----------|------------|
  | Recag/Rp_C4_55 | 3.63      | 37.78      | 0.3561   | 512        |
  | Wikitext2      | 4.58      | 97.48      | 0.2719   | 512        |

- Evaluation command (Wikitext2):

  ```bash
  python run_clm.py --model_name_or_path Cheng98/llama-39m \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --block_size 512 \
    --do_eval \
    --output_dir ./results
  ```

- Evaluation on Recag/Rp_C4_55 (`seq_len=512`):

  ```python
  # the "validation" split is created from the first 5% of samples in the original "train" subset
  raw_datasets["validation"] = load_dataset("Recag/Rp_C4_55", split="train[:5%]")
  ```

  Results:

  ```json
  {
    "eval_accuracy": 0.3561766818954313,
    "eval_loss": 3.6318140029907227,
    "eval_runtime": 190.8411,
    "eval_samples": 19413,
    "eval_samples_per_second": 101.723,
    "eval_steps_per_second": 1.593,
    "perplexity": 37.7812898658763
  }
  ```

- Evaluation on Wikitext2 (`seq_len=512`):

  ```json
  {
    "eval_accuracy": 0.2718795201225219,
    "eval_loss": 4.579628944396973,
    "eval_runtime": 3.939,
    "eval_samples": 575,
    "eval_samples_per_second": 145.976,
    "eval_steps_per_second": 0.762,
    "perplexity": 97.47821765687856
  }
  ```
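- Note that `run_clm.py` reports perplexity as the exponential of the eval loss, so the perplexity values above follow directly from `eval_loss`:

  ```python
  import math

  # run_clm.py computes perplexity = math.exp(eval_loss)
  print(math.exp(3.6318140029907227))  # ~37.78 on Recag/Rp_C4_55
  print(math.exp(4.579628944396973))   # ~97.48 on Wikitext2
  ```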
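- Example generation, as a minimal sketch (the prompt and sampling settings here are illustrative, not part of the training or evaluation setup; expect rough output from a 39M-parameter toy model):

  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM

  model = AutoModelForCausalLM.from_pretrained("Cheng98/llama-39m")
  tokenizer = AutoTokenizer.from_pretrained("Cheng98/llama-39m")

  # illustrative prompt and sampling settings; adjust as needed
  inputs = tokenizer("The meaning of life is", return_tensors="pt")
  outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, top_p=0.9)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```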