---
license: llama2
---

# Toy LLaMA-39M

- This is a tiny LLaMA model pretrained on Recag/Rp_C4_55 (a small subset of C4) with `seq_len=512`.

  - Model architecture:

    ```json
    {
      "hidden_size": 512,
      "intermediate_size": 2048,
      "max_position_embeddings": 2048,
      "num_attention_heads": 8,
      "num_hidden_layers": 2,
      "num_key_value_heads": 8
    }
    ```
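
    A quick sanity check on the size implied by the model name is to count parameters after loading (a minimal sketch; the exact total also depends on the checkpoint's vocabulary size):

    ```python
    from transformers import AutoModelForCausalLM

    # Sketch: count parameters; the name "llama-39m" suggests roughly 39M in total.
    model = AutoModelForCausalLM.from_pretrained("Cheng98/llama-39m")
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{n_params / 1e6:.1f}M parameters")
    ```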
      
  - Load the model and tokenizer:

    ```python
    from transformers import AutoTokenizer, AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("Cheng98/llama-39m")
    tokenizer = AutoTokenizer.from_pretrained("Cheng98/llama-39m")
    ```
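
    A minimal generation example (a sketch; the prompt and decoding settings are illustrative, not from the original card):

    ```python
    import torch

    # Sketch: greedy decoding of a short illustrative prompt.
    inputs = tokenizer("The quick brown fox", return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
    ```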
      
  - Training script: `huggingface/transformers/examples/pytorch/language-modeling/run_clm.py`

    ```python
    # The "train" split is created from the last 95% of samples of the original "train" subset.
    raw_datasets["train"] = load_dataset("Recag/Rp_C4_55", split="train[5%:]")
    ```
      
- Evaluation (`seq_len=512`):

  | Dataset        | Eval loss | Perplexity | Accuracy | block_size |
  |----------------|-----------|------------|----------|------------|
  | Recag/Rp_C4_55 | 3.63      | 37.78      | 0.3561   | 512        |
  | Wikitext2      | 4.58      | 97.48      | 0.2719   | 512        |
  - Evaluation command (Wikitext2):

    ```bash
    python run_clm.py \
      --model_name_or_path Cheng98/llama-39m \
      --dataset_name wikitext \
      --dataset_config_name wikitext-2-raw-v1 \
      --block_size 512 \
      --do_eval \
      --output_dir ./results
    ```
      
  - Evaluation on Recag/Rp_C4_55 (`seq_len=512`):

    ```python
    # The "validation" split is created from the first 5% of samples of the original "train" subset.
    raw_datasets["validation"] = load_dataset("Recag/Rp_C4_55", split="train[:5%]")
    ```
      

    Results:

    ```json
    {
      "eval_accuracy": 0.3561766818954313,
      "eval_loss": 3.6318140029907227,
      "eval_runtime": 190.8411,
      "eval_samples": 19413,
      "eval_samples_per_second": 101.723,
      "eval_steps_per_second": 1.593,
      "perplexity": 37.7812898658763
    }
    ```
      
  - Evaluation on Wikitext2 (`seq_len=512`):

    ```json
    {
      "eval_accuracy": 0.2718795201225219,
      "eval_loss": 4.579628944396973,
      "eval_runtime": 3.939,
      "eval_samples": 575,
      "eval_samples_per_second": 145.976,
      "eval_steps_per_second": 0.762,
      "perplexity": 97.47821765687856
    }
    ```
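
    As a consistency check, the reported perplexity is just `exp(eval_loss)`; a minimal sketch reproducing both table values:

    ```python
    import math

    # Perplexity is exp(eval_loss); these reproduce the reported numbers.
    print(math.exp(3.6318140029907227))  # ~37.78 on Recag/Rp_C4_55
    print(math.exp(4.579628944396973))   # ~97.48 on Wikitext2
    ```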