metadata
license: apache-2.0
metrics:
  - accuracy
inference:
  parameters:
    max_new_tokens: 64
    do_sample: true
    temperature: 0.7
    repetition_penalty: 1.1
    no_repeat_ngram_size: 6
    eta_cutoff: 0.0008
    renormalize_logits: true
widget:
  - text: My name is El Microondas the Wise, and
    example_title: El Microondas
  - text: Kennesaw State University is a public
    example_title: Kennesaw State University
  - text: >-
      Bungie Studios is an American video game developer. They are most famous
      for developing the award winning Halo series of video games. They also
      made Destiny. The studio was founded
    example_title: Bungie
  - text: The Mona Lisa is a world-renowned painting created by
    example_title: Mona Lisa
  - text: >-
      The Harry Potter series, written by J.K. Rowling, begins with the book
      titled
    example_title: Harry Potter Series
  - text: >-
      Question: I have cities, but no houses. I have mountains, but no trees. I
      have water, but no fish. What am I?

      Answer:
    example_title: Riddle
  - text: The process of photosynthesis involves the conversion of
    example_title: Photosynthesis
  - text: >-
      Jane went to the store to buy some groceries. She picked up apples,
      oranges, and a loaf of bread. When she got home, she realized she forgot
    example_title: Story Continuation
  - text: >-
      Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph,
      and another train leaves Station B at 10:00 AM and travels at 80 mph, when
      will they meet if the distance between the stations is 300 miles?

      To determine
    example_title: Math Problem
  - text: In the context of computer programming, an algorithm is
    example_title: Algorithm Definition
pipeline_tag: text-generation
datasets:
  - BEE-spoke-data/UltraTextbooks-2.1-fw_mix
language:
  - en

mega-ar-350m-L3t-v0.08-ultraTBfw

Model description

This is a pretraining experiment most recently trained on the BEE-spoke-data/UltraTextbooks-2.1-fw_mix dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0787
  • Accuracy: 0.5746
  • Num Input Tokens Seen: 3492282368
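The evaluation loss can also be read as a perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated on this card.

```python
import math

# Evaluation loss reported above; assumed to be mean cross-entropy in nats per token.
eval_loss = 2.0787

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = math.exp(eval_loss)

print(f"eval perplexity ~ {perplexity:.2f}")
```

Under that assumption the value comes out at roughly 8. It is a token-level figure on this model's own evaluation split, so it is not directly comparable to perplexities measured on other datasets or with other tokenizers.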

Quick eval

Quick eval for: pszemraj/mega-ar-350m-L3t-v0.08-ultraTBfw

hf (pretrained=pszemraj/mega-ar-350m-L3t-v0.08-ultraTBfw,trust_remote_code=True,dtype=float), gen_kwargs: (None), limit: 0.99999, num_fewshot: None, batch_size: 8
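The settings line above has the shape of the header printed by EleutherAI's lm-evaluation-harness. A minimal sketch of how a comparable run could be assembled from Python, assuming the `lm_eval` command-line tool is installed. The task list is taken from the results reported for this model, and the helper name `build_lm_eval_command` is hypothetical.

```python
import shlex

MODEL_ID = "pszemraj/mega-ar-350m-L3t-v0.08-ultraTBfw"
# Tasks listed in the quick-eval results for this model.
TASKS = ["arc_easy", "boolq", "lambada_openai", "openbookqa", "piqa", "winogrande"]


def build_lm_eval_command(model_id=MODEL_ID, tasks=TASKS):
    """Assemble an lm_eval argument list mirroring the settings line above."""
    model_args = f"pretrained={model_id},trust_remote_code=True,dtype=float"
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", ",".join(tasks),
        "--batch_size", "8",
        "--limit", "0.99999",
    ]


# Print the command rather than running it: a real evaluation downloads
# the model weights and every benchmark dataset.
print(shlex.join(build_lm_eval_command()))
```

The printed command can be pasted into a shell, or the list passed to `subprocess.run`, once the harness and its dependencies are available.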

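| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_easy | 1 | none | 0 | acc | 0.4246 | ± 0.0139 |
| arc_easy | 1 | none | 0 | acc_norm | 0.4002 | ± 0.0138 |
| boolq | 2 | none | 0 | acc | 0.5762 | ± 0.0139 |
| lambada_openai | 1 | none | 0 | perplexity | 76.7162 | ± 6.3531 |
| lambada_openai | 1 | none | 0 | acc | 0.2605 | ± 0.0123 |
| openbookqa | 1 | none | 0 | acc | 0.1840 | ± 0.0173 |
| openbookqa | 1 | none | 0 | acc_norm | 0.2720 | ± 0.0199 |
| piqa | 1 | none | 0 | acc | 0.6377 | ± 0.0135 |
| piqa | 1 | none | 0 | acc_norm | 0.6172 | ± 0.0137 |
| winogrande | 1 | none | 0 | acc | 0.5020 | ± 0.0141 |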

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 4e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 80085
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • total_eval_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
  • lr_scheduler_type: inverse_sqrt
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1.0
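The total batch sizes listed above follow from the per-device values. A short worked check, using only the numbers from the hyperparameter list:

```python
# Values copied from the hyperparameter list above.
train_batch_size = 1              # per-device training batch size
eval_batch_size = 1               # per-device evaluation batch size
num_devices = 4
gradient_accumulation_steps = 32

# One optimizer step consumes every device's micro-batch
# for each of the accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 128
print(total_eval_batch_size)   # 4
```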

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Input Tokens Seen |
|---|---|---|---|---|---|
| 2.2572 | 0.0600 | 400 | 2.2462 | 0.5491 | 209715200 |
| 2.2173 | 0.1201 | 800 | 2.1939 | 0.5564 | 419430400 |
| 2.1992 | 0.1801 | 1200 | 2.1689 | 0.5604 | 629145600 |
| 2.1543 | 0.2402 | 1600 | 2.1521 | 0.5632 | 838860800 |
| 2.1532 | 0.3002 | 2000 | 2.1401 | 0.5650 | 1048576000 |
| 2.1688 | 0.3603 | 2400 | 2.1307 | 0.5663 | 1258291200 |
| 2.1443 | 0.4203 | 2800 | 2.1227 | 0.5676 | 1468006400 |
| 2.1105 | 0.4804 | 3200 | 2.1158 | 0.5689 | 1677721600 |
| 2.1045 | 0.5404 | 3600 | 2.1090 | 0.5700 | 1887436800 |
| 2.1181 | 0.6004 | 4000 | 2.1045 | 0.5708 | 2097152000 |
| 2.127 | 0.6605 | 4400 | 2.0994 | 0.5716 | 2306867200 |
| 2.1265 | 0.7205 | 4800 | 2.0958 | 0.5719 | 2516582400 |
| 2.0951 | 0.7806 | 5200 | 2.0909 | 0.5728 | 2726297600 |
| 2.0951 | 0.8406 | 5600 | 2.0876 | 0.5733 | 2936012800 |
| 2.1335 | 0.9007 | 6000 | 2.0838 | 0.5739 | 3145728000 |
| 2.0731 | 0.9607 | 6400 | 2.0802 | 0.5744 | 3355443200 |
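The token counts in these results increase by a fixed amount per step, which allows a rough consistency check. The sketch below uses the first logged row, the total train batch size of 128, and the final token count reported in the model description. The per-sequence length it derives is an inference that assumes every sequence is packed to the same fixed length; the card does not state a context length.

```python
# Values copied from the card.
tokens_at_step_400 = 209_715_200      # "Input Tokens Seen" at step 400
total_train_batch_size = 128          # sequences per optimizer step
final_tokens_seen = 3_492_282_368     # "Num Input Tokens Seen" at the end of training

# Tokens consumed by one optimizer step.
tokens_per_step = tokens_at_step_400 // 400

# ASSUMPTION: fixed-length packed sequences, so each sequence has the same token count.
tokens_per_sequence = tokens_per_step // total_train_batch_size

# Number of optimizer steps implied by the final token count.
total_steps = final_tokens_seen // tokens_per_step

print(tokens_per_step)      # 524288
print(tokens_per_sequence)  # 4096
print(total_steps)          # 6661
```

The implied 6661 steps is consistent with the last logged row, where step 6400 corresponds to epoch 0.9607.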

Framework versions

  • Transformers 4.40.1
  • PyTorch 2.3.0+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1
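
For local generation, the sampling settings from the inference parameters in this card's metadata can be passed to `generate`. This is a minimal sketch, not a tested recipe: it assumes the checkpoint loads through `AutoModelForCausalLM` with `trust_remote_code=True` (the flag used in the quick eval), and the helper name `generate` is only illustrative.

```python
MODEL_ID = "pszemraj/mega-ar-350m-L3t-v0.08-ultraTBfw"

# Sampling settings copied from the card's inference parameters.
GENERATION_KWARGS = {
    "max_new_tokens": 64,
    "do_sample": True,
    "temperature": 0.7,
    "repetition_penalty": 1.1,
    "no_repeat_ngram_size": 6,
    "eta_cutoff": 0.0008,
    "renormalize_logits": True,
}


def generate(prompt: str) -> str:
    """Load the model and sample one continuation of `prompt`."""
    # Imported lazily so the settings above can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # ASSUMPTION: the repository's custom code is required, as in the quick eval.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        **GENERATION_KWARGS,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(generate("The Mona Lisa is a world-renowned painting created by"))
```

Because `do_sample` is enabled, repeated calls with the same prompt return different continuations unless a random seed is fixed first.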