metadata

license: apache-2.0
metrics:
  - accuracy
inference:
  parameters:
    max_new_tokens: 64
    do_sample: true
    temperature: 0.7
    repetition_penalty: 1.1
    no_repeat_ngram_size: 6
    eta_cutoff: 0.0008
    renormalize_logits: true
widget:
  - text: My name is El Microondas the Wise, and
    example_title: El Microondas
  - text: Kennesaw State University is a public
    example_title: Kennesaw State University
  - text: >-
      Bungie Studios is an American video game developer. They are most famous
      for developing the award winning Halo series of video games. They also
      made Destiny. The studio was founded
    example_title: Bungie
  - text: The Mona Lisa is a world-renowned painting created by
    example_title: Mona Lisa
  - text: >-
      The Harry Potter series, written by J.K. Rowling, begins with the book
      titled
    example_title: Harry Potter Series
  - text: >-
      Question: I have cities, but no houses. I have mountains, but no trees. I
      have water, but no fish. What am I?

      Answer:
    example_title: Riddle
  - text: The process of photosynthesis involves the conversion of
    example_title: Photosynthesis
  - text: >-
      Jane went to the store to buy some groceries. She picked up apples,
      oranges, and a loaf of bread. When she got home, she realized she forgot
    example_title: Story Continuation
  - text: >-
      Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph,
      and another train leaves Station B at 10:00 AM and travels at 80 mph, when
      will they meet if the distance between the stations is 300 miles?

      To determine
    example_title: Math Problem
  - text: In the context of computer programming, an algorithm is
    example_title: Algorithm Definition
pipeline_tag: text-generation
datasets:
  - BEE-spoke-data/UltraTextbooks-2.1-fw_mix
language:
  - en

mega-ar-350m-L3t-v0.08-ultraTBfw

Model description

This model is a pretraining experiment. Its most recent training stage used the BEE-spoke-data/UltraTextbooks-2.1-fw_mix dataset. It achieves the following results on the evaluation set:

Loss: 2.0787
Accuracy: 0.5746
Num Input Tokens Seen: 3492282368
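
The `inference.parameters` listed in the metadata map directly onto keyword arguments of `generate` in Transformers. The following is a minimal sketch rather than the author's own script. It assumes the checkpoint is published under the repository id that appears in the quick eval configuration, and that it needs `trust_remote_code=True` as in that configuration. The helper name `generate_text` is hypothetical.

```python
# Generation settings copied from the `inference.parameters` metadata of this card.
GEN_KWARGS = {
    "max_new_tokens": 64,
    "do_sample": True,
    "temperature": 0.7,
    "repetition_penalty": 1.1,
    "no_repeat_ngram_size": 6,
    "eta_cutoff": 0.0008,
    "renormalize_logits": True,
}

# Assumption: repository id taken from the quick eval configuration.
MODEL_ID = "pszemraj/mega-ar-350m-L3t-v0.08-ultraTBfw"


def generate_text(prompt: str, model_id: str = MODEL_ID) -> str:
    """Hypothetical helper: load the checkpoint and sample one continuation."""
    # Imported lazily so the settings above can be inspected without loading torch.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Assumption: remote code is required, mirroring the quick eval setup.
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        **GEN_KWARGS,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("The Mona Lisa is a world-renowned painting created by")` downloads the weights on first use. Because `do_sample` is enabled, the continuation differs from call to call.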

Quick eval

Quick eval for: pszemraj/mega-ar-350m-L3t-v0.08-ultraTBfw

hf (pretrained=pszemraj/mega-ar-350m-L3t-v0.08-ultraTBfw,trust_remote_code=True,dtype=float), gen_kwargs: (None), limit: 0.99999, num_fewshot: None, batch_size: 8

Tasks	Version	Filter	Metric	Value		Stderr
arc_easy	1	none	acc	0.4246	±	0.0139
		none	acc_norm	0.4002	±	0.0138
boolq	2	none	acc	0.5762	±	0.0139
lambada_openai	1	none	perplexity	76.7162	±	6.3531
		none	acc	0.2605	±	0.0123
openbookqa	1	none	acc	0.1840	±	0.0173
		none	acc_norm	0.2720	±	0.0199
piqa	1	none	acc	0.6377	±	0.0135
		none	acc_norm	0.6172	±	0.0137
winogrande	1	none	acc	0.5020	±	0.0141
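
Each row of the table reports a value and its standard error. One way to read them is to form an approximate 95% interval (value ± 1.96 × stderr) and check whether random guessing falls inside it. The sketch below does this for the `acc` rows. The chance levels are an assumption based on the nominal number of answer options per task; they are not reported in this card.

```python
# (task, value, stderr) for the `acc` rows, copied from the quick eval table.
ACC_ROWS = [
    ("arc_easy", 0.4246, 0.0139),
    ("boolq", 0.5762, 0.0139),
    ("openbookqa", 0.1840, 0.0173),
    ("piqa", 0.6377, 0.0135),
    ("winogrande", 0.5020, 0.0141),
]

# Assumption: nominal random-guess accuracy, 1 / (number of answer options).
CHANCE = {
    "arc_easy": 0.25,
    "boolq": 0.5,
    "openbookqa": 0.25,
    "piqa": 0.5,
    "winogrande": 0.5,
}


def interval_95(value: float, stderr: float) -> tuple[float, float]:
    """Approximate 95% interval under a normal approximation."""
    half_width = 1.96 * stderr
    return value - half_width, value + half_width


def chance_in_interval(task: str, value: float, stderr: float) -> bool:
    """True when the assumed chance level lies inside the 95% interval."""
    low, high = interval_95(value, stderr)
    return low <= CHANCE[task] <= high


def summarize() -> None:
    """Print one line per task with its interval and the chance comparison."""
    for task, value, stderr in ACC_ROWS:
        low, high = interval_95(value, stderr)
        flag = "includes chance" if chance_in_interval(task, value, stderr) else "excludes chance"
        print(f"{task:12s} [{low:.4f}, {high:.4f}] {flag}")
```

Under these assumptions the winogrande interval includes 0.5, while the piqa interval lies clearly above it.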

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 4e-05
train_batch_size: 1
eval_batch_size: 1
seed: 80085
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 32
total_train_batch_size: 128
total_eval_batch_size: 4
optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
lr_scheduler_type: inverse_sqrt
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1.0
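
The batch-size entries above are related by simple arithmetic, and combining them with the token counts logged in the training results gives the number of tokens consumed per optimizer step. The sketch below works through it. The per-sequence length it derives is an inference, not something this card states, and it assumes every sequence contributes the same number of tokens.

```python
# Values copied from the hyperparameter list.
train_batch_size = 1              # sequences per device per forward pass
num_devices = 4
gradient_accumulation_steps = 32

# Sequences per optimizer step across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# First logged row of the training results: 209715200 tokens seen at step 400.
tokens_per_step = 209_715_200 // 400

# Assumption: fixed-length sequences, so tokens divide evenly across the batch.
tokens_per_sequence = tokens_per_step // total_train_batch_size

# Final "Num Input Tokens Seen" from the model description.
total_steps = 3_492_282_368 // tokens_per_step

print(total_train_batch_size)  # 128, matching total_train_batch_size above
print(tokens_per_step)         # 524288
print(tokens_per_sequence)     # 4096
print(total_steps)             # 6661
```

The implied total of 6661 steps is consistent with the logged epoch fractions: step 400 out of 6661 is about 0.060 of an epoch.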

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	Input Tokens Seen
2.2572	0.0600	400	2.2462	0.5491	209715200
2.2173	0.1201	800	2.1939	0.5564	419430400
2.1992	0.1801	1200	2.1689	0.5604	629145600
2.1543	0.2402	1600	2.1521	0.5632	838860800
2.1532	0.3002	2000	2.1401	0.5650	1048576000
2.1688	0.3603	2400	2.1307	0.5663	1258291200
2.1443	0.4203	2800	2.1227	0.5676	1468006400
2.1105	0.4804	3200	2.1158	0.5689	1677721600
2.1045	0.5404	3600	2.1090	0.5700	1887436800
2.1181	0.6004	4000	2.1045	0.5708	2097152000
2.1270	0.6605	4400	2.0994	0.5716	2306867200
2.1265	0.7205	4800	2.0958	0.5719	2516582400
2.0951	0.7806	5200	2.0909	0.5728	2726297600
2.0951	0.8406	5600	2.0876	0.5733	2936012800
2.1335	0.9007	6000	2.0838	0.5739	3145728000
2.0731	0.9607	6400	2.0802	0.5744	3355443200
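
The validation losses in this table can be converted to per-token perplexity, which some readers find easier to interpret. This is a small sketch that assumes the reported loss is the mean cross-entropy in nats per token; the card does not state the unit explicitly.

```python
import math

# Validation loss at selected steps, copied from the training results table.
VALIDATION_LOSS = {
    400: 2.2462,
    3200: 2.1158,
    6400: 2.0802,
}


def perplexity(loss: float) -> float:
    """Per-token perplexity, assuming `loss` is mean cross-entropy in nats."""
    return math.exp(loss)


for step, loss in VALIDATION_LOSS.items():
    print(f"step {step}: loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")
```

Under that assumption, validation perplexity falls from roughly 9.5 at step 400 to roughly 8.0 at step 6400. These figures are measured on the training dataset's validation split and are not directly comparable to the lambada_openai perplexity in the quick eval.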

Framework versions

Transformers 4.40.1
PyTorch 2.3.0+cu121
Datasets 2.19.0
Tokenizers 0.19.1