Tags: Text Generation · Transformers · Safetensors · English · mega · Generated from Trainer · Inference Endpoints

mega-ar-350m-v0.13

Model description

Continued training of BEE-spoke-data/mega-ar-350m-L3t-v0.08-ultraTBfw on a few additional datasets.

It achieves the following results on the evaluation set (BEE-spoke-data/UltraTextbooks-2.1-fw_mix):

  • Loss: 1.9926
  • Accuracy: 0.5885
  • Num Input Tokens Seen: 3,468,165,120
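
A minimal generation sketch follows; the prompt and generation settings are illustrative, and trust_remote_code=True mirrors the eval configuration below:

```python
# Minimal text-generation sketch; prompt and max_new_tokens are illustrative.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="pszemraj/mega-ar-350m-v0.13",
    trust_remote_code=True,  # matches the eval configuration in "Quick eval"
)

out = pipe("The printing press changed Europe because", max_new_tokens=64)
print(out[0]["generated_text"])
```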

Quick eval

Quick eval for: pszemraj/mega-ar-350m-v0.13

hf (pretrained=pszemraj/mega-ar-350m-v0.13,trust_remote_code=True,dtype=float), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8

| Tasks          | Version | Filter | n-shot | Metric     |   Value | Stderr   |
|----------------|---------|--------|--------|------------|--------:|----------|
| arc_easy       | 1       | none   | 0      | acc        |  0.4491 | ± 0.0102 |
|                |         | none   | 0      | acc_norm   |  0.4061 | ± 0.0101 |
| boolq          | 2       | none   | 0      | acc        |  0.5367 | ± 0.0087 |
| lambada_openai | 1       | none   | 0      | perplexity | 55.3308 | ± 2.3100 |
|                |         | none   | 0      | acc        |  0.3113 | ± 0.0065 |
| openbookqa     | 1       | none   | 0      | acc        |  0.1760 | ± 0.0170 |
|                |         | none   | 0      | acc_norm   |  0.2680 | ± 0.0198 |
| piqa           | 1       | none   | 0      | acc        |  0.6366 | ± 0.0112 |
|                |         | none   | 0      | acc_norm   |  0.6213 | ± 0.0113 |
| winogrande     | 1       | none   | 0      | acc        |  0.5036 | ± 0.0141 |
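
The run above can be approximately reproduced with lm-evaluation-harness; the tasks and settings below are inferred from the header line, so treat this as a sketch rather than the exact original command:

```python
# Approximate lm-evaluation-harness reproduction of the "Quick eval" above;
# tasks and settings are inferred from the header line, not the original script.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=pszemraj/mega-ar-350m-v0.13,trust_remote_code=True,dtype=float",
    tasks=["arc_easy", "boolq", "lambada_openai", "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```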

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a rough TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 80085
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 96
  • total_eval_batch_size: 3
  • optimizer: Adam with betas=(0.9,0.985) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1.0
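
For reference, these settings map roughly onto transformers.TrainingArguments as shown below; the output directory and the 3-GPU launch (which yields the total train batch size of 96) are assumptions, not recorded values:

```python
# Rough mapping of the hyperparameters above onto transformers.TrainingArguments.
# output_dir is a placeholder; the 3-GPU launch is handled by the launcher
# (e.g. accelerate/torchrun), giving 1 x 32 grad-accum x 3 GPUs = 96 total batch.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mega-ar-350m-v0.13",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=32,
    seed=80085,
    adam_beta1=0.9,
    adam_beta2=0.985,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    num_train_epochs=1.0,
)
```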
