TheBeagle-v2beta-32B-MGS
This model is an experimental version of our latest innovation: MGS. It's up to you to figure out what it means, but it's very explicit.
We didn't apply our known UNA algorithm to the forward pass, but the two are entirely compatible: they operate in different parts of the neural network and in different ways, though both can be seen as regularization techniques.
MGS
MGS stands for... Many-Geeks-Searching... and that's it. Hint: 1+1 is 2, and 1+1 is not 3.
We still believe one epoch should be enough, so we trained for just one epoch.
Dataset
We used the first decent (in corpus quality and size) dataset on the Hub: Magpie-Align/Magpie-Pro-300K-Filtered.
Kudos to the Magpie team for contributing solid material that I personally think is very good for ablation.
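For reference, a minimal sketch of pulling that dataset with the Hugging Face `datasets` library (the `train` split name is an assumption, not from this card):

```python
from datasets import load_dataset

# Load the corpus used for this run (split name assumed to be "train").
ds = load_dataset("Magpie-Align/Magpie-Pro-300K-Filtered", split="train")
print(ds[0])  # inspect one conversation record
```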
It achieves the following results on the evaluation set:
- Loss: 0.5378 (1 Epoch), outperforming the baseline model.
Quants
... being uploaded ...
Licensing terms:
Quantized versions of this model must ONLY be distributed from the author's repository. Submit a commit/PR and be credited for it.
Training
Training hyperparameters
The following hyperparameters were used during training (a sketch expressing them in code follows the list):
- learning_rate: 8e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 25
- num_epochs: 1
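As a point of reference, here is the same setup expressed as Hugging Face `TrainingArguments`; this is a sketch, the output path is a placeholder, and the per-device sizes assume the 8-GPU launch listed above:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",           # placeholder path, not from this card
    learning_rate=8e-5,
    per_device_train_batch_size=2,  # 2 x 8 GPUs x 4 accumulation = 64 total
    per_device_eval_batch_size=2,   # 2 x 8 GPUs = 16 total
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_steps=25,
    seed=42,
    adam_beta1=0.9,                 # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```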
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
9.8642 | 0.0012 | 1 | 0.7195 |
2.077 | 0.0507 | 42 | 0.6161 |
1.0325 | 0.1014 | 84 | 0.6093 |
0.8945 | 0.1520 | 126 | 0.5962 |
0.8532 | 0.2027 | 168 | 0.5869 |
0.8185 | 0.2534 | 210 | 0.5805 |
0.81 | 0.3041 | 252 | 0.5719 |
0.7901 | 0.3548 | 294 | 0.5663 |
0.7766 | 0.4054 | 336 | 0.5618 |
0.7687 | 0.4561 | 378 | 0.5590 |
0.7443 | 0.5068 | 420 | 0.5564 |
0.7494 | 0.5575 | 462 | 0.5525 |
0.7787 | 0.6081 | 504 | 0.5485 |
0.7381 | 0.6588 | 546 | 0.5466 |
0.7359 | 0.7095 | 588 | 0.5444 |
0.7447 | 0.7602 | 630 | 0.5435 |
0.7378 | 0.8109 | 672 | 0.5415 |
0.7302 | 0.8615 | 714 | 0.5398 |
0.7476 | 0.9122 | 756 | 0.5391 |
0.715 | 0.9629 | 798 | 0.5378 |
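For a quick look at convergence, a small sketch that plots the validation-loss column of the table above with matplotlib:

```python
import matplotlib.pyplot as plt

# Validation loss by step, copied from the table above.
steps = [1, 42, 84, 126, 168, 210, 252, 294, 336, 378,
         420, 462, 504, 546, 588, 630, 672, 714, 756, 798]
val_loss = [0.7195, 0.6161, 0.6093, 0.5962, 0.5869, 0.5805, 0.5719,
            0.5663, 0.5618, 0.5590, 0.5564, 0.5525, 0.5485, 0.5466,
            0.5444, 0.5435, 0.5415, 0.5398, 0.5391, 0.5378]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("TheBeagle-v2beta-32B-MGS, 1 epoch")
plt.show()
```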
Leaderboard Evaluation:
We'll see those soon, stay tuned :)
Thanks
- Qwen Team for their outstanding model
- Magpie Team for contributing plenty of datasets
- Cybertron Cloud Compute
Model tree for waldie/TheBeagle-v2beta-32B-MGS-4bpw-h6-exl2
Base model: Qwen/Qwen2.5-32B
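For the exl2 quant above, a minimal loading sketch using the exllamav2 library's dynamic generator (API as in exllamav2's inference example; the local path, context length, and generation settings are assumptions, not from this card):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Hypothetical local directory holding the downloaded exl2 quant.
model_dir = "./TheBeagle-v2beta-32B-MGS-4bpw-h6-exl2"

config = ExLlamaV2Config(model_dir)
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, max_seq_len=8192, lazy=True)  # assumed context length
model.load_autosplit(cache, progress=True)  # split weights across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="Hello, my name is", max_new_tokens=64))
```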