TheBeagle-v2beta-32B-MGS
This model is an experimental version of our latest innovation: MGS. It's up to you to figure out what it means, but it's very explicit.
We didn't apply our known UNA algorithm to the forward pass, but the two are entirely compatible: they operate in different parts of the neural network and in different ways, though both can be seen as regularization techniques.
MGS
MGS stands for... Many-Geeks-Searching... and that's it. Hint: 1+1 is 2, and 1+1 is not 3.
We still believe one epoch should be enough, so we trained for just one epoch.
Dataset
We used the first decent (in corpus quality and size) dataset on the Hub: Magpie-Align/Magpie-Pro-300K-Filtered.
Kudos to the Magpie team for contributing solid material that I personally think is very good for ablation.
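For reference, a minimal sketch of pulling that dataset with the Hugging Face `datasets` library (the `train` split name is an assumption, not from this card):

```python
from datasets import load_dataset

# Load the corpus used for this run (split name assumed to be "train").
ds = load_dataset("Magpie-Align/Magpie-Pro-300K-Filtered", split="train")
print(ds[0])  # inspect one conversation record
```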
It achieves the following results on the evaluation set:
- Loss: 0.5378 (1 Epoch), outperforming the baseline model.
Quants
... being uploaded ...
Licensing terms:
Quantized versions of this model must ONLY be distributed from the author's repository. Submit a commit/PR and be credited for it.
Training
Training hyperparameters
The following hyperparameters were used during training (a sketch expressing them in code follows the list):
- learning_rate: 8e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 25
- num_epochs: 1
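As a point of reference, here is the same setup expressed as Hugging Face `TrainingArguments`; this is a sketch, the output path is a placeholder, and the per-device sizes assume the 8-GPU launch listed above:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",           # placeholder path, not from this card
    learning_rate=8e-5,
    per_device_train_batch_size=2,  # 2 x 8 GPUs x 4 accumulation = 64 total
    per_device_eval_batch_size=2,   # 2 x 8 GPUs = 16 total
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_steps=25,
    seed=42,
    adam_beta1=0.9,                 # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```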
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
9.8642 | 0.0012 | 1 | 0.7195 |
2.077 | 0.0507 | 42 | 0.6161 |
1.0325 | 0.1014 | 84 | 0.6093 |
0.8945 | 0.1520 | 126 | 0.5962 |
0.8532 | 0.2027 | 168 | 0.5869 |
0.8185 | 0.2534 | 210 | 0.5805 |
0.81 | 0.3041 | 252 | 0.5719 |
0.7901 | 0.3548 | 294 | 0.5663 |
0.7766 | 0.4054 | 336 | 0.5618 |
0.7687 | 0.4561 | 378 | 0.5590 |
0.7443 | 0.5068 | 420 | 0.5564 |
0.7494 | 0.5575 | 462 | 0.5525 |
0.7787 | 0.6081 | 504 | 0.5485 |
0.7381 | 0.6588 | 546 | 0.5466 |
0.7359 | 0.7095 | 588 | 0.5444 |
0.7447 | 0.7602 | 630 | 0.5435 |
0.7378 | 0.8109 | 672 | 0.5415 |
0.7302 | 0.8615 | 714 | 0.5398 |
0.7476 | 0.9122 | 756 | 0.5391 |
0.715 | 0.9629 | 798 | 0.5378 |
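For a quick look at convergence, a small sketch that plots the validation-loss column of the table above with matplotlib:

```python
import matplotlib.pyplot as plt

# Validation loss by step, copied from the table above.
steps = [1, 42, 84, 126, 168, 210, 252, 294, 336, 378,
         420, 462, 504, 546, 588, 630, 672, 714, 756, 798]
val_loss = [0.7195, 0.6161, 0.6093, 0.5962, 0.5869, 0.5805, 0.5719,
            0.5663, 0.5618, 0.5590, 0.5564, 0.5525, 0.5485, 0.5466,
            0.5444, 0.5435, 0.5415, 0.5398, 0.5391, 0.5378]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("TheBeagle-v2beta-32B-MGS, 1 epoch")
plt.show()
```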
Leaderboard Evaluation:
We'll see those soon, stay tuned :)
Thanks
- Qwen Team for their outstanding model
- Magpie Team for contributing plenty of datasets
- Cybertron Cloud Compute
Model tree for waldie/TheBeagle-v2beta-32B-MGS-4bpw-h6-exl2
Base model: Qwen/Qwen2.5-32B
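For the exl2 quant above, a minimal loading sketch using the exllamav2 library's dynamic generator (API as in exllamav2's inference example; the local path, context length, and generation settings are assumptions, not from this card):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Hypothetical local directory holding the downloaded exl2 quant.
model_dir = "./TheBeagle-v2beta-32B-MGS-4bpw-h6-exl2"

config = ExLlamaV2Config(model_dir)
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, max_seq_len=8192, lazy=True)  # assumed context length
model.load_autosplit(cache, progress=True)  # split weights across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="Hello, my name is", max_new_tokens=64))
```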