
4x1.8B MoE Qwen Ckpt 18000

This project builds a Mixture-of-Experts (MoE) model on top of the Qwen 1.8B model: four copies of the original model were combined into a single MoE model and then trained with custom training methods.
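As a rough consistency check on the reported model size of 4.27B parameters, the sketch below estimates the parameter count of a 4-expert MoE built from Qwen-1.8B. It assumes that only the feed-forward (MLP) block of each layer is replicated into four experts, that attention and embeddings stay shared, and that each layer gets a small bias-free linear router. The Qwen-1.8B configuration values (24 layers, hidden size 2048, MLP width 5504, about 1.84B total parameters) are assumptions taken from the public base model, not from this card. The card does not state how the four models were actually combined.

```python
# Hypothetical parameter-count estimate. All configuration values are
# assumptions (public Qwen-1.8B settings), not taken from this model card.
BASE_PARAMS = 1_840_000_000  # approx. total parameters of Qwen-1.8B (assumed)
NUM_LAYERS = 24              # transformer blocks (assumed)
HIDDEN = 2048                # hidden size (assumed)
FFN = 5504                   # width of each MLP projection (assumed)
NUM_EXPERTS = 4              # the "4x" in the model name


def mlp_params_per_layer(hidden: int, ffn: int) -> int:
    # Qwen's gated MLP has two up-projections and one down-projection,
    # all bias-free, so each holds hidden * ffn weights.
    return 3 * hidden * ffn


def estimate_moe_params(base: int, layers: int, hidden: int, ffn: int, experts: int) -> int:
    # Assumption: each extra expert duplicates every layer's MLP, while
    # attention and embeddings stay shared with the base model.
    extra_experts = (experts - 1) * layers * mlp_params_per_layer(hidden, ffn)
    # Assumed router: one bias-free linear layer (hidden -> experts) per block.
    router = layers * hidden * experts
    return base + extra_experts + router


total = estimate_moe_params(BASE_PARAMS, NUM_LAYERS, HIDDEN, FFN, NUM_EXPERTS)
print(f"{total / 1e9:.2f}B")  # → 4.27B
```

Under these assumptions the estimate lands on 4.27B, matching the reported size. Replicating the whole model four times (attention included) would instead give roughly 4 × 1.84B ≈ 7.4B parameters.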

This model is an intermediate checkpoint from the continued-pretraining stage.

Evaluations

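| Task        | Metric   | Value  | Stderr   |
|-------------|----------|--------|----------|
| boolq       | acc      | 0.6502 | ± 0.0083 |
| ceval-valid | acc      | 0.5171 | ± 0.1872 |
| ceval-valid | acc_norm | 0.5171 | ± 0.1872 |
| cmmlu       | acc      | 0.5041 | ± 0.1222 |
| cmmlu       | acc_norm | 0.5041 | ± 0.1222 |
| mathqa      | acc      | 0.2693 | ± 0.0081 |
| mathqa      | acc_norm | 0.2693 | ± 0.0081 |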
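To make the Stderr column easier to read, the sketch below turns each reported accuracy and standard error into a normal-approximation 95% interval (value ± 1.96 × stderr). Treating the reported standard errors this way is an assumption made for illustration; the card does not say how the intervals should be interpreted. The `interval` helper is hypothetical.

```python
# Normal-approximation 95% intervals: value ± Z * stderr (assumed interpretation).
Z_95 = 1.96

# (accuracy, standard error) copied from the evaluation table above.
RESULTS = {
    "boolq": (0.6502, 0.0083),
    "ceval-valid": (0.5171, 0.1872),
    "cmmlu": (0.5041, 0.1222),
    "mathqa": (0.2693, 0.0081),
}


def interval(value: float, stderr: float, z: float = Z_95) -> tuple[float, float]:
    # Symmetric interval of half-width z * stderr around the point estimate.
    half_width = z * stderr
    return value - half_width, value + half_width


for task, (value, stderr) in RESULTS.items():
    low, high = interval(value, stderr)
    print(f"{task:12s} {value:.4f}  95% CI [{low:.4f}, {high:.4f}]")
```

Read this way, the boolq and mathqa results are tightly estimated, while the ceval-valid and cmmlu intervals are very wide (roughly 0.15 to 0.88 and 0.26 to 0.74), so small differences on those two benchmarks should be interpreted with care.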

Acknowledgements

License Agreement

This project is released under the Tongyi Qianwen Research License Agreement. See LICENSE for the complete license agreement.

When using this project, please ensure that your use complies with the terms and conditions of the license agreement.

Model size: 4.27B parameters (Safetensors; tensor types: F32, BF16)