gabrielmbmb's Collections

Upcycling Experiments

Models I pre-trained by initialising SMoE models from dense model weights, following the upcycling process used for Qwen1.5-MoE-A2.7B (or something similar).
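
As a rough illustration of what upcycling means here, the sketch below shows its simplest form: every expert in the SMoE layer starts as a copy of the dense feed-forward weights, and only the router is initialised from scratch. This is not necessarily the exact recipe used for Qwen1.5-MoE-A2.7B or for the models in this collection. The function name `upcycle_ffn`, the dictionary layout, the `0.02` router standard deviation, and the toy weights are all assumptions made for the example.

```python
import copy
import random


def upcycle_ffn(dense_ffn, num_experts, hidden_size, seed=0):
    """Build a hypothetical SMoE layer from a single dense FFN's weights."""
    rng = random.Random(seed)
    # Each expert starts as an independent copy of the dense FFN weights.
    experts = [copy.deepcopy(dense_ffn) for _ in range(num_experts)]
    # The router has no dense counterpart, so it gets a small random init
    # (std 0.02 is an assumed value, chosen only for illustration).
    router = [
        [rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
        for _ in range(num_experts)
    ]
    return {"experts": experts, "router": router}


# Toy dense FFN: hidden size 2, intermediate size 3 (illustrative values only).
dense_ffn = {
    "up_proj": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    "down_proj": [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]],
}
moe_layer = upcycle_ffn(dense_ffn, num_experts=4, hidden_size=2)
```

Because the experts are deep copies, they are identical at initialisation but can diverge independently once pre-training continues.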