  • This is a distillation experiment with Qwen2-1.5B as the teacher and Qwen2-0.5B as the student model.
  • Samples were taken from the Pile dataset.
  • Optimizer: SM3, scheduler: cosine with warmup, lr=2e-5 (see the training sketch below).
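
The following is a minimal sketch of this kind of logit-distillation setup, not the exact training recipe. It assumes the `torch_optimizer` package for SM3, the public `Qwen/Qwen2-1.5B` and `Qwen/Qwen2-0.5B` checkpoints as teacher and student, and illustrative values for the temperature, warmup, and total step counts; only SM3, the cosine-with-warmup schedule, and lr=2e-5 come from the card above.

```python
import torch
import torch.nn.functional as F
import torch_optimizer  # assumed source of the SM3 optimizer
from transformers import AutoModelForCausalLM, get_cosine_schedule_with_warmup

teacher = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-1.5B").eval()
student = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")

optimizer = torch_optimizer.SM3(student.parameters(), lr=2e-5)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=100, num_training_steps=10_000  # illustrative values
)

temperature = 2.0  # assumed softening temperature


def distillation_step(batch):
    """One KD step: match student logits to teacher logits on a batch of Pile samples."""
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits
    student_logits = student(**batch).logits
    # KL divergence between temperature-softened teacher and student distributions
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
    return loss.item()
```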

Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the distilled 0.5B Qwen2 language model.
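As a quick usage sketch, the distilled checkpoint can be loaded like any other Qwen2 causal LM with `transformers`; the prompt and generation settings below are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aloobun/d-Qwen2-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# weights are stored in BF16, so load them in that dtype
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "The Pile is a dataset"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```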

Model size: 494M params · Tensor type: BF16 (Safetensors)

Dataset used to train aloobun/d-Qwen2-0.5B: the Pile