Introduction
DistilQwen2.5-7B is a distilled version of Qwen2.5-7B-Instruct, built by transferring the capabilities of stronger LLMs into the smaller model. For the distillation process we used a diverse range of datasets, including well-known open-source collections such as Magpie, OpenHermes, and MAmmoTH2, as well as proprietary synthetic datasets.
The training data primarily consists of instructions in Chinese and English. To enhance the quality and diversity of the instruction data, we implemented a difficulty scoring system and task-related resampling techniques.
For difficulty scoring, we employed the LLM-as-a-Judge paradigm, using the teacher model to rate responses on accuracy, relevance, helpfulness, and level of detail. We then calculated the Model Fitting Difficulty (MFD) Score by subtracting the student model's score from the teacher model's score, so a higher MFD Score indicates an instruction that is more valuable for distillation training. This allowed us to remove low-difficulty instructions from the training set and focus on more challenging, informative examples.
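To make the filtering step concrete, here is a minimal sketch of MFD-based selection. Everything in it (the `judge` callable, the field names, and the threshold) is an illustrative assumption, not the released pipeline:

```python
# Illustrative sketch of MFD-based data filtering. The judge callable,
# field names, and threshold are assumptions, not the actual DistilQwen pipeline.

def mfd_score(instruction, teacher_answer, student_answer, judge):
    # judge() stands in for an LLM-as-a-Judge call that rates a response
    # on accuracy, relevance, helpfulness, and level of detail.
    return judge(instruction, teacher_answer) - judge(instruction, student_answer)

def filter_by_difficulty(examples, judge, threshold=1.0):
    """Keep instructions where the teacher outscores the student by a margin,
    i.e. the examples the student still has the most to learn from."""
    return [
        ex for ex in examples
        if mfd_score(ex["instruction"], ex["teacher_answer"],
                     ex["student_answer"], judge) >= threshold
    ]
```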
After this black-box data distillation stage, we further conducted white-box distillation (distillation on the teacher model's logits). Black-box knowledge distillation relies only on the teacher's hard outputs (the highest-probability tokens it emits), whereas white-box knowledge distillation matches the full logits distribution the teacher produces, providing a much richer training signal for the student. By mimicking the teacher's logits distribution, white-box distillation transfers knowledge more effectively, further enhancing the performance of the student model.
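The extra signal from white-box distillation can be written down in a few lines. Below is a minimal sketch of a logits-distillation loss, assuming aligned teacher and student vocabularies; the temperature value and reduction are illustrative choices, not the exact training recipe:

```python
import torch
import torch.nn.functional as F

def logits_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student
    token distributions, averaged over all token positions.

    Both tensors have shape (batch, seq_len, vocab_size); the two
    vocabularies are assumed to be aligned.
    """
    vocab = student_logits.size(-1)
    s = F.log_softmax(student_logits.reshape(-1, vocab) / temperature, dim=-1)
    t = F.softmax(teacher_logits.reshape(-1, vocab) / temperature, dim=-1)
    # batchmean averages over token positions; the T^2 factor keeps gradient
    # magnitudes comparable across temperatures (Hinton et al., 2015).
    return F.kl_div(s, t, reduction="batchmean") * temperature**2
```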
This careful curation and scoring process ensures that DistilQwen2.5-7B achieves high performance after the distillation process.
Quick Start
The following code snippet shows how to load the tokenizer and model and generate content using `apply_chat_template`:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model inputs onto

# Load the model; device_map="auto" places it across available GPUs
model = AutoModelForCausalLM.from_pretrained(
    "alibaba-pai/DistilQwen2.5-7B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("alibaba-pai/DistilQwen2.5-7B-Instruct")

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]

# Render the chat messages into a single prompt string
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=2048
)
# Strip the prompt tokens so only the newly generated text remains
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```