Can you distill Gemma 3?

by Lollo001lollo

Hi, I recently downloaded your DeepSeek GGUF and your Gemma 3 GGUF. Can you help me distill Gemma 3 down to only two languages? I only need Italian and Chinese. I think the code looks something like what I've pasted below, and then converting the result to GGUF should be possible?

I don't have a powerful computer, but I think a distilled Gemma 3 might run on my PC. I work in a school and help families with few means study these two languages, and I want to give the students a chance to study on their own when they are alone at school.

I'll paste the code I found below. I don't know if it's right; maybe you have a better approach or better code. If you can help us, we would be very grateful. Thanks and best wishes.

from datasets import load_dataset
from langdetect import detect

def filter_lang(example):
    # datasets passes each record as a dict; langdetect reports Chinese as 'zh-cn'/'zh-tw'
    try:
        lang = detect(example["text"])
        return lang in ['zh-cn', 'zh-tw', 'it']
    except Exception:
        return False

dataset = load_dataset("json", data_files="path/to/multilingual_data.jsonl")
filtered_data = dataset.filter(filter_lang, num_proc=8)
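
If it helps, after filtering I would also save the kept examples so the same data can be reused for training later (the file name is just my choice, and I'm assuming each record has a "text" field and the JSONL loads into the default "train" split):

# Save the filtered split back to JSONL and report how much data is left
filtered_data["train"].to_json("zh_it_filtered.jsonl")
print(filtered_data["train"].num_rows, "examples kept after language filtering")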

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Teacher model used to generate bilingual synthetic data (large, so it needs a big GPU or offloading)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-27b-it", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-27b-it")

prompts = [
    "用中文和意大利语解释量子力学:",  # "Explain quantum mechanics in Chinese and Italian:"
    "Scrivi una poesia bilingue cinese-italiano:",
]
synthetic_data = []
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    synthetic_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    synthetic_data.append({"text": synthetic_text})
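
Then I would write the generated text to a JSONL file so it can be mixed with the filtered real data (file name is again just my choice):

import json

with open("synthetic_zh_it.jsonl", "w", encoding="utf-8") as f:
    for record in synthetic_data:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")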

from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],  # only adapt the attention projection matrices
    lora_dropout=0.1,
    bias="none",
    modules_to_save=["embed_tokens", "lm_head"],  # also fine-tune the token embeddings and output head
)
student_model = get_peft_model(base_model, config)
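
One thing I don't understand in the code I found: base_model and teacher_model are never defined. My guess is that the 27B loaded above acts as the frozen teacher and a small Gemma 3 checkpoint is the student base that gets the LoRA adapter (the student checkpoint name is my assumption):

import torch
from transformers import AutoModelForCausalLM

# Assumption: reuse the 27B loaded above as the frozen teacher
teacher_model = model
teacher_model.eval()  # the teacher is only used for inference, never updated

# Assumption: a small Gemma 3 checkpoint as the student base
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-pt", torch_dtype=torch.bfloat16, device_map="auto"
)

# After get_peft_model, this shows how few parameters LoRA actually trains
student_model.print_trainable_parameters()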

import torch
import torch.nn.functional as F

# Task loss (student output vs. ground-truth labels)
loss_task = F.cross_entropy(
    student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
)

# Language-alignment loss (student vs. teacher output probabilities for Chinese/Italian)
with torch.no_grad():
    teacher_logits = teacher_model(input_ids).logits

# Mask that selects the Chinese / Italian tokens (zh_token / it_token are placeholders from the snippet)
zh_it_mask = (token_ids == zh_token) | (token_ids == it_token)
loss_align = F.kl_div(
    F.log_softmax(student_logits[zh_it_mask] / T, dim=-1),  # T = distillation temperature
    F.softmax(teacher_logits[zh_it_mask] / T, dim=-1),
    reduction="batchmean",
)

total_loss = 0.8 * loss_task + 0.2 * loss_align
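
And around that loss I think a normal update step is still needed, something like this (the optimizer and learning rate are just my guess):

from torch.optim import AdamW

optimizer = AdamW(student_model.parameters(), lr=2e-4)

# Per batch: the student forward pass feeds the losses above, then one optimizer step
student_logits = student_model(input_ids).logits
# ... compute loss_task, loss_align and total_loss as above ...
total_loss.backward()
optimizer.step()
optimizer.zero_grad()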

# Assumption: recent llama.cpp names the converter convert_hf_to_gguf.py and uses --outtype;
# the q4_k_m quantization is a separate llama-quantize step after converting to f16
python3 llama.cpp/convert_hf_to_gguf.py student_model/ --outfile student-f16.gguf --outtype f16
llama-quantize student-f16.gguf student-q4_k_m.gguf Q4_K_M  # llama-quantize comes from building llama.cpp
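
One more thing I read: the llama.cpp converter needs a folder with full model weights, so the LoRA adapter would have to be merged and saved into student_model/ before running that command (this step is my addition, not from the original snippet):

# Merge the LoRA adapter back into the base weights and save a full checkpoint for conversion
merged = student_model.merge_and_unload()
merged.save_pretrained("student_model/")
tokenizer.save_pretrained("student_model/")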
