gemma4-e4b-colloquial-ru-merged

English: Full-weight Gemma 4 E4B checkpoint with the colloquial-Russian LoRA merged in, intended for vLLM / RunPod Serverless. PEFT is not needed at inference time.

What this model does

Rewrites formal Russian into casual chat-style Russian (Telegram-like), without profanity, while keeping facts, names, numbers, and paragraph structure.

This is not a general chat model: use the instruction prefix from training (see the user prompt template below).

Model lineage

Stage | Artifact
Base | google/gemma-4-E4B-it
LoRA (SFT) | pavelfedortsov/gemma4-e4b-lora-colloquial-ru
This repo | LoRA merged into base + vLLM fixes (k_norm, processor configs)

The merge used PEFT's merge_and_unload(). The language_model k_norm weights for layers 24–41 were missing from the merged checkpoint, so they were copied from the base checkpoint because vLLM requires them.
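
The k_norm repair described above amounts to copying tensors that exist in the base checkpoint but are absent from the merged one. A minimal sketch, using plain dicts with floats in place of real state dicts and tensors. The key pattern `language_model.layers.{i}.self_attn.k_norm.weight` and the helper name `fill_missing_k_norm` are assumptions for illustration, not the names actually used to build this repo:

```python
import re

# Hypothetical key pattern; the real checkpoint key names may differ.
K_NORM_KEY = re.compile(r"language_model\.layers\.(\d+)\.self_attn\.k_norm\.weight$")


def fill_missing_k_norm(merged, base, first_layer=24, last_layer=41):
    """Copy k_norm entries for the given layer range from base into merged
    when merged lacks them. Returns the list of keys that were copied."""
    copied = []
    for key, value in base.items():
        match = K_NORM_KEY.search(key)
        if match is None:
            continue
        layer = int(match.group(1))
        if first_layer <= layer <= last_layer and key not in merged:
            merged[key] = value
            copied.append(key)
    return copied


# Toy data: 42 layers in the "base", layers 24-41 missing from the "merged" dict.
base = {
    f"model.language_model.layers.{i}.self_attn.k_norm.weight": float(i)
    for i in range(42)
}
merged = {k: v for k, v in base.items() if int(K_NORM_KEY.search(k).group(1)) < 24}

copied = fill_missing_k_norm(merged, base)
print(len(copied), "keys copied")
```

With real checkpoints the same loop would run over safetensors state dicts; only keys that are absent from the merged dict are touched, so merged weights are never overwritten.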

Training data

User prompt template (training & inference):

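Перепиши простым разговорным русским, как в переписке. Без мата и грубости. Сохрани смысл:
<формальный текст>

In English: "Rewrite in simple conversational Russian, as in a chat. No profanity or rudeness. Keep the meaning:", followed by the formal text on the next line. Send the Russian wording, since that is the prefix the model was trained on.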
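
The template above can be assembled with a small helper so that the instruction prefix is always identical to the one used in training. This is a sketch: the function names `build_user_prompt` and `build_messages` are illustrative, and stripping surrounding whitespace from the input is an assumption rather than something the card specifies.

```python
# Instruction prefix copied verbatim from the template above.
INSTRUCTION = (
    "Перепиши простым разговорным русским, как в переписке. "
    "Без мата и грубости. Сохрани смысл:"
)


def build_user_prompt(formal_text: str) -> str:
    """Return the user message: instruction line, newline, then the formal text."""
    return f"{INSTRUCTION}\n{formal_text.strip()}"


def build_messages(formal_text: str) -> list[dict]:
    """Wrap the prompt in a single-turn chat message list."""
    return [{"role": "user", "content": build_user_prompt(formal_text)}]


messages = build_messages("Сегодня на совещании обсуждали внедрение новой версии API.")
print(messages[0]["content"])
```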

Training configuration (LoRA → merge)

Config file (also in card_assets/train_colloquial_e4b_gpu.yaml):

Parameter | Value
Base model | google/gemma-4-E4B-it
Method | LoRA on language tower (model.language_model.*)
LoRA rank / alpha | 32 / 64
Target modules | q, k, v, o + MLP (gate, up, down)
Dataset | 50k × 1 repeat
Epochs | 2 (12,500 optimizer steps)
Seq length | 512
Batch | 1 × grad accum 8 (effective 8)
LR | 1e-4, cosine, warmup 3%
Precision | bf16, gradient checkpointing
Loss | assistant-only
Hardware | RunPod A100 80GB
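
The step count in the table follows from the dataset size, the number of epochs, and the effective batch size. The arithmetic below is a sketch under two assumptions: training ran on a single device, and LoRA used the standard alpha / rank scaling. The warmup step count is rounded down here; the trainer's own rounding may differ.

```python
import math

dataset_size = 50_000        # "50k × 1 repeat"
epochs = 2
per_device_batch = 1
grad_accum = 8
lora_rank, lora_alpha = 32, 64
warmup_percent = 3           # "warmup 3%"

effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(dataset_size / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = total_steps * warmup_percent // 100   # integer math, rounded down
lora_scaling = lora_alpha / lora_rank                # standard LoRA scaling (assumed)

print(f"effective batch: {effective_batch}")
print(f"steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
print(f"warmup steps: {warmup_steps}")
print(f"LoRA scaling factor: {lora_scaling}")
```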

Training metrics (LoRA run)

[Figure: training curves]

Metric | Start (step ~25) | End (step 12,500) | Best
loss | ~3.42 | ~0.81 | ~0.67
mean_token_accuracy | ~0.63 | ~0.82 | ~0.84
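
To make the loss values easier to interpret, they can be converted to perplexity. This sketch assumes the reported loss is the mean token-level cross-entropy in nats, which is the usual convention but is not stated in the card. The inputs are the approximate values from the table.

```python
import math

# Approximate loss values from the table above.
losses = {"start": 3.42, "end": 0.81, "best": 0.67}

# Perplexity = exp(mean cross-entropy), assuming the loss is in nats per token.
perplexity = {name: math.exp(value) for name, value in losses.items()}

for name, ppl in perplexity.items():
    print(f"{name}: loss={losses[name]:.2f} -> perplexity~{ppl:.2f}")
```

Because the loss is assistant-only, these figures describe the rewritten output tokens, not the prompt.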

Checkpoints were saved every 1,000 steps in the LoRA adapter repo.

Inference

RunPod Serverless (vLLM)

MODEL_NAME=pavelfedortsov/gemma4-e4b-colloquial-ru-merged
HF_TOKEN=<your_token>
TRUST_REMOTE_CODE=true
DTYPE=bfloat16
MAX_MODEL_LEN=4096
GPU_MEMORY_UTILIZATION=0.90
ENFORCE_EAGER=true
ENABLE_LORA=false
LANGUAGE_MODEL_ONLY=true
LIMIT_MM_PER_PROMPT={"image":0,"audio":0,"video":0}

Recommended GPU: ≥40 GB VRAM (the merged weights take ~32 GB in bf16).
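
A rough memory budget shows why 40 GB is the lower bound. The sketch assumes that GPU_MEMORY_UTILIZATION maps to vLLM's `gpu_memory_utilization`, i.e. the fraction of GPU memory the engine may use for weights, activations, and KV cache. The figures are the approximate ones quoted above, not measurements.

```python
gpu_vram_gb = 40.0               # lower bound of the recommendation above
gpu_memory_utilization = 0.90    # GPU_MEMORY_UTILIZATION from the env block
weights_gb = 32.0                # approximate bf16 weight size quoted above

# Memory vLLM is allowed to use, and what remains after loading the weights.
budget_gb = gpu_vram_gb * gpu_memory_utilization
headroom_gb = budget_gb - weights_gb

print(f"vLLM budget: {budget_gb:.1f} GB")
print(f"left for KV cache and activations: {headroom_gb:.1f} GB")
```

On a 40 GB card the headroom is small, so a larger GPU leaves more room for KV cache at MAX_MODEL_LEN=4096 and for concurrent requests.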

Transformers (local)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pavelfedortsov/gemma4-e4b-colloquial-ru-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the merged checkpoint in bf16; device_map="auto" places it on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

formal = "Сегодня на совещании обсуждали внедрение новой версии API."
# Same instruction prefix as in training, followed by the formal text on a new line.
user = (
    "Перепиши простым разговорным русским, как в переписке. "
    "Без мата и грубости. Сохрани смысл:\n"
    f"{formal}"
)
messages = [{"role": "user", "content": user}]
# Render the chat template as text and append the generation prompt for the model's turn.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))

OpenAI-compatible API (RunPod / vLLM)

curl "$RUNPOD_URL/v1/chat/completions" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "pavelfedortsov/gemma4-e4b-colloquial-ru-merged",
    "messages": [{
      "role": "user",
      "content": "Перепиши простым разговорным русским, как в переписке. Без мата и грубости. Сохрани смысл:\nВаш формальный текст."
    }],
    "max_tokens": 512,
    "temperature": 0.7
  }'
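
The same request can be sent from Python using only the standard library. This is a sketch that mirrors the curl call above: it reads RUNPOD_URL and RUNPOD_API_KEY from the environment and assumes the endpoint returns the standard OpenAI chat completions response shape. The helper names `build_payload` and `rewrite` are illustrative.

```python
import json
import os
import urllib.request

MODEL = "pavelfedortsov/gemma4-e4b-colloquial-ru-merged"
INSTRUCTION = (
    "Перепиши простым разговорным русским, как в переписке. "
    "Без мата и грубости. Сохрани смысл:"
)


def build_payload(formal_text, max_tokens=512, temperature=0.7):
    """Build the JSON body for /v1/chat/completions (same fields as the curl example)."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": f"{INSTRUCTION}\n{formal_text}"}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def rewrite(formal_text):
    """POST the request and return the rewritten text from the first choice."""
    url = os.environ["RUNPOD_URL"].rstrip("/") + "/v1/chat/completions"
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(formal_text)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (needs a live endpoint and both environment variables):
# print(rewrite("Ваш формальный текст."))
```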

Limitations

  • The Gemma license applies to the base architecture and weights.
  • Quality varies on long news-style text; the model may shorten or paraphrase aggressively.
  • The model is not safety-tuned; run your own evaluation before production use.
  • Output style can differ slightly between the merged checkpoint and inference with the LoRA adapter.
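
Since the model may paraphrase aggressively, a cheap post-check can flag rewrites that dropped numbers from the source. This is a minimal sketch, not part of the released model. The helper name `missing_numbers` and the example sentences are made up for illustration, and a plain regex will miss numbers that the model spells out in words.

```python
import re

# Integers and simple decimals with "." or "," as the separator.
NUMBER = re.compile(r"\d+(?:[.,]\d+)?")


def missing_numbers(source: str, rewritten: str) -> list[str]:
    """Return numbers present in source but absent from rewritten, in source order."""
    produced = set(NUMBER.findall(rewritten))
    return [n for n in NUMBER.findall(source) if n not in produced]


# Made-up example texts.
source = "Выручка компании в 2023 году выросла на 12,5% и составила 40 млн рублей."
good = "В 2023 выручка подросла на 12,5%, вышло 40 млн рублей."
bad = "Выручка в прошлом году неплохо подросла, вышло около 40 млн."

print(missing_numbers(source, good))
print(missing_numbers(source, bad))
```

A non-empty result is a signal to retry generation or route the text for manual review.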

Related repos
