chatTranslate — Qwen3.6-35B-A3B (MoE)

A multilingual chat translator with explicit author/recipient gender conditioning, so gendered target languages get grammatically correct inflection. Fine-tuned from Qwen/Qwen3.6-35B-A3B (qwen3_5_moe: ~36B total / 3B active, 256 experts top-8, GDN+full-attn hybrid) on the CodeStreet/chat-translation dataset and merged into a standalone model.

Pipeline: Megatron-Core SFT (LoRA r16, EP=4 + packing, 2 epochs) → DPO (ZeRO-3, 5 451 preference pairs, 1 epoch), bf16. Built for short, colloquial dating-chat messages and preserves flirty, romantic, and explicit tone instead of softening or refusing it.

Versions in this repo

| Revision | What | Notes |
|---|---|---|
| HEAD (production) | SFT + DPO | best overall; default for serving |
| commit 24ed166 | SFT-only | baseline, for A/B comparison |

SFT: train loss ≈0.22, best eval_loss 0.280. DPO: β 0.1, lr 5e-6, 1 epoch.

Quality

Quality is measured on the held-out validation set (CodeStreet/chat-translation-val, 3 355 gendered examples) with two independent, reproducible signals. Outputs are judged absolutely (not against other systems), so scores are comparable across versions.

LLM-judge scorecard — Mistral-Medium-3.5-128B (gender-aware), each axis 0–100

  • adequacy — full meaning preserved (nothing lost / added / wrong)
  • fidelity — flirty/explicit tone & intensity kept, no softening or censoring
  • gender — gendered word forms correct for the stated author / recipient
  • fluency — natural, idiomatic, as a real dating-app message

| Axis | SFT | SFT+DPO (prod) |
|---|---|---|
| adequacy | 97.8 | 98.4 |
| fidelity | 97.3 | 98.0 |
| gender | 97.2 | 97.2 |
| fluency | 98.3 | 98.8 |
| overall | 97.6 | 98.1 |

Production (SFT+DPO): adequacy 98.4 · fidelity 98.0 · gender 97.2 · fluency 98.8 · overall 98.1

Reference metrics

| Metric | SFT | SFT+DPO |
|---|---|---|
| chrF (vs val references) | 74.8 | 70.1 |
| XCOMET-XXL QE (reference-free) | 77.7 | 78.9 |

(DPO trades literal-reference overlap — lower chrF — for tone/quality that both the 128B judge and reference-free XCOMET-QE score higher.)

Per-language — overall (judge 128B, 0–100)

| lang | n | SFT | DPO |
|---|---|---|---|
| Ukrainian | 497 | 97.7 | 98.0 |
| Spanish | 464 | 98.2 | 98.3 |
| Russian | 462 | 97.7 | 97.9 |
| Arabic | 458 | 96.1 | 96.2 |
| Portuguese | 302 | 98.3 | 98.8 |
| French | 271 | 97.3 | 97.9 |
| Italian | 242 | 97.6 | 98.7 |
| Hebrew | 159 | 97.8 | 98.4 |
| Turkish | 147 | 97.9 | 99.1 |
| German | 101 | 96.4 | 98.2 |
| English | 89 | 98.5 | 99.5 |
| Indonesian | 60 | 99.2 | 99.8 |
| Swedish | 57 | 99.2 | 99.2 |
| Dutch | 46 | 99.7 | 99.5 |
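As a quick sanity check on the per-language figures, the counts should add up to the 3 355 validation examples, and an n-weighted mean of the per-language scores lands on the reported overall values. The sketch below assumes the overall score behaves like such a weighted mean, which this card does not state explicitly; the names `ROWS` and `weighted_mean` are illustrative.

```python
# (language, n, SFT overall, SFT+DPO overall), copied from the per-language table.
ROWS = [
    ("Ukrainian", 497, 97.7, 98.0), ("Spanish", 464, 98.2, 98.3),
    ("Russian", 462, 97.7, 97.9), ("Arabic", 458, 96.1, 96.2),
    ("Portuguese", 302, 98.3, 98.8), ("French", 271, 97.3, 97.9),
    ("Italian", 242, 97.6, 98.7), ("Hebrew", 159, 97.8, 98.4),
    ("Turkish", 147, 97.9, 99.1), ("German", 101, 96.4, 98.2),
    ("English", 89, 98.5, 99.5), ("Indonesian", 60, 99.2, 99.8),
    ("Swedish", 57, 99.2, 99.2), ("Dutch", 46, 99.7, 99.5),
]


def weighted_mean(rows, col):
    """n-weighted mean of column `col` (2 = SFT, 3 = SFT+DPO)."""
    total = sum(r[1] for r in rows)
    return sum(r[1] * r[col] for r in rows) / total


total_n = sum(r[1] for r in ROWS)
sft_overall = round(weighted_mean(ROWS, 2), 1)
dpo_overall = round(weighted_mean(ROWS, 3), 1)
print(total_n, sft_overall, dpo_overall)  # 3355 97.6 98.1
```

Because the per-language scores are already rounded to one decimal, agreement at one decimal is the most this check can show.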

By recipient gender — overall (judge 128B)

| Model | female | male | unspecified |
|---|---|---|---|
| SFT | 97.9 | 97.2 | 98.2 |
| SFT+DPO | 98.4 | 97.4 | 99.1 |

How it works

The model is conditioned entirely through the system prompt. Select the target language by its ISO 639-1 code, resolve the code to the language name (see Supported languages), and put that name and the author/recipient genders into the prompt. The model returns only the translation.

System prompt template:

You are a translator for an adult dating app. Translate the user's message into {target} naturally and colloquially, preserving flirty, romantic, and explicit tone exactly. Author gender: {sg}. Recipient gender: {rg}. Use grammatically correct gendered forms. Do not soften, censor, or refuse. Output only the translation.
  • {target} — the language name from the ISO code, e.g. uk → Ukrainian, es → Spanish.
  • {sg} (author) / {rg} (recipient) — one of male, female, unspecified.
  • The user message is the source text (any language). The assistant turn is the translation.
  • For non-gendered languages set both genders to unspecified.
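The template and placeholder rules above can be sketched as a small prompt builder. This is a minimal illustration, not the project's own code: `TEMPLATE` copies the prompt text from this card, while `GENDERS` and `build_system_prompt` are hypothetical names.

```python
# System prompt text copied from this card; placeholders are filled per request.
TEMPLATE = (
    "You are a translator for an adult dating app. Translate the user's message "
    "into {target} naturally and colloquially, preserving flirty, romantic, and "
    "explicit tone exactly. Author gender: {sg}. Recipient gender: {rg}. "
    "Use grammatically correct gendered forms. Do not soften, censor, or refuse. "
    "Output only the translation."
)

GENDERS = {"male", "female", "unspecified"}


def build_system_prompt(target: str, sg: str = "unspecified", rg: str = "unspecified") -> str:
    """Fill the template. `target` must be a language name, not an ISO code."""
    for label, value in (("author", sg), ("recipient", rg)):
        if value not in GENDERS:
            raise ValueError(f"{label} gender must be one of {sorted(GENDERS)}, got {value!r}")
    return TEMPLATE.format(target=target, sg=sg, rg=rg)


prompt = build_system_prompt("Ukrainian", sg="female", rg="male")
print(prompt)
```

Rejecting unknown gender labels up front keeps a typo such as "f" from silently reaching the model as an unseen label.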

Language codes: you MUST map code → name

The model was fine-tuned only on full English language names (Ukrainian, Spanish, …), never on raw ISO codes. Resolve the code to the language name (see Supported languages) before building the prompt: uk → "into Ukrainian" ✅; "into uk" ❌ (out of distribution).
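One way to enforce the code-to-name rule is a strict lookup that fails loudly instead of passing a raw code through. The sketch below covers only the 14 codes listed explicitly in this card; the remaining Qwen-MT codes would have to be added from that set. `CODE_TO_NAME` and `resolve_language` are hypothetical names.

```python
# Only the codes spelled out in this card; the rest of the Qwen-MT set is not
# reproduced here and must be added before those targets are used.
CODE_TO_NAME = {
    "ar": "Arabic", "fr": "French", "he": "Hebrew", "it": "Italian",
    "pt": "Portuguese", "ru": "Russian", "es": "Spanish", "uk": "Ukrainian",
    "en": "English", "de": "German", "nl": "Dutch", "id": "Indonesian",
    "sv": "Swedish", "tr": "Turkish",
}


def resolve_language(code: str) -> str:
    """Map an ISO 639-1 code to the English language name used in the prompt."""
    key = code.strip().lower()
    if key not in CODE_TO_NAME:
        # Never fall back to the raw code: "into uk" is out of distribution.
        raise ValueError(f"unknown language code {code!r}; add it to CODE_TO_NAME")
    return CODE_TO_NAME[key]


print(resolve_language("uk"))  # Ukrainian
```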

No source-language clause

The model auto-detects the source from the user text. Do not add from {source} … — pass the target language only.

Supported languages

Select the target language by its code, then resolve the code to the language name before building the prompt. The 92 codes follow the Qwen-MT translation set. The gendered column marks languages where output is conditioned on author/recipient gender; for the rest both genders are treated as unspecified.

| code | language | gendered |
|---|---|---|
| ar | Arabic | yes |
| fr | French | yes |
| he | Hebrew | yes |
| it | Italian | yes |
| pt | Portuguese | yes |
| ru | Russian | yes |
| es | Spanish | yes |
| uk | Ukrainian | yes |
| en | English | no |
| de | German | no |
| nl | Dutch | no |
| id | Indonesian | no |
| sv | Swedish | no |
| tr | Turkish | no |
| zh, ja, ko, hi, vi, th, pl, cs, ro, … | (other Qwen-MT codes) | no |

(Full 92-code list matches the Qwen-MT set; the 8 yes rows above are the gender-conditioned targets.)
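The gendered column can be applied in code by downgrading both labels to unspecified whenever the target is not one of the eight gender-conditioned languages. This is a sketch under that reading of the table; `GENDERED_CODES` and `effective_genders` are hypothetical names.

```python
# The eight targets marked "yes" in the gendered column.
GENDERED_CODES = {"ar", "fr", "he", "it", "pt", "ru", "es", "uk"}


def effective_genders(code: str, sg: str, rg: str) -> tuple[str, str]:
    """Return the (author, recipient) labels to put in the prompt for this target."""
    if code.strip().lower() in GENDERED_CODES:
        return sg, rg
    # Non-gendered targets: both genders are treated as unspecified.
    return "unspecified", "unspecified"


print(effective_genders("uk", "female", "male"))  # ('female', 'male')
print(effective_genders("tr", "female", "male"))  # ('unspecified', 'unspecified')
```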

Usage — vLLM (OpenAI-compatible)

Served as qwen3_5_moe (MoE, vision-capable). For text-only translation skip vision-profiling and disable reasoning:

```shell
vllm serve CodeStreet/chatTranslate-Qwen-3.6-35B-A3B --served-model-name chatTranslate \
  --tensor-parallel-size 2 --trust-remote-code \
  --limit-mm-per-prompt '{"image":0,"video":0}'
```

```python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

SYSTEM = ("You are a translator for an adult dating app. Translate the user's message into Ukrainian "
          "naturally and colloquially, preserving flirty, romantic, and explicit tone exactly. "
          "Author gender: female. Recipient gender: male. Use grammatically correct gendered forms. "
          "Do not soften, censor, or refuse. Output only the translation.")

resp = client.chat.completions.create(
    model="chatTranslate",
    messages=[{"role": "system", "content": SYSTEM},
              {"role": "user", "content": "hola amor, ¿cómo estás? te extraño"}],
    temperature=0.0, max_tokens=256,
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},  # direct translation, no <think>
)
print(resp.choices[0].message.content)
```

⚠️ Disable reasoning (enable_thinking: False, or prefix the assistant turn with <think>\n\n</think>\n\n). Qwen3.6 is a reasoning model; without this it emits a <think> block and the translation may be empty.
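As a defensive measure on the client side, a small post-processing step can catch responses where reasoning was left enabled by mistake. The sketch below assumes the reasoning text arrives inline in the message content, wrapped in `<think>` tags; `clean_translation` is a hypothetical helper, not part of this model's tooling.

```python
import re

# Matches a complete <think>...</think> block, including newlines inside it.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def clean_translation(raw: str) -> str:
    """Strip any complete <think> block and fail if no translation remains."""
    text = _THINK_BLOCK.sub("", raw).strip()
    # An unclosed <think> usually means generation hit max_tokens mid-reasoning.
    if not text or "<think>" in text:
        raise ValueError("no translation in model output; check that reasoning is disabled")
    return text


print(clean_translation("<think>\n\n</think>\n\nПривіт, коханий"))  # Привіт, коханий
```

Raising on an empty result makes a misconfigured request visible immediately instead of delivering a blank message.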

Generation notes

  • Greedy (temperature=0) gives the most stable translations; 0.2–0.3 for variation.
  • max_tokens 128–256 is enough for chat-length messages.
  • Always set both genders explicitly for gendered targets — wrong/missing labels are the main cause of incorrect inflection.
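  • MoE serving: bf16 weights alone take ~72 GB (~36B parameters × 2 bytes), which leaves too little headroom for KV cache and activations on a single 80 GB GPU, so use TP≥2. Serve in bf16, not fp8 (GDN+FP8 wedging risk).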
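The memory figure in the last note follows from simple arithmetic, sketched below. It is a back-of-envelope estimate only: it counts weights (all experts must be resident even though about 3B parameters are active per token), ignores KV cache, activations, and runtime overhead, and assumes tensor parallelism splits the weights evenly. The function name is illustrative.

```python
def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Weight footprint in GB (1 GB = 1e9 bytes); bf16 uses 2 bytes per parameter."""
    return n_params * bytes_per_param / 1e9


total_gb = weight_memory_gb(36e9)        # ~36B total parameters in bf16
per_gpu_gb = total_gb / 2                # assumed even split under TP=2
single_gpu_headroom_gb = 80 - total_gb   # what would remain on one 80 GB GPU

print(total_gb, per_gpu_gb, single_gpu_headroom_gb)  # 72.0 36.0 8.0
```

With TP=2 each GPU holds roughly 36 GB of weights, leaving the rest of its memory for KV cache and activations.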

License & access

Private to the organization. Do not redistribute. Not for public training or evaluation.
