LinguaAgent: Gemma 4 E4B Fine-tuned for ISO Immersion

Part of the LinguaAgent project, submitted to The Gemma 4 Good Hackathon.

What is this?

This is a LoRA fine-tune of Gemma 4 E4B, trained to act as a native-speaker NPC in ISO-immersion language learning scenarios.

The model is specifically trained to:

  • Never break character: it responds as a real person (barista, border officer, doctor), not as an AI
  • Never correct grammar directly: it reacts naturally to broken English ("Sorry, could you repeat that?")
  • Apply realistic pressure: it escalates situations as the learner gains confidence
  • Stay in role under stress: it handles hesitation, silence, and errors the way a real native speaker would

The Problem

800 million people are learning English worldwide. The fastest method, immersive conversation with a native speaker, costs $30-80/hour and is inaccessible to most learners. Government language schools such as the Defense Language Institute (DLI) and the Foreign Service Institute (FSI) have used ISO-immersion for decades to train fluent speakers in weeks. This model brings that method to anyone with a smartphone.

Training Data

Custom dataset of 50 dialogues across 5 real-world scenes:

  • ☕ London café (ordering, WiFi, wrong order)
  • ✈️ Airport check-in (registration, overweight bag, delays)
  • 👮 Passport control (purpose of visit, secondary screening)
  • 💼 Lost baggage (description, urgent cases)
  • 🏥 NHS GP appointment (symptoms, prescriptions, referrals)

Each dialogue features a Spanish-speaking learner with authentic A2-level errors (missing articles, wrong tense, direct translation from Spanish) paired with natural in-role NPC responses.
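
The card does not publish the dataset schema. As a rough sketch, one dialogue could be stored in the same chat-message shape that the Usage section feeds to the model. The `make_example` helper, the field layout, and the choice of sample lines (borrowed from the examples elsewhere in this card) are illustrative assumptions, not the author's actual format.

```python
# Hypothetical record layout for one training dialogue.
# ASSUMPTION: the real dataset schema is not documented in this card.

def make_example(context: str, turns: list[tuple[str, str]]) -> list[dict]:
    """Convert (learner, npc) pairs into alternating chat messages.

    The scene context is prepended to the first learner turn, mirroring
    the "[CONTEXT: ...]" prefix used in the Usage section.
    """
    messages = []
    for i, (learner, npc) in enumerate(turns):
        text = f"[CONTEXT: {context}]\n\n{learner}" if i == 0 else learner
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": npc})
    return messages


example = make_example(
    "You are Emma, a barista at a busy London café. NEVER break character.",
    [
        ("Hello. I want order coffee please.",
         "Hi there! What size - small, medium, or large?"),
        ("I want... one coffee, please. Big one. With oat milk?",
         "Large oat latte - eat in or take away?"),
    ],
)

for message in example:
    print(message["role"], "|", message["content"])
```

Keeping the learner's errors verbatim in the `user` turns and the uncorrected, in-role reply in the `assistant` turns is what would teach the model to respond without correcting grammar.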

Training Details

| Parameter | Value |
|---|---|
| Base model | unsloth/gemma-4-E4B-it |
| Method | LoRA (PEFT) via Unsloth |
| LoRA rank | r=16, alpha=16 |
| Training steps | 120 |
| Batch size | 1 (gradient accumulation 4) |
| Learning rate | 2e-4 (cosine scheduler) |
| Hardware | Kaggle GPU T4 x2 |
| Training time | ~15 minutes |
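
To put these numbers in context, the short calculation below estimates how many times the model saw the dataset. It assumes that "training steps" means optimizer steps, that only one GPU contributed to each step, and that each of the 50 dialogues is a single training example; none of these is confirmed by the card, so treat the result as an order-of-magnitude estimate.

```python
# Back-of-the-envelope training budget from the values listed above.
per_device_batch_size = 1
gradient_accumulation_steps = 4
training_steps = 120
num_dialogues = 50  # from the Training Data section

# Sequences contributing to each optimizer step (single-GPU ASSUMPTION).
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

# Total sequences processed over the whole run.
sequences_seen = effective_batch_size * training_steps

# ASSUMPTION: one training example per dialogue.
approx_epochs = sequences_seen / num_dialogues

print(f"effective batch size: {effective_batch_size}")
print(f"sequences seen:       {sequences_seen}")
print(f"approximate epochs:   {approx_epochs:.1f}")
```

Under those assumptions the run covers the dataset roughly ten times. If dialogues were split into several examples each, the epoch count would be proportionally lower.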

Usage

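from unsloth import FastModel

# Load the fine-tuned model in 4-bit to reduce GPU memory use.
model, tokenizer = FastModel.from_pretrained(
    model_name = "seikatsu666/lingua-agent-gemma4-e4b",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# The scene and persona go in a [CONTEXT: ...] prefix on the user turn.
messages = [{
    "role": "user",
    "content": "[CONTEXT: You are Emma, a barista at a busy London café. NEVER break character. NEVER correct grammar. 1-2 sentences.]\n\nHello. I want order coffee please."
}]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=80,
    do_sample=True,  # temperature and top_p only take effect when sampling
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, not the echoed prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
# Example reply (sampling is random, so the exact text varies):
# "Hi there! What size - small, medium, or large?"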
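
The prompt in the snippet above follows a fixed pattern: a bracketed context with persona, place, and rules, then a blank line, then the learner's utterance. A small helper can assemble it for other scenes. This is a sketch only: `build_prompt` and its parameters are hypothetical names introduced here, and the card does not say how well the model handles personas outside the five trained scenes.

```python
def build_prompt(name: str, role: str, place: str, learner_text: str,
                 length_hint: str = "1-2") -> list[dict]:
    """Assemble a single-turn chat prompt in the card's [CONTEXT: ...] format.

    Hypothetical helper: the wording of the rules is copied from the
    Usage example; everything else is parameterised.
    """
    context = (
        f"[CONTEXT: You are {name}, {role} at {place}. "
        "NEVER break character. NEVER correct grammar. "
        f"{length_hint} sentences.]"
    )
    return [{"role": "user", "content": f"{context}\n\n{learner_text}"}]


# Reproduces the exact prompt used in the Usage example.
messages = build_prompt(
    name="Emma",
    role="a barista",
    place="a busy London café",
    learner_text="Hello. I want order coffee please.",
)
print(messages[0]["content"])
```

The returned list can be passed directly to `tokenizer.apply_chat_template` in place of the hand-written `messages`.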

Example Outputs

Input (broken English): "I want... one coffee, please. Big one. With oat milk?"
NPC (Emma, barista): "Large oat latte - eat in or take away?"

Input: "I need check in. My bag maybe heavy, is problem?"
NPC (David, check-in agent): "Let's weigh it first - pop it on the belt."

Input: "I have pain here - in head. Is five day."
NPC (Dr. Patel, GP): "Five days is a while. Is it constant or does it come and go?"

Notice: the model never says "You should say..." or "The correct phrase is..." Instead, it stays in character and responds naturally.
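
One way to spot-check this behaviour on your own generations is a simple phrase filter. The sketch below is a crude heuristic, not the author's evaluation method: the pattern list is an assumption seeded from the two phrases quoted above plus a couple of common out-of-character giveaways, and it will miss subtler corrections.

```python
import re

# ASSUMPTION: illustrative patterns only; extend them for your own tests.
BREAK_PATTERNS = [
    r"\byou should say\b",
    r"\bthe correct (phrase|word|form) is\b",
    r"\bas an ai\b",
    r"\blanguage model\b",
]


def breaks_character(reply: str) -> bool:
    """Return True if the reply contains an obvious correction or AI self-reference."""
    text = reply.lower()
    return any(re.search(pattern, text) for pattern in BREAK_PATTERNS)


replies = [
    "Large oat latte - eat in or take away?",
    'You should say "I would like a coffee".',
]
for reply in replies:
    print(breaks_character(reply), "|", reply)
```

Running a batch of sampled replies through `breaks_character` gives a quick, rough failure rate before any manual review.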

Part of LinguaAgent

This model powers LinguaAgent, a voice-first ISO immersion app where learners are dropped into real situations with an 8-second response timer and no grammar explanations. Built for The Gemma 4 Good Hackathon (Future of Education track).

  • 🎯 5 mission chains with connected micro-scenes
  • 🎤 Voice-first (Web Speech API)
  • ⏱️ 8-second pressure timer
  • 🔍 Post-scene feedback with error analysis
  • 📱 Works offline via Ollama
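
For the offline path, a client could talk to a locally running Ollama server over its chat endpoint. The sketch below assumes the model has already been imported into Ollama under the tag `lingua-agent`; that tag is hypothetical, and the card does not describe how the app itself calls Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local address
MODEL_TAG = "lingua-agent"  # HYPOTHETICAL tag; use whatever name you imported


def build_payload(messages: list[dict], model: str = MODEL_TAG) -> dict:
    """Build a non-streaming chat request body for Ollama."""
    return {"model": model, "messages": messages, "stream": False}


def chat(messages: list[dict], model: str = MODEL_TAG, timeout: float = 30.0) -> str:
    """Send the messages to the local Ollama server and return the reply text."""
    data = json.dumps(build_payload(messages, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["message"]["content"]


payload = build_payload([{
    "role": "user",
    "content": "[CONTEXT: You are David, a check-in agent.]\n\nI need check in.",
}])
print(json.dumps(payload, indent=2))
```

Calling `chat(...)` requires the Ollama server to be running; only the payload is built in the example above so that it can be inspected without a server.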

License

CC-BY 4.0
