Telugu Colloquial to English Translator

This model translates colloquial Telugu expressions into English. It was fine-tuned for the SAWiT.AI Hackathon 2025.

Model Details

  • Base Model: facebook/nllb-200-distilled-600M
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Dataset: Custom-curated dataset of colloquial Telugu expressions
  • Purpose: Accurate translation of naturally spoken Telugu into English, with emphasis on slang and colloquial expressions
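
To illustrate the low-rank idea behind LoRA, here is a minimal pure-Python sketch. LoRA keeps the pretrained weight matrix W frozen and trains two small matrices A and B; their product, scaled by alpha / r, is added to W. The matrix sizes, rank, alpha, and values below are made up for illustration. This card does not state the rank or scaling actually used to train the model.

```python
# Toy illustration of a LoRA update: W_adapted = W + (alpha / r) * (B @ A).
# All shapes and numbers are hypothetical, not this model's real settings.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def lora_merge(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the number of rows of A."""
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Frozen 3x3 "pretrained" weight (identity matrix for readability)
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
# Rank-1 adapter: A is r x d_in, B is d_out x r
A = [[1.0, 2.0, 3.0]]
B = [[0.5], [0.0], [-0.5]]

W_adapted = lora_merge(W, A, B, alpha=2.0)

full_params = len(W) * len(W[0])                       # values in the full matrix
lora_params = len(A) * len(A[0]) + len(B) * len(B[0])  # trainable adapter values
print(W_adapted)
print(full_params, lora_params)
```

In this toy case the adapter has 6 trainable values against 9 in the full matrix. The saving grows quickly with matrix size, because the adapter scales with r * (d_in + d_out) rather than d_in * d_out.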

Usage

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load model and tokenizer; src_lang tells the NLLB tokenizer to tag inputs as Telugu
model_name = "your-username/telugu-colloquial-translator"  # Replace with your username
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="tel_Telu")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Translate a colloquial Telugu phrase
phrase = "Enti mawa, ela unnaavu?"
inputs = tokenizer(phrase, return_tensors="pt")

outputs = model.generate(
    **inputs,
    # convert_tokens_to_ids also works on transformers releases that removed lang_code_to_id
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),  # Set target language
    max_length=128,
    num_beams=5
)

translation = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(f"Telugu: {phrase}")
print(f"English: {translation}")
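
To translate several phrases at once, the same steps can be wrapped in a small helper. This is a sketch, not part of the released model: the function name translate_batch and its parameters are hypothetical, and it assumes the tokenizer's source language is already set to Telugu (tel_Telu).

```python
def translate_batch(phrases, tokenizer, model, target_lang="eng_Latn",
                    max_length=128, num_beams=5):
    """Translate a list of phrases with an already-loaded NLLB tokenizer and model.

    Hypothetical helper: assumes the tokenizer's src_lang is already "tel_Telu".
    """
    if not phrases:
        return []
    # Pad so phrases of different lengths fit into one tensor batch
    inputs = tokenizer(phrases, return_tensors="pt", padding=True)
    outputs = model.generate(
        **inputs,
        # Force the decoder to start with the target language token
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(target_lang),
        max_length=max_length,
        num_beams=num_beams,
    )
    # One decoded string per input phrase, in the same order
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With the tokenizer and model loaded as in the Usage example, translate_batch([phrase], tokenizer, model) returns a list with one English string per input phrase.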

Dataset

The dataset consists of colloquial Telugu expressions gathered from:

  • Everyday conversations
  • Social media content
  • Movie and TV show dialogue
  • Youth slang and expressions

These expressions reflect how native speakers use Telugu in informal contexts, as opposed to formal written Telugu.

Evaluation

This model was evaluated on its ability to accurately translate:

  • Slang terms and idioms
  • Colloquial expressions
  • Informal grammar patterns
  • Code-mixed Telugu (with English words)
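
This card does not say which metric was used. As one way such a per-category evaluation could be organized, the sketch below groups test examples by category and averages a simple token-overlap F1 between each model output and its reference. The metric, the function names, the category labels, and the English strings are all illustrative assumptions, not the actual evaluation data or results.

```python
from collections import Counter, defaultdict

def token_f1(hypothesis, reference):
    """Token-overlap F1 between two strings (lowercased, whitespace-split)."""
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    if not hyp or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it appears in both
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def score_by_category(examples):
    """Average token F1 per category; examples are (category, hypothesis, reference)."""
    scores = defaultdict(list)
    for category, hypothesis, reference in examples:
        scores[category].append(token_f1(hypothesis, reference))
    return {cat: sum(vals) / len(vals) for cat, vals in scores.items()}

# Made-up English outputs and references, only to show the call shape
examples = [
    ("slang", "what's up dude", "what's up dude"),
    ("code-mixed", "send me the file", "send the file to me"),
]
print(score_by_category(examples))
```

For reported results, an established machine translation metric such as BLEU or chrF would be the more usual choice. The token F1 here only keeps the sketch dependency-free.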

License

This model is shared under the Apache 2.0 license.

