English-to-Telugu Translation Model
Overview
This project is a deep learning-based English-to-Telugu translation model trained on a custom dataset. It uses Hugging Face Transformers and was developed in Google Colab. The model can be used to translate sentences with improved contextual accuracy.
Features
✅ Translates English text to Telugu
✅ Trained on a custom bilingual dataset
✅ Uses Transformer-based model
✅ Implemented and trained in Google Colab
✅ Can be fine-tuned for better accuracy
Tech Stack
- Programming Language: Python
- Framework: Hugging Face Transformers
- Model: mBART (Fine-tuned)
- Libraries:
  - transformers (Hugging Face)
  - torch (PyTorch)
  - sentencepiece (tokenization)
- Platform: Google Colab
Dataset
- Used a custom English-Telugu parallel corpus
- Preprocessed using:
- Tokenization (SentencePiece / WordPiece)
- Lowercasing & Cleaning
- Removing noisy data
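The cleaning steps above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the length-ratio threshold, the `clean_pair` helper, and the example pairs are all assumptions made for the sketch.

```python
import re

def clean_pair(en, te, max_ratio=3.0):
    """Normalize one English-Telugu pair; return None for noisy pairs.

    The ratio filter and whitespace cleanup are illustrative assumptions;
    the real preprocessing may use different rules and thresholds.
    """
    en = re.sub(r"\s+", " ", en).strip().lower()  # lowercase + collapse whitespace
    te = re.sub(r"\s+", " ", te).strip()          # Telugu script has no casing
    if not en or not te:
        return None  # drop pairs with an empty side
    # Drop pairs whose lengths are wildly mismatched (likely misaligned)
    ratio = max(len(en), len(te)) / min(len(en), len(te))
    if ratio > max_ratio:
        return None
    return en, te

pairs = [("  Good Morning!  ", "శుభోదయం!"), ("Hello", "")]
cleaned = [p for p in (clean_pair(en, te) for en, te in pairs) if p]
```

The surviving pairs in `cleaned` can then be tokenized with SentencePiece before training.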
Model Training
Training was done in Google Colab using a GPU. Here’s a snippet of the fine-tuning process:
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, Trainer, TrainingArguments

# Load pre-trained model & tokenizer (Auto classes resolve the correct
# architecture for the checkpoint)
model_name = "aryaumesh/english-to-telugu"  # Base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Preprocess dataset (example)
def encode_data(texts):
    return tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    save_steps=1000,
    save_total_limit=2,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=custom_dataset,  # tokenized English-Telugu pairs
)

trainer.train()
```
Run the Model
```python
def translate(text):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    translated = model.generate(**inputs)
    return tokenizer.decode(translated[0], skip_special_tokens=True)

english_text = "Good morning, how are you?"
telugu_translation = translate(english_text)
print("Translated Text:", telugu_translation)
```
Future Improvements
🔹 Train on a larger dataset for better accuracy
🔹 Optimize inference speed for real-time use
🔹 Deploy as a cloud-based API (AWS/GCP)
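For the real-time inference goal, one simple optimization is to memoize repeated inputs so that common phrases skip generation entirely. The `make_cached_translator` helper below is a hypothetical sketch, not part of this repository; it wraps any `translate(text)`-style function.

```python
from functools import lru_cache

def make_cached_translator(translate_fn, maxsize=1024):
    """Wrap a translate(text) -> str function with an LRU cache.

    Repeated inputs (greetings, UI strings) are served from memory
    instead of re-running model.generate(). Sketch only; tune maxsize
    to the expected traffic.
    """
    @lru_cache(maxsize=maxsize)
    def cached(text: str) -> str:
        return translate_fn(text)
    return cached

# Usage (assuming the translate() function defined earlier):
# fast_translate = make_cached_translator(translate)
# fast_translate("Good morning")  # first call runs the model
# fast_translate("Good morning")  # second call is a cache hit
```

Caching helps only for exact-repeat inputs; for general speedups, smaller beam sizes or quantization would be the next things to explore.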
Model tree for archita091234/fine-tuned-translation
- Base model: aryaumesh/english-to-telugu