Model Card for aktheroy/FT_Translate_en_el_hi
This model is a fine-tuned version of facebook/m2m100_418M, designed for multilingual translation between English (en), Greek (el), and Hindi (hi). It builds on the M2M100 architecture, which supports many-to-many translation across language pairs.
Model Details
Model Description
- Developed by: Aktheroy
- Model type: Transformer-based encoder-decoder
- Language(s) (NLP): English, Hindi, Greek
- License: MIT
- Finetuned from model: facebook/m2m100_418M
Model Sources
- Repository: aktheroy/4bit_translate_en_el_hi
Uses
Direct Use
The model can be used for translation tasks between the supported languages (English, Hindi, Greek). Use cases include:
- Cross-lingual communication
- Multilingual content generation
- Language learning assistance
Downstream Use
The model can be fine-tuned further for domain-specific translation tasks, such as medical or legal translations.
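As a rough, non-authoritative sketch of that workflow, the snippet below loads this checkpoint as the starting point and wires it into the standard Seq2SeqTrainer; the toy medical sentence pair, column names, and output directory are placeholders, not part of this model's actual training setup.

from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer, DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

# Start from the released checkpoint rather than the base M2M100 weights
checkpoint = "aktheroy/4bit_translate_en_el_hi"
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.src_lang, tokenizer.tgt_lang = "en", "hi"

# Toy stand-in for a domain-specific parallel corpus (e.g. medical en-hi pairs)
raw = Dataset.from_dict({"source": ["Take one tablet twice a day."],
                         "target": ["दिन में दो बार एक गोली लें।"]})

def tokenize(batch):
    return tokenizer(batch["source"], text_target=batch["target"],
                     truncation=True, max_length=128)

train_dataset = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="ft-domain", num_train_epochs=3,
                                  per_device_train_batch_size=16, learning_rate=5e-5),
    train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()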
Out-of-Scope Use
The model is not suitable for:
- Translating unsupported languages
- Generating content for sensitive or harmful purposes
Bias, Risks, and Limitations
While the model supports multilingual translations, it might exhibit:
- Biases from the pretraining and fine-tuning datasets.
- Reduced performance for idiomatic expressions or cultural nuances.
Recommendations
Users should:
- Verify translations, especially for critical applications.
- Use supplementary tools to validate outputs in sensitive scenarios.
How to Get Started with the Model
Here is an example of how to use the model for translation tasks:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "aktheroy/4bit_translate_en_el_hi"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example input: translate English to Hindi
input_text = "Hello, how are you?"
tokenizer.src_lang = "en"

# Tokenize and generate; M2M100 selects the target language via forced_bos_token_id
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, forced_bos_token_id=tokenizer.get_lang_id("hi"))
translation = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(translation)
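The same loaded model serves every direction among the three languages; for example, reusing the inputs above and switching the forced target token yields a Greek translation instead:

outputs_el = model.generate(**inputs, forced_bos_token_id=tokenizer.get_lang_id("el"))
print(tokenizer.batch_decode(outputs_el, skip_special_tokens=True))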
Training Details
Training Data
The model was fine-tuned on a custom dataset containing parallel translations between English, Hindi, and Greek.
Training Procedure
Preprocessing
The dataset was preprocessed to:
- Normalize text.
- Tokenize using the M2M100 tokenizer (see the sketch below).
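A minimal sketch of that preprocessing, assuming the tokenizer's standard text_target interface and hypothetical source/target column names:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/m2m100_418M")
tokenizer.src_lang = "en"  # source language of this example pair
tokenizer.tgt_lang = "hi"  # target language of this example pair

def preprocess(example):
    # Normalization here is simply stripping surrounding whitespace (an assumption)
    source = example["source"].strip()
    target = example["target"].strip()
    # text_target tokenizes the reference with the target-language special tokens
    return tokenizer(source, text_target=target, truncation=True, max_length=128)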
Training Hyperparameters
- Epochs: 10
- Batch size: 16
- Learning rate: 5e-5
- Mixed Precision: Disabled (FP32); see the configuration sketch below
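These values map directly onto the standard Hugging Face training configuration; a minimal sketch (the output directory name is arbitrary):

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="ft_translate_en_el_hi",  # placeholder output path
    num_train_epochs=10,                 # Epochs: 10
    per_device_train_batch_size=16,      # Batch size: 16
    learning_rate=5e-5,                  # Learning rate: 5e-5
    fp16=False,                          # mixed precision disabled (FP32)
)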
Speeds, Sizes, Times
- Training runtime: 20.3 hours
- Training samples per second: 17.508
- Training steps per second: 0.137
- Final training loss: 0.873
Evaluation
Testing Data, Factors & Metrics
Testing Data
The model was evaluated on a held-out test set from the same domains as the training data.
Metrics
- BLEU score (to be computed during final evaluation).
Results
- Training Loss: 0.873
- Detailed BLEU score results will be provided in subsequent updates; a sketch of the intended evaluation follows below.
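Until those numbers are published, a minimal sketch of how the BLEU evaluation could be run with the evaluate library's sacreBLEU wrapper (the prediction and reference strings below are placeholders):

import evaluate

bleu = evaluate.load("sacrebleu")
# predictions: decoded model outputs; references: one list of reference strings per prediction
predictions = ["नमस्ते, आप कैसे हैं?"]
references = [["नमस्ते, आप कैसे हैं?"]]
print(bleu.compute(predictions=predictions, references=references)["score"])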
Environmental Impact
- Hardware Type: MacBook with M3 Pro
- Hours used: 20.3 hours
- Cloud Provider: None (trained on local hardware)
- Carbon Emitted: Minimal (local training)
Technical Specifications
Model Architecture and Objective
The model is based on the M2M100 architecture, a transformer-based encoder-decoder model designed for multilingual translation without relying on English as an intermediary language.
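For reference, the main dimensions of the 418M checkpoint can be read from its published configuration (this downloads only the config file, not the weights):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("facebook/m2m100_418M")
# Encoder/decoder depth, hidden size, and attention heads of the base checkpoint
print(config.encoder_layers, config.decoder_layers, config.d_model, config.encoder_attention_heads)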
Compute Infrastructure
Hardware
- Device: MacBook with M3 Pro
Software
- Transformers library from Hugging Face
- Python 3.12
Citation
If you use this model, please cite it as:
APA: Aktheroy (2025). Fine-Tuned M2M100 Translation Model. Hugging Face. Retrieved from https://huggingface.co/aktheroy/FT_Translate_en_el_hi
Model Card Authors
- Aktheroy
Model Card Contact
For questions or feedback, contact the author via Hugging Face.