Model Card for aktheroy/FT_Translate_en_el_hi
This model is a fine-tuned version of facebook/m2m100_418M, designed for multilingual translation between English (en), Greek (el), and Hindi (hi). It builds on the M2M100 architecture, which supports many-to-many translation across language pairs.
Model Details
Model Description
- Developed by: Aktheroy
- Model type: Transformer-based encoder-decoder
- Language(s) (NLP): English, Hindi, Greek
- License: MIT
- Finetuned from model: facebook/m2m100_418M
Model Sources
- Repository: aktheroy/4bit_translate_en_el_hi
Uses
Direct Use
The model can be used for translation tasks between the supported languages (English, Hindi, Greek). Use cases include:
- Cross-lingual communication
- Multilingual content generation
- Language learning assistance
Downstream Use
The model can be fine-tuned further for domain-specific translation tasks, such as medical or legal translations.
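As a rough, non-authoritative sketch of that workflow, the snippet below loads this checkpoint as the starting point and wires it into the standard Seq2SeqTrainer; the toy medical sentence pair, column names, and output directory are placeholders, not part of this model's actual training setup.

from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer, DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

# Start from the released checkpoint rather than the base M2M100 weights
checkpoint = "aktheroy/4bit_translate_en_el_hi"
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.src_lang, tokenizer.tgt_lang = "en", "hi"

# Toy stand-in for a domain-specific parallel corpus (e.g. medical en-hi pairs)
raw = Dataset.from_dict({"source": ["Take one tablet twice a day."],
                         "target": ["दिन में दो बार एक गोली लें।"]})

def tokenize(batch):
    return tokenizer(batch["source"], text_target=batch["target"],
                     truncation=True, max_length=128)

train_dataset = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="ft-domain", num_train_epochs=3,
                                  per_device_train_batch_size=16, learning_rate=5e-5),
    train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()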
Out-of-Scope Use
The model is not suitable for:
- Translating unsupported languages
- Generating content for sensitive or harmful purposes
Bias, Risks, and Limitations
While the model supports multilingual translations, it might exhibit:
- Biases from the pretraining and fine-tuning datasets.
- Reduced performance for idiomatic expressions or cultural nuances.
Recommendations
Users should:
- Verify translations, especially for critical applications.
- Use supplementary tools to validate outputs in sensitive scenarios.
How to Get Started with the Model
Here is an example of how to use the model for translation tasks:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "aktheroy/4bit_translate_en_el_hi"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example input: translate English to Hindi
input_text = "Hello, how are you?"
tokenizer.src_lang = "en"

# Tokenize and generate; M2M100 selects the target language via forced_bos_token_id
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, forced_bos_token_id=tokenizer.get_lang_id("hi"))
translation = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(translation)
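The same loaded model serves every direction among the three languages; for example, reusing the inputs above and switching the forced target token yields a Greek translation instead:

outputs_el = model.generate(**inputs, forced_bos_token_id=tokenizer.get_lang_id("el"))
print(tokenizer.batch_decode(outputs_el, skip_special_tokens=True))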
Training Details
Training Data
The model was fine-tuned on a custom dataset containing parallel translations between English, Hindi, and Greek.
Training Procedure
Preprocessing
The dataset was preprocessed to:
- Normalize text.
- Tokenize using the M2M100 tokenizer (see the sketch below).
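A minimal sketch of that preprocessing, assuming the tokenizer's standard text_target interface and hypothetical source/target column names:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/m2m100_418M")
tokenizer.src_lang = "en"  # source language of this example pair
tokenizer.tgt_lang = "hi"  # target language of this example pair

def preprocess(example):
    # Normalization here is simply stripping surrounding whitespace (an assumption)
    source = example["source"].strip()
    target = example["target"].strip()
    # text_target tokenizes the reference with the target-language special tokens
    return tokenizer(source, text_target=target, truncation=True, max_length=128)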
Training Hyperparameters
- Epochs: 10
- Batch size: 16
- Learning rate: 5e-5
- Mixed Precision: Disabled (FP32); see the configuration sketch below
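These values map directly onto the standard Hugging Face training configuration; a minimal sketch (the output directory name is arbitrary):

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="ft_translate_en_el_hi",  # placeholder output path
    num_train_epochs=10,                 # Epochs: 10
    per_device_train_batch_size=16,      # Batch size: 16
    learning_rate=5e-5,                  # Learning rate: 5e-5
    fp16=False,                          # mixed precision disabled (FP32)
)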
Speeds, Sizes, Times
- Training runtime: 20.3 hours
- Training samples per second: 17.508
- Training steps per second: 0.137
- Final training loss: 0.873
Evaluation
Testing Data, Factors & Metrics
Testing Data
The model was evaluated on a held-out test set from the same domains as the training data.
Metrics
- BLEU score (to be computed during final evaluation).
Results
- Training Loss: 0.873
- Detailed BLEU score results will be provided in subsequent updates; a sketch of the intended evaluation follows below.
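Until those numbers are published, a minimal sketch of how the BLEU evaluation could be run with the evaluate library's sacreBLEU wrapper (the prediction and reference strings below are placeholders):

import evaluate

bleu = evaluate.load("sacrebleu")
# predictions: decoded model outputs; references: one list of reference strings per prediction
predictions = ["नमस्ते, आप कैसे हैं?"]
references = [["नमस्ते, आप कैसे हैं?"]]
print(bleu.compute(predictions=predictions, references=references)["score"])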
Environmental Impact
- Hardware Type: MacBook with M3 Pro
- Hours used: 20.3 hours
- Cloud Provider: None (trained on local hardware)
- Carbon Emitted: Minimal (local training)
Technical Specifications
Model Architecture and Objective
The model is based on the M2M100 architecture, a transformer-based encoder-decoder model designed for multilingual translation without relying on English as an intermediary language.
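For reference, the main dimensions of the 418M checkpoint can be read from its published configuration (this downloads only the config file, not the weights):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("facebook/m2m100_418M")
# Encoder/decoder depth, hidden size, and attention heads of the base checkpoint
print(config.encoder_layers, config.decoder_layers, config.d_model, config.encoder_attention_heads)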
Compute Infrastructure
Hardware
- Device: MacBook with M3 Pro
Software
- Transformers library from Hugging Face
- Python 3.12
Citation
If you use this model, please cite it as:
APA: Aktheroy (2025). Fine-Tuned M2M100 Translation Model. Hugging Face. Retrieved from https://huggingface.co/aktheroy/FT_Translate_en_el_hi
Model Card Authors
- Aktheroy
Model Card Contact
For questions or feedback, contact the author via Hugging Face.