Assamese Instruction Following Model using mT5-small
This project fine-tunes the mT5-small model for instruction-following tasks in Assamese. The model is designed to understand questions written in Assamese and generate relevant responses.
Model Description
- Base Model: google/mt5-small (Multilingual T5)
- Fine-tuned on: Assamese instruction-following dataset
- Task: Question answering and instruction following in Assamese
- Training Device: Google Colab T4 GPU
Dataset
- Total Examples: 28,910
- Training Set: 23,128 examples
- Validation Set: 5,782 examples
- Format: Instruction-Input-Output pairs in Assamese
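The split corresponds to an 80/20 partition of the 28,910 examples. As a rough sketch (not the exact preprocessing pipeline used for this model), the split and the flattening of instruction-input-output pairs into source/target text for mT5 could look like the following; the file name and field names are assumptions:
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")

# Load the raw instruction data (file name is a placeholder)
dataset = load_dataset("json", data_files="assamese_instructions.json")["train"]

# 80/20 split -> 23,128 training / 5,782 validation examples
splits = dataset.train_test_split(test_size=0.2, seed=42)
train_dataset, eval_dataset = splits["train"], splits["test"]

def preprocess(example):
    # Join the instruction and optional input into one source string
    source = example["instruction"]
    if example.get("input"):
        source = source + "\n" + example["input"]
    model_inputs = tokenizer(source, max_length=256, truncation=True)
    labels = tokenizer(text_target=example["output"], max_length=256, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_train = train_dataset.map(preprocess, remove_columns=train_dataset.column_names)
tokenized_eval = eval_dataset.map(preprocess, remove_columns=eval_dataset.column_names)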
Training Configuration
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./mt5-assamese-instructions",  # placeholder checkpoint directory
    num_train_epochs=2,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    warmup_steps=200,
    weight_decay=0.01,
    gradient_accumulation_steps=2,  # effective train batch size of 8
)
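These arguments plug into a Seq2SeqTrainer in the standard way. A minimal sketch, assuming the tokenizer and tokenized splits (tokenized_train / tokenized_eval) from the dataset sketch above:
from transformers import (
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
)

# Base model noted above
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

# Pads inputs and labels dynamically for each batch
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    data_collator=data_collator,
    tokenizer=tokenizer,
)

trainer.train()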
Model Capabilities
The model can:
- Process Assamese script input
- Recognize different question types
- Maintain basic Assamese grammar
- Generate responses in Assamese
Usage
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/mt5-assamese-instructions")
model = AutoModelForSeq2SeqLM.from_pretrained("your-username/mt5-assamese-instructions")

# Example input
text = "জীৱনত কেনেকৈ সফল হ'ব?"  # How to succeed in life?

# Tokenize, generate a response, and decode it
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_new_tokens=128)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
Limitations
Current limitations include:
- Tendency to produce repetitive responses
- Limited coherence in longer answers
- Basic response structure
- Memory constraints when training on the Colab T4 GPU
Future Improvements
Planned improvements include:
- Better response generation parameters (see the decoding sketch below)
- Enhanced data preprocessing
- Structural markers in the training data
- Optimization for longer responses
- Improved coherence in outputs
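For the generation-parameter item above, repetition and truncated answers can often be mitigated at inference time with beam search and n-gram blocking. The values below are illustrative assumptions rather than tuned settings from this project, reusing model, tokenizer, and inputs from the usage example:
# Illustrative decoding settings (assumed values, not tuned for this model)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,        # allow longer answers
    num_beams=4,               # beam search for more stable, coherent text
    no_repeat_ngram_size=3,    # block repeated trigrams
    repetition_penalty=1.3,    # discourage token-level repetition
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))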
Citation
@misc{mt5-assamese-instructions,
author = {NanduvardhanReddy},
title = {mT5-small Fine-tuned for Assamese Instructions},
year = {2024},
publisher = {Hugging Face},
journal = {Hugging Face Model Hub}
}
Acknowledgments
- Google's mT5 team for the base model
- Hugging Face for the transformers library
- Google Colab for computation resources
License
This project is licensed under the Apache License 2.0.