Grammar Correction with Text-to-Text Transfer Transformer

📌 Overview

This repository hosts a quantized version of the T5 model fine-tuned for grammar correction. The model was fine-tuned on the JFLEG dataset from Hugging Face to improve the grammatical accuracy of input text, and its weights are quantized to Float16 (FP16) to speed up inference and reduce memory use while maintaining strong performance.

πŸ— Model Details

  • Model Architecture: t5-base
  • Task: Grammar Correction
  • Dataset: Hugging Face's jfleg
  • Quantization: Float16 (FP16) for optimized inference
  • Fine-tuning Framework: Hugging Face Transformers

🚀 Usage

Installation

pip install transformers torch

Loading the Model

from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "AventIQ-AI/t5-grammar-correction"
model = T5ForConditionalGeneration.from_pretrained(model_name).to(device)
tokenizer = T5Tokenizer.from_pretrained(model_name)
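
Because the released checkpoint is stored in FP16, you can optionally load it directly in half precision to reduce memory use. This is a minimal sketch using the standard torch_dtype argument of from_pretrained; it is optional and not required for the examples below (FP16 inference is generally only worthwhile on GPU).

# Optional: load the checkpoint directly in half precision (FP16).
# Fall back to the default FP32 load above when running on CPU.
if device == "cuda":
    model = T5ForConditionalGeneration.from_pretrained(
        model_name, torch_dtype=torch.float16
    ).to(device)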

Grammar Correction Inference

def correct_grammar(text, model, tokenizer, device):
    prefix = "correct grammar: "
    input_text = prefix + text
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
    outputs = model.generate(
        input_ids,
        max_length=128,
        num_beams=5,
        early_stopping=True,
    )
    corrected_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return corrected_text

# πŸ” Test Example
test_sentences = [
    "He go to the store yesterday.",
    "They was running in the park.",
    "She dont like pizza.",
    "We has completed the project already.",
]
for sentence in test_sentences:
    corrected = correct_grammar(sentence, model, tokenizer, device)
    print(f"Original: {sentence}")
    print(f"Corrected: {corrected}")
    print("---")

📊 Evaluation Metric: BLEU Score

For grammar correction, a high BLEU score indicates that the model’s corrected sentences closely match human-annotated corrections.

Interpreting Our BLEU Score

Our model achieved a BLEU score of 0.8888, which indicates:
✅ Strong grammar correction ability
✅ Outputs that closely match the reference corrections

BLEU is computed by comparing the 1-gram, 2-gram, 3-gram, and 4-gram overlaps between the model’s output and the reference sentence while applying a brevity penalty if the model generates shorter sentences.
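
As an illustration, a corpus-level BLEU score of this kind can be computed with NLTK. The snippet below is a sketch of the general approach, not the exact evaluation script used for this model; the example sentences and the 0–1 score scale are assumptions.

from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Hypothetical example: tokenized model outputs vs. human reference corrections.
hypotheses = ["He went to the store yesterday .".split()]
references = [["He went to the store yesterday .".split()]]

# corpus_bleu returns a score in [0, 1], combining 1- to 4-gram precision
# with a brevity penalty for outputs shorter than the references.
score = corpus_bleu(references, hypotheses, smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.4f}")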

BLEU Score Ranges for Grammar Correction

BLEU Score   Interpretation
0.8 - 1.0    Near-perfect corrections, closely matching human annotations. ✅ (Our Model)
0.7 - 0.8    High-quality corrections, minor variations in phrasing.
0.6 - 0.7    Good corrections, but with some grammatical errors or missing words.
0.5 - 0.6    Decent corrections, noticeable mistakes, lacks fluency.
Below 0.5    Needs improvement, frequent incorrect corrections.

⚡ Quantization Details

Post-training quantization was applied with PyTorch: the fine-tuned weights were cast to Float16 (FP16), reducing model size and improving inference efficiency while largely preserving accuracy.
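
For reference, a half-precision copy of a fine-tuned T5 checkpoint can be produced with PyTorch's .half() cast. This is a sketch of the general approach, not the exact script used for this release; the paths are hypothetical.

from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the full-precision fine-tuned model, cast its weights to FP16,
# and save the quantized checkpoint alongside the tokenizer.
fp32_model = T5ForConditionalGeneration.from_pretrained("path/to/finetuned-t5")  # hypothetical path
fp16_model = fp32_model.half()
fp16_model.save_pretrained("t5-grammar-correction-fp16")
T5Tokenizer.from_pretrained("path/to/finetuned-t5").save_pretrained("t5-grammar-correction-fp16")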

📂 Repository Structure

.
├── model/               # Contains the quantized model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors    # Quantized model weights
└── README.md            # Model documentation

⚠️ Limitations

  • The model may struggle with highly ambiguous sentences.
  • Quantization may lead to slight degradation in accuracy compared to full-precision models.
  • Performance may vary across different writing styles and sentence structures.

🤝 Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.
