# Persian Simplification Model (ParsT5 Base)

## Overview
This model is a fine-tuned version of ParsT5 (base) for Persian text simplification. It was trained on Persian legal texts via supervised fine-tuning and uses the Unlimiformer algorithm to handle long inputs efficiently.
- Architecture: Ahmad/parsT5-base
- Language: Persian
- Task: Text simplification
- Training setup (a minimal configuration sketch follows this list):
  - Long-input handling: Unlimiformer
  - Epochs: 12
  - Hardware: NVIDIA RTX 4070 GPU
  - Trainable blocks: last encoder and decoder blocks
  - Optimizer: AdamW with a learning-rate scheduler
  - Max input tokens: 4096
  - Max output tokens: 512
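The training script itself is not included here; the snippet below is only a minimal sketch of the configuration described above, assuming the Hugging Face Transformers API. The learning rate, warmup steps, and step count are illustrative placeholders, and the Unlimiformer wrapper (not shown) would be applied on top of this.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, get_linear_schedule_with_warmup

model = AutoModelForSeq2SeqLM.from_pretrained("Ahmad/parsT5-base")

# Freeze all weights, then unfreeze only the last encoder and decoder blocks,
# matching the "Trainable blocks: last encoder and decoder blocks" setting above.
for param in model.parameters():
    param.requires_grad = False
for block in (model.encoder.block[-1], model.decoder.block[-1]):
    for param in block.parameters():
        param.requires_grad = True

# AdamW with a linear learning-rate scheduler; lr/warmup/step values are illustrative.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=3e-4
)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=500, num_training_steps=10_000
)
```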
## Readability Scores

The following table compares readability scores for the original texts and the model's simplified outputs. For Gunning Fog, ARI, and Dale-Chall, lower scores indicate simpler text; for Flesch-Dayani (a Persian adaptation of Flesch reading ease), higher scores indicate easier text:
| Metric | Original Texts | Predictions |
|---|---|---|
| Gunning Fog | 14.9676 | 7.5891 |
| ARI | 11.8796 | 6.7869 |
| Dale-Chall | 2.6473 | 1.2679 |
| Flesch-Dayani | 228.2377 | 244.0153 |
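As an illustration of how such scores are derived, the sketch below computes ARI from its standard formula over characters, words, and sentences. The whitespace/punctuation tokenization here is a deliberate simplification for Persian; the other metrics in the table are computed from analogous surface statistics.

```python
import re

def automated_readability_index(text: str) -> float:
    """Standard ARI: 4.71*(chars/words) + 0.5*(words/sentences) - 21.43."""
    # Naive sentence split on Persian/Latin terminators; a real scoring
    # pipeline would use proper Persian sentence and word tokenization.
    sentences = [s for s in re.split(r"[.!?؟]+", text) if s.strip()]
    words = text.split()
    chars = sum(len(w) for w in words)
    return 4.71 * (chars / len(words)) + 0.5 * (len(words) / len(sentences)) - 21.43
```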
## Evaluation Results

The fine-tuned model was evaluated with ROUGE and BERTScore (mBERT). For comparison, results for two other Persian LLMs based on LLaMA are also shown; the Precision, Recall, and F1 columns are BERTScore values:
| Prediction Model | ROUGE-1 | ROUGE-2 | ROUGE-L | BERTScore Precision | BERTScore Recall | BERTScore F1 |
|---|---|---|---|---|---|---|
| Fine-Tuned Model | 38.08% | 15.83% | 19.41% | 76.75% | 71.06% | 73.71% |
| ViraIntelligentDataMining/PersianLLaMA-13B | 28.64% | 9.81% | 13.67% | 68.36% | 73.44% | 70.80% |
| MehdiHosseiniMoghadam/AVA-Llama-3-V2 | 30.07% | 10.33% | 16.39% | 68.47% | 73.47% | 70.87% |
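A sketch of how these metrics can be reproduced with the Hugging Face `evaluate` library is shown below. The prediction/reference lists are placeholders, and `bert-base-multilingual-cased` is assumed to be the mBERT checkpoint behind the BERTScore columns.

```python
import evaluate

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

# Placeholder data; in practice these are model outputs and gold simplifications.
predictions = ["متن ساده‌شده"]
references = ["متن مرجع ساده"]

rouge_scores = rouge.compute(predictions=predictions, references=references)
bert_scores = bertscore.compute(
    predictions=predictions,
    references=references,
    model_type="bert-base-multilingual-cased",  # assumed mBERT checkpoint
)

print(rouge_scores["rouge1"], rouge_scores["rouge2"], rouge_scores["rougeL"])
print(sum(bert_scores["f1"]) / len(bert_scores["f1"]))  # mean BERTScore F1
```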
## How to Use

You can load and use this model with the Hugging Face Transformers library as follows:
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("Moryjj/FineTuned-parsT5-Simplification")
model = AutoModelForSeq2SeqLM.from_pretrained("Moryjj/FineTuned-parsT5-Simplification")

# Example usage ("complex Persian text")
input_text = "متن پیچیده فارسی"
inputs = tokenizer(input_text, return_tensors="pt", max_length=4096, truncation=True)
outputs = model.generate(**inputs, max_new_tokens=512)  # matches the 512-token output limit

# Decode the output
simplified_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(simplified_text)
```
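Note that this snippet uses the standard Transformers generation path. During training, inputs up to 4096 tokens were handled with Unlimiformer, which is not applied here, so quality on very long inputs may differ without that wrapper.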
## Contact Information

For inquiries or feedback, please contact:

- Author: Mohammadreza Joneidi Jafari
- Email: m.r.joneidi.02@gmail.com