Persian Simplification Model (ParsT5-base)


Overview

This model is a fine-tuned version of ParsT5 (base), built specifically for Persian text simplification. The training data consists of Persian legal texts. The model was trained with supervised fine-tuning and uses the Unlimiformer algorithm to handle long inputs efficiently.

  • Base model: Ahmad/parsT5-base
  • Language: Persian
  • Task: Text simplification
  • Model size: ~248M parameters (F32)
  • Training setup:
    • Long-input handling: Unlimiformer
    • Epochs: 12
    • Hardware: NVIDIA RTX 4070 GPU
    • Trainable blocks: last encoder and decoder blocks
    • Optimizer: AdamW with a learning-rate scheduler
    • Max input length: 4,096 tokens
    • Max output length: 512 tokens
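The card notes that only the last encoder and decoder blocks were trainable. A minimal sketch of how such name-based parameter freezing could look (not the author's exact training code), assuming T5's parameter-naming convention (`encoder.block.N...`) and a 12-block parsT5-base, both of which are assumptions:

```python
NUM_LAYERS = 12  # assumed number of T5 blocks in parsT5-base

def is_trainable(param_name: str, num_layers: int = NUM_LAYERS) -> bool:
    """Return True only for parameters in the last encoder or decoder block."""
    last = f".block.{num_layers - 1}."
    return f"encoder{last}" in param_name or f"decoder{last}" in param_name

# In practice this would drive requires_grad, e.g.:
#   for name, p in model.named_parameters():
#       p.requires_grad = is_trainable(name)
print(is_trainable("encoder.block.11.layer.0.SelfAttention.q.weight"))  # True
print(is_trainable("encoder.block.0.layer.0.SelfAttention.q.weight"))   # False
```

Freezing everything except the final blocks keeps memory and compute low on a single consumer GPU such as the RTX 4070.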

Readability Scores

The following table summarizes the readability scores for the original texts and the predictions generated by the fine-tuned model. Lower Gunning Fog, ARI, and Dale-Chall scores indicate easier text, while a higher Flesch-Dayani score indicates easier text:

| Metric        | Original Texts | Predictions |
|---------------|----------------|-------------|
| Gunning Fog   | 14.9676        | 7.5891      |
| ARI           | 11.8796        | 6.7869      |
| Dale-Chall    | 2.6473         | 1.2679      |
| Flesch-Dayani | 228.2377       | 244.0153    |
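For reference, the classic Gunning Fog index is 0.4 × (average sentence length + 100 × complex-word ratio); the Persian adaptations used here may detect sentences and complex words differently. A minimal sketch, assuming whitespace tokenization and a placeholder complex-word test (a real implementation would count syllables):

```python
import re

def gunning_fog(text: str, is_complex) -> float:
    """Gunning Fog: 0.4 * (avg sentence length + 100 * complex-word ratio)."""
    # Split on Latin and Persian sentence-final punctuation.
    sentences = [s for s in re.split(r"[.!?؟]+", text) if s.strip()]
    words = text.split()  # naive whitespace tokenization
    complex_words = [w for w in words if is_complex(w)]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))

# Placeholder complexity heuristic purely for illustration.
score = gunning_fog("a bb ccc. dddddddd ee.", lambda w: len(w) >= 7)
print(round(score, 2))  # 9.0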

Evaluation Results

The fine-tuned model was evaluated with ROUGE and BERTScore (mBERT). For comparison, the table also includes two Persian LLMs based on LLaMA:

| Model                                      | ROUGE-1 | ROUGE-2 | ROUGE-L | BERTScore Precision | BERTScore Recall | BERTScore F1 |
|--------------------------------------------|---------|---------|---------|---------------------|------------------|--------------|
| Fine-tuned model                           | 38.08%  | 15.83%  | 19.41%  | 76.75%              | 71.06%           | 73.71%       |
| ViraIntelligentDataMining/PersianLLaMA-13B | 28.64%  | 9.81%   | 13.67%  | 68.36%              | 73.44%           | 70.80%       |
| MehdiHosseiniMoghadam_AVA_Llama_3_V2       | 30.07%  | 10.33%  | 16.39%  | 68.47%              | 73.47%           | 70.87%       |
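ROUGE-1 measures unigram overlap between a prediction and its reference. A simplified, self-contained sketch of ROUGE-1 F1; real evaluations typically use a library such as `rouge_score` or `evaluate`, with normalization and stemming this sketch omits:

```python
from collections import Counter

def rouge1_f1(prediction: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    pred, ref = Counter(prediction.split()), Counter(reference.split())
    overlap = sum((pred & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("the cat sat", "the cat sat down"), 4))  # 0.8571
```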

How to Use

You can load and use this model with the Hugging Face Transformers library as follows:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("Moryjj/FineTuned-parsT5-Simplification")
model = AutoModelForSeq2SeqLM.from_pretrained("Moryjj/FineTuned-parsT5-Simplification")

# Example usage
input_text = "متن پیچیده فارسی"  # "complex Persian text"
inputs = tokenizer(input_text, return_tensors="pt", max_length=4096, truncation=True)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode the output
simplified_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(simplified_text)
```

Contact Information

For inquiries or feedback, please contact:

Author: Mohammadreza Joneidi Jafari

Email: m.r.joneidi.02@gmail.com
