Persian Simplification Model (ParsT5-base)


Overview

This model is a fine-tuned version of ParsT5 (base), built specifically for Persian text simplification. The training data consists of Persian legal texts. The model was trained with supervised fine-tuning and uses the Unlimiformer algorithm to handle long inputs efficiently.

  • Base model: Ahmad/parsT5-base
  • Language: Persian
  • Task: Text simplification
  • Model size: ~248M parameters (F32)
  • Training setup:
    • Long-input handling: Unlimiformer
    • Epochs: 12
    • Hardware: NVIDIA RTX 4070 GPU
    • Trainable blocks: last encoder and decoder blocks
    • Optimizer: AdamW with a learning-rate scheduler
    • Max input length: 4,096 tokens
    • Max output length: 512 tokens
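The card notes that only the last encoder and decoder blocks were trainable. A minimal sketch of how such name-based parameter freezing could look (not the author's exact training code), assuming T5's parameter-naming convention (`encoder.block.N...`) and a 12-block parsT5-base, both of which are assumptions:

```python
NUM_LAYERS = 12  # assumed number of T5 blocks in parsT5-base

def is_trainable(param_name: str, num_layers: int = NUM_LAYERS) -> bool:
    """Return True only for parameters in the last encoder or decoder block."""
    last = f".block.{num_layers - 1}."
    return f"encoder{last}" in param_name or f"decoder{last}" in param_name

# In practice this would drive requires_grad, e.g.:
#   for name, p in model.named_parameters():
#       p.requires_grad = is_trainable(name)
print(is_trainable("encoder.block.11.layer.0.SelfAttention.q.weight"))  # True
print(is_trainable("encoder.block.0.layer.0.SelfAttention.q.weight"))   # False
```

Freezing everything except the final blocks keeps memory and compute low on a single consumer GPU such as the RTX 4070.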

Readability Scores

The following table summarizes the readability scores for the original texts and the predictions generated by the fine-tuned model. Lower Gunning Fog, ARI, and Dale-Chall scores indicate easier text, while a higher Flesch-Dayani score indicates easier text:

| Metric        | Original Texts | Predictions |
|---------------|----------------|-------------|
| Gunning Fog   | 14.9676        | 7.5891      |
| ARI           | 11.8796        | 6.7869      |
| Dale-Chall    | 2.6473         | 1.2679      |
| Flesch-Dayani | 228.2377       | 244.0153    |
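For reference, the classic Gunning Fog index is 0.4 × (average sentence length + 100 × complex-word ratio); the Persian adaptations used here may detect sentences and complex words differently. A minimal sketch, assuming whitespace tokenization and a placeholder complex-word test (a real implementation would count syllables):

```python
import re

def gunning_fog(text: str, is_complex) -> float:
    """Gunning Fog: 0.4 * (avg sentence length + 100 * complex-word ratio)."""
    # Split on Latin and Persian sentence-final punctuation.
    sentences = [s for s in re.split(r"[.!?؟]+", text) if s.strip()]
    words = text.split()  # naive whitespace tokenization
    complex_words = [w for w in words if is_complex(w)]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))

# Placeholder complexity heuristic purely for illustration.
score = gunning_fog("a bb ccc. dddddddd ee.", lambda w: len(w) >= 7)
print(round(score, 2))  # 9.0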

Evaluation Results

The fine-tuned model was evaluated with ROUGE and BERTScore (mBERT). For comparison, the table also includes two Persian LLMs based on LLaMA:

| Model                                      | ROUGE-1 | ROUGE-2 | ROUGE-L | BERTScore Precision | BERTScore Recall | BERTScore F1 |
|--------------------------------------------|---------|---------|---------|---------------------|------------------|--------------|
| Fine-tuned model                           | 38.08%  | 15.83%  | 19.41%  | 76.75%              | 71.06%           | 73.71%       |
| ViraIntelligentDataMining/PersianLLaMA-13B | 28.64%  | 9.81%   | 13.67%  | 68.36%              | 73.44%           | 70.80%       |
| MehdiHosseiniMoghadam_AVA_Llama_3_V2       | 30.07%  | 10.33%  | 16.39%  | 68.47%              | 73.47%           | 70.87%       |
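ROUGE-1 measures unigram overlap between a prediction and its reference. A simplified, self-contained sketch of ROUGE-1 F1; real evaluations typically use a library such as `rouge_score` or `evaluate`, with normalization and stemming this sketch omits:

```python
from collections import Counter

def rouge1_f1(prediction: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    pred, ref = Counter(prediction.split()), Counter(reference.split())
    overlap = sum((pred & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("the cat sat", "the cat sat down"), 4))  # 0.8571
```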

How to Use

You can load and use this model with the Hugging Face Transformers library as follows:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("Moryjj/FineTuned-parsT5-Simplification")
model = AutoModelForSeq2SeqLM.from_pretrained("Moryjj/FineTuned-parsT5-Simplification")

# Example usage
input_text = "متن پیچیده فارسی"  # "complex Persian text"
inputs = tokenizer(input_text, return_tensors="pt", max_length=4096, truncation=True)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode the output
simplified_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(simplified_text)
```

Contact Information

For inquiries or feedback, please contact:

Author: Mohammadreza Joneidi Jafari

Email: m.r.joneidi.02@gmail.com
