---
license: mit
language:
- fa
base_model:
- HooshvareLab/gpt2-fa
- openai-community/gpt2
tags:
- art
- poetry
- Persian
- Farsi
- شعر
---
# ManshoorAI
## Overview
This project fine-tunes GPT-2 to generate Persian neo-poetry inspired by the works of Sohrab Sepehri and Forough Farokhzad.
The model is a work in progress; I look forward to hearing your thoughts.
## LSTM Model
**I also trained a simple LSTM model on the same data, available on my GitHub page [here](https://github.com/Rahiminia/manshoor-ai). You can compare the results to see the power of Transformers!**
## Model Details
- **Base Model**: GPT-2 (pretrained by OpenAI)
- **Intermediate Model**: [HooshvareLab/gpt2-fa](https://huggingface.co/HooshvareLab/gpt2-fa)
- **Dataset**: Curated poems from Sohrab Sepehri and Forough Farokhzad
- **Fine-Tuning**: PEFT/LoRA
- **Language**: Persian (Farsi)
- **Output**: Generates poetry with free verse and metaphorical depth
## Installation & Usage
You can load the model using the Hugging Face `transformers` library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from hazm import Normalizer

model_name = "rahiminia/manshoorai"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

def generate_poetry(prompt, max_length=30):
    # Normalize the Persian prompt with hazm before generation
    prompt = Normalizer().normalize(prompt)
    output = generator(prompt, max_length=max_length)
    return output[0]["generated_text"]

print(generate_poetry("شب آرام و خاموش"))
```
You can also load the ONNX model checkpoint with `optimum.onnxruntime`:
```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForCausalLM

# Load the ONNX checkpoint with ONNX Runtime (KV cache and IO binding disabled)
model = ORTModelForCausalLM.from_pretrained("rahiminia/manshoorai", use_cache=False, use_io_binding=False)
tokenizer = AutoTokenizer.from_pretrained("rahiminia/manshoorai")

onnx_generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
prompt = "در این شب سیاه"
pred = onnx_generator(prompt)
print(pred[0]["generated_text"])
```
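Both pipelines forward the usual `generate` keyword arguments, so you can tune the output style. The snippet below is a minimal sketch; the parameter values are illustrative assumptions, not recommended settings for this model.
```python
# Sampling-based generation with the ONNX pipeline defined above;
# all parameter values here are illustrative, not tuned for this checkpoint
pred = onnx_generator(
    "در این شب سیاه",
    max_length=60,
    do_sample=True,
    temperature=0.9,
    top_p=0.95,
    repetition_penalty=1.2,
)
print(pred[0]["generated_text"])
```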
## Training Details
- **Tokenizer**: Tokenizer with Byte Pair Encoding (BPE) from [HooshvareLab/gpt2-fa](https://huggingface.co/HooshvareLab/gpt2-fa)
- **Training**: Fine-tuned using PyTorch and the `transformers` library (a sketch of the LoRA setup follows this list)
- **Hyperparameters**: Adjusted learning rate and weight decay
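
The exact training script is not part of this card, but the sketch below illustrates the kind of PEFT/LoRA setup described above, starting from the `HooshvareLab/gpt2-fa` intermediate model. All hyperparameter values (rank, alpha, learning rate, weight decay, batch size) are illustrative assumptions, not the settings used for the released checkpoint.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model

base_model = "HooshvareLab/gpt2-fa"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA adapters on the GPT-2 attention projection; values are illustrative
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["c_attn"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Learning rate and weight decay are the main tuned hyperparameters
training_args = TrainingArguments(
    output_dir="manshoorai-lora",
    num_train_epochs=3,
    learning_rate=2e-4,
    weight_decay=0.01,
    per_device_train_batch_size=8,
)

# `train_dataset` would be the tokenized poem corpus (not shipped with this card)
# trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
# trainer.train()
```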
## Sample Outputs
**Prompt**: "باران که می‌بارد"
**Generated Text**:
- ManshoorAI
```
باران که می‌بارد من، به باغ راه یافته بودم
من این دشت را دیدم
که پر از درخت است
و در آن برگ هایم هیچ گونه سبز نیست
```
- Base Model (GPT2-fa)
```
باران که می‌بارد با خود بگوید که دیگر چه شده بود؟ اگر آن جوان از پشت نرده‌ها به پایین میرفت؛
```
**Prompt**: "در این شب سیاه"
**Generated Text**:
```
در این شب سیاه
چشم‌های سیاه اتاق‌ها
همه دیده‌های من هستند
از هر پلک چه می‌بینم.
و هر چهره روشن دیگر
من را در سکوت خانه فرو برده
```
## Limitations & Biases
- This is a work in progress, with many improvements yet to be made.
- The model may occasionally generate repetitive or incoherent lines.
- It does not strictly follow classical Persian poetry rules but leans towards free verse.
- Biases in the training dataset might influence stylistic preferences.
## Contributions & Feedback
If you use this model or have suggestions for improvement, feel free to open an issue or contribute via Hugging Face Spaces.
## License
This model is released under the MIT License. Please ensure ethical use and proper attribution when sharing generated works.