Tuti 🦜
This is a Gemma 2 9B model, fine-tuned with Unsloth using 4-bit quantization and LoRA (QLoRA), on Persian literature datasets that I curated, created, or found.
Use cases and datasets
Word IPA Detection
I fine-tuned this model with QLoRA and uploaded only the LoRA adapter, so it can be used like this:
# pip install unsloth
from unsloth import FastLanguageModel
from transformers import TextStreamer
model_name = "cnababaie/tuti"
max_seq_length = 4096  # Adjust as needed
dtype = None  # None lets Unsloth pick the dtype automatically
load_in_4bit = True  # Load the base model with 4-bit quantization
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
FastLanguageModel.for_inference(model)
alpaca_prompt_template = """### Instruction:
{}
### Input:
{}
### Response:
{}"""
inputs = tokenizer(
    [
        alpaca_prompt_template.format(
            "IPA این کلمه چیست؟",  # instruction: "What is the IPA of this word?"
            "جوینده",  # input: the word to transcribe ("seeker")
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors="pt",
).to("cuda")
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)
This will correctly output IPA as "/d͡ʒuːjænde/ (juyande)".
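By default, `TextStreamer` echoes the prompt before the generated text (passing `skip_prompt=True` suppresses that). If you decode the output yourself instead of streaming it, you will likely want only the part after the response marker. The following is a minimal sketch of that post-processing step; `extract_response` is a hypothetical helper, not part of Unsloth or Transformers, and the list of special tokens to strip is an assumption based on Gemma's tokenizer.

```python
def extract_response(decoded: str, marker: str = "### Response:") -> str:
    """Return the text after the last response marker, without special tokens."""
    # rpartition returns the whole string as the tail when the marker is absent.
    answer = decoded.rpartition(marker)[2]
    # Assumed special tokens; adjust to whatever your tokenizer actually emits.
    for token in ("<bos>", "<eos>", "<pad>"):
        answer = answer.replace(token, "")
    return answer.strip()


# Hand-written sample that mimics a decoded generation for the prompt above.
sample = (
    "<bos>### Instruction:\nIPA این کلمه چیست؟\n"
    "### Input:\nجوینده\n"
    "### Response:\n/d͡ʒuːjænde/ (juyande)<eos>"
)
print(extract_response(sample))
```

Splitting on the last occurrence of the marker keeps the helper safe even if the input text itself happens to contain the marker string.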
IPA Sources
- IPA-dict: Monolingual wordlists with pronunciation information in IPA
- Wiktionary: The Persian edition does not contain IPA, but the English edition (which covers many words and phrases in languages other than English) includes many Persian words with their IPA
Persian Text Romanization
inputs = tokenizer(
    [
        alpaca_prompt_template.format(
            "این متن چه تلفظی داره؟",  # instruction: "How is this text pronounced?"
            "خاک به خاطر بارش زیاد باران گل شد.",  # input: "The soil turned to mud because of heavy rainfall."
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors="pt",
).to("cuda")
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)
This will output the exact pronunciation as "Xāk be xāter-e bāreš-e ziyād-e bārān gel šod.".
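The tasks in this card differ only in the instruction and the input text, so the prompt construction can be wrapped in a small function. This is a sketch under that assumption; `build_prompt` and `ALPACA_PROMPT_TEMPLATE` are illustrative names, and the template is copied from the usage example above.

```python
# Same Alpaca-style template as in the usage example.
ALPACA_PROMPT_TEMPLATE = """### Instruction:
{}
### Input:
{}
### Response:
{}"""


def build_prompt(instruction: str, input_text: str) -> str:
    """Fill the template, leaving the response slot empty for generation."""
    return ALPACA_PROMPT_TEMPLATE.format(instruction, input_text, "")


prompt = build_prompt(
    "این متن چه تلفظی داره؟",  # "How is this text pronounced?"
    "خاک به خاطر بارش زیاد باران گل شد.",
)
print(prompt)
```

The resulting string can be passed to `tokenizer([prompt], return_tensors="pt")` in place of the inline `format` call.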
Romanization Sources
- http://alefbaye2om.org/: Contains PDFs of romanized Persian text
Persian Poem Translation
inputs = tokenizer(
    [
        alpaca_prompt_template.format(
            "ترجمه",  # instruction: "Translation"
            "برخیز بتا بیا ز بهر دل ما\r\nحل کن به جمال خویشتن مشکل ما\r\nیک کوزه شراب تا به هم نوش کن\r\nزآن پیش که کوزهها کنند از گل ما",  # input: verses separated by \r\n
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors="pt",
).to("cuda")
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)
This will output a rhymed translation that preserves the content of the original poem:
"Arise, O idol, for our heart's sake, Solve our troubles with your beauty's make. One pot of wine, let's drink it all, Before they make pots from our clay's fall.".
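The poem input above separates its verses with `\r\n`. If your verses are stored as a list, a small helper can join them the same way. This is only a sketch: `join_verses` is a hypothetical name, and it assumes the model works best with the separator used in the example, which the card does not state explicitly.

```python
def join_verses(verses: list[str], separator: str = "\r\n") -> str:
    """Join non-empty verses with the separator used in the example input."""
    # Strip surrounding whitespace and skip blank entries.
    cleaned = [verse.strip() for verse in verses if verse.strip()]
    return separator.join(cleaned)


poem = join_verses(
    [
        "برخیز بتا بیا ز بهر دل ما",
        "حل کن به جمال خویشتن مشکل ما",
        "",  # blank entries are dropped
        "یک کوزه شراب تا به هم نوش کن",
        "زآن پیش که کوزهها کنند از گل ما",
    ]
)
print(repr(poem))
```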
Poem Translation Sources
- A list of randomly selected poems from Ganjoor, paired with their translations, that I created
Model tree for cnababaie/tuti
Base model
google/gemma-2-9b