Tuti 🦜

This is Gemma 2 9B, fine-tuned with Unsloth using 4-bit quantization and LoRA (QLoRA) on Persian literature datasets that I curated, created, or found.

Use cases and datasets

Word IPA Detection

I fine-tuned this model with QLoRA and only uploaded the LoRA adapter, so it can be loaded and used like this:

# pip install unsloth
from unsloth import FastLanguageModel
from transformers import TextStreamer

model_name = "cnababaie/tuti"
max_seq_length = 4096  # Adjust as needed
dtype = None
load_in_4bit = True

# Load the 4-bit base model and the LoRA adapter from the Hub
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
FastLanguageModel.for_inference(model)  # switch to Unsloth's faster inference mode
alpaca_prompt_template = """### Instruction:
{}

### Input:
{}

### Response:
{}"""
inputs = tokenizer(
[
    alpaca_prompt_template.format(
        "IPA این کلمه چیست؟", # instruction
        "جوینده",
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)

This correctly outputs the IPA: "/d͡ʒuːjænde/ (juyande)".
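
Because only the adapter is published, it should also be possible to load it without Unsloth, using plain transformers plus peft. This is an untested sketch; it assumes the repo contains a standard PEFT adapter targeting google/gemma-2-9b (a gated model, so you need access to it):

# pip install transformers peft bitsandbytes accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "google/gemma-2-9b"
adapter_id = "cnababaie/tuti"

# Load the base model in 4-bit, then attach the LoRA adapter on top of it.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)
# The adapter repo is expected to ship a tokenizer; fall back to base_id otherwise.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)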

IPA Sources

  • IPA-dict: Monolingual wordlists with pronunciation information in IPA
  • Wiktionary: The Persian Wiktionary doesn't contain IPA, but the English Wiktionary (which covers many non-English words and phrases) includes a lot of Persian entries with their IPA (an extraction sketch is shown below)
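
To illustrate the second source, the snippet below fetches a page's wikitext through the public MediaWiki API and looks for pronunciation templates under its Persian section. It is only a rough sketch, not the script used to build the dataset, and the {{IPA|fa|...}} regex is an assumption about Wiktionary's markup:

import re
import requests

def persian_ipa_from_wiktionary(word: str) -> list[str]:
    """Hypothetical helper: return IPA strings listed under the ==Persian== section
    of the English Wiktionary page for `word`."""
    resp = requests.get(
        "https://en.wiktionary.org/w/api.php",
        params={
            "action": "query",
            "titles": word,
            "prop": "revisions",
            "rvprop": "content",
            "rvslots": "main",
            "format": "json",
            "formatversion": "2",
        },
        timeout=30,
    )
    resp.raise_for_status()
    page = resp.json()["query"]["pages"][0]
    wikitext = page.get("revisions", [{}])[0].get("slots", {}).get("main", {}).get("content", "")
    # Keep only the Persian section (from "==Persian==" up to the next level-2 heading).
    section = re.search(r"==Persian==(.*?)(?:\n==[^=]|\Z)", wikitext, re.S)
    if not section:
        return []
    # Assumed markup: pronunciations appear as {{IPA|fa|/.../}} templates.
    return re.findall(r"\{\{IPA\|fa\|([^}|]+)", section.group(1))

print(persian_ipa_from_wiktionary("جوینده"))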

Persian Text Romanization

inputs = tokenizer(
[
    alpaca_prompt_template.format(
        "این متن چه تلفظی داره؟", # instruction
        "خاک به خاطر بارش زیاد باران گل شد.",
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)

This will output the romanized pronunciation: "Xāk be xāter-e bāreš-e ziyād-e bārān gel šod."
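
All of the use cases share the same Alpaca-style prompt, so a small wrapper keeps repeated calls short. ask is a hypothetical convenience helper built on the model, tokenizer, and alpaca_prompt_template created above; it is not part of the model's own code:

def ask(instruction: str, text: str, max_new_tokens: int = 64) -> str:
    """Format the Alpaca-style prompt, generate, and return only the response part."""
    prompt = alpaca_prompt_template.format(instruction, text, "")
    inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    # Everything after "### Response:" is the model's answer.
    return decoded.split("### Response:")[-1].strip()

# Example: romanize the same sentence as above.
print(ask("این متن چه تلفظی داره؟", "خاک به خاطر بارش زیاد باران گل شد."))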

Romanization Sources

Persian Poem Translation

inputs = tokenizer(
[
    alpaca_prompt_template.format(
        "ترجمه", # instruction
        "برخیز بتا بیا ز بهر دل ما\r\nحل کن به جمال خویشتن مشکل ما\r\nیک کوزه شراب تا به هم نوش کن\r\nزآن پیش که کوزه‌ها کنند از گل ما",
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)

This will output a rhymed English rendering of the original poem:

"Arise, O idol, for our heart's sake, Solve our troubles with your beauty's make. One pot of wine, let's drink it all, Before they make pots from our clay's fall.".

Poem Translation Sources

  • A dataset I created of random poems from Ganjoor paired with translation text (the pair format is sketched below)
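
For reference, a poem/translation pair can be laid out with the same Alpaca-style template used above. The snippet below is only an illustrative guess at how one training example might look, not a dump of the actual dataset:

# Illustrative only: how one Ganjoor poem / translation pair might be serialized
# with the alpaca_prompt_template defined earlier (the real dataset format may differ).
poem = "برخیز بتا بیا ز بهر دل ما\nحل کن به جمال خویشتن مشکل ما"
translation = "Arise, O idol, for our heart's sake, Solve our troubles with your beauty's make."
training_text = alpaca_prompt_template.format("ترجمه", poem, translation) + tokenizer.eos_token
print(training_text)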