TypeError: Incorrect format used for image
Hi,
Thanks for sharing.
I used the script below, but it raised an "Incorrect format used for image" error:
from transformers import pipeline

generator = pipeline("image-to-text", model="oddadmix/Khanandeh-0.1-Persian-OCR-2B-Instruct")
image_path = "https://www.omniglot.com/images/langsamples/udhr_persian.gif"
instruction = """Analyze the text in the provided image. Extract all readable text content
and present it in a structured Markdown format that is clear, concise,
and well-organized. Ensure proper formatting (e.g., headings, lists, or
code blocks) as necessary to represent the content effectively.
No extra explanation needed"""
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": image_path,
            },
            {"type": "text", "text": instruction},
        ],
    }
]
output = generator(messages, max_new_tokens=128)[0]
Please use the same Qari notebook and change the model name:
https://colab.research.google.com/github/NAMAA-ORG/public-notebooks/blob/main/Qari_Free_Colab.ipynb
I didn't get a chance to work on the model card yet.
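A possible explanation for the TypeError, offered as an assumption rather than a confirmed diagnosis: the "image-to-text" pipeline treats its first argument as an image (URL, local path, or PIL image), so a list of chat messages is rejected by the image loader. Recent transformers releases also provide a chat-aware "image-text-to-text" task that accepts messages like the ones in the script. The sketch below assumes this checkpoint loads under that task, which has not been verified here; build_messages and run_ocr are hypothetical helper names, not part of the model's documentation.

```python
from typing import Any


def build_messages(image: str, instruction: str) -> list[dict[str, Any]]:
    """Build a single-turn chat request with one image and one text prompt."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": instruction},
            ],
        }
    ]


def run_ocr(image: str, instruction: str, model_id: str, max_new_tokens: int = 128) -> str:
    """Run one OCR request through the chat-aware pipeline (downloads the model)."""
    # Imported lazily so build_messages can be used without transformers installed.
    from transformers import pipeline

    # Assumption: the checkpoint is compatible with the "image-text-to-text" task.
    generator = pipeline("image-text-to-text", model=model_id)
    result = generator(
        text=build_messages(image, instruction),
        max_new_tokens=max_new_tokens,
        return_full_text=False,  # return only the newly generated text
    )
    return result[0]["generated_text"]
```

Calling run_ocr with the image URL, the instruction string, and "oddadmix/Khanandeh-0.1-Persian-OCR-2B-Instruct" would replace the last line of the original script. The Qari notebook linked above remains the recommended route.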
What are the specifications of the 'Khanandeh' training dataset?
Of course, this is better than the original Qwen, but for the text below gemma3-27b was perfect.
- Although I am focused on handwriting, do you think it is important to train on printed documents before handwriting?
- I would be thankful if you could share a summary of the 'Khanandeh' dataset and details of how it was created.
Thanks in advance
Output:
دیناست ما شامل ۱۱ فونت ایجاد شده اند.
Do you have a dataset for handwriting? I would try to find a font that is super close to handwriting as a start and generate a dataset from it.
It all depends on the use case. For example, if I wanted a model with excellent handwriting capabilities, I would train exclusively on handwriting data. If you are looking to build a model that balances recognition of printed and handwritten text, I would mix the datasets to cover all cases.
Please let me know more about your use case.
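The mixing idea above can be sketched as follows. This is a minimal illustration, not the procedure used for Khanandeh: the function name mix_datasets, the handwritten_fraction parameter, and sampling with replacement are all assumptions chosen for the example.

```python
import random


def mix_datasets(handwritten, printed, handwritten_fraction=0.5, size=None, seed=0):
    """Return a shuffled list in which roughly handwritten_fraction of items are handwritten."""
    if not 0.0 <= handwritten_fraction <= 1.0:
        raise ValueError("handwritten_fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the mix is reproducible
    if size is None:
        size = len(handwritten) + len(printed)
    n_hand = round(size * handwritten_fraction)
    n_print = size - n_hand
    if (n_hand and not handwritten) or (n_print and not printed):
        raise ValueError("cannot draw samples from an empty dataset")
    # Sample with replacement so a small handwriting set can still fill its share.
    mixed = rng.choices(handwritten, k=n_hand) if n_hand else []
    mixed += rng.choices(printed, k=n_print) if n_print else []
    rng.shuffle(mixed)
    return mixed
```

Setting handwritten_fraction to 1.0 gives the handwriting-only case; lower values shift the balance toward printed text.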
Hello everyone,
This is the best Persian dataset with samples similar to handwriting:
https://huggingface.co/datasets/hezarai/parsynth-ocr-200k
But these are one-line sentences. For fine-tuning a VLM, it is better to have samples that contain a full paragraph or even a full page of text.
@raminh921
here is the dataset
oddadmix/Khanandeh-0.1-Persian-dataset
Great work. Thanks for sharing this dataset.
Do you have any evaluation results on the test set?
I haven't had a chance to run a full evaluation yet.
I will try to do it in the coming weeks.