Image-to-Text
Transformers
Safetensors
Japanese
English
sarashina2_vision
text-generation
multimodal
ocr
document-understanding
vision-language
custom_code
Instructions to use sbintuitions/sarashina2.2-ocr with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use sbintuitions/sarashina2.2-ocr with Transformers:
# Use a pipeline as a high-level helper
# Warning: Pipeline type "image-to-text" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
# 'pip install "transformers<5.0.0"'
from transformers import pipeline

pipe = pipeline("image-to-text", model="sbintuitions/sarashina2.2-ocr", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("sbintuitions/sarashina2.2-ocr", trust_remote_code=True, dtype="auto")

- Notebooks
- Google Colab
- Kaggle
Update README.md
README.md
CHANGED
@@ -91,7 +91,7 @@ VJRODa evaluates OCR capabilities for Japanese documents, particularly focusing
 | Model | CER(↓) | BLEU(↑) |
 | - | - | - |
 | gpt-5-mini-2025-08-07 | 72.4 | 23.6 |
-| Qwen3.5-
+| Qwen3.5-4B(non-thinking) | 86.1 | 47.8 |
 | KARAKURI VL 32B Instruct 2507 | 280 | 14.1 |
 | LightOnOCR-2-1B | 158 | 28.9 |
 | dots.ocr | 40.1 | 71.5 |
@@ -268,7 +268,7 @@ The following image visualizes the output bounding boxes in red:
 ```
 @misc{sarashinaOCR2026,
 title = {Sarashina2.2-OCR: End-to-end OCR Model for Japanese Document Parsing},
-author = {Takumi Takada and Toshiyuki Tanaka and Kohei Uehara and Mikihiro Tanaka and Alexis Vallet and Aman Jain},
+author = {Takumi Takada and Toshiyuki Tanaka and Kohei Uehara and Mikihiro Tanaka and Alexis Vallet and Aman Jain and Ryuichiro Hataya and Seitaro Shinagawa and Yuto Imai and Teppei Suzuki},
 year = {2026},
 url = {https://huggingface.co/sbintuitions/sarashina2.2-ocr}
 }