Surya

#2
by johnlockejrr - opened

So Surya actually became Chandra but kept the name? Same Qwen3 finetuning. Why bother anyway?
Side thought: I can't wait for the guys from Alibaba to come up with a QWEN*-OCR to see what will remain of all the spawns.
I'm being mean because old school surya was really good. But now, all you can see is QWEN spawns.

  • Chandra OCR (Qwen-3-VL, 9B)
  • Chandra OCR 2 (Qwen-3-VL, fine-tuned)
  • Surya OCR 2 (Qwen-3-VL)
  • olmOCR (Qwen2.5-VL, 7B)
  • olmOCR-2 (Qwen2.5-VL, 8B)
  • Nanonets-OCR2-3B (Qwen-based)
  • DeepSeek-OCR-3B (Qwen backbone)
  • PaddleOCR-VL-0.9B (Qwen backbone)
    etc.

Datalab org
edited 1 day ago

When we set out to redo surya, we were optimizing for wide compatibility, usability on low-end GPUs and CPUs, compatibility with marker, accuracy, and multilingual performance.

Surya is still widely used, and this is a meaningful upgrade for all of those people. We boosted accuracy significantly (olmOCR score from 75% to 83.3%), made the model smaller, collapsed secondary models (like table recognition) into one, made it CPU-compatible, and improved language compatibility.

This model makes architectural modifications to the LM head and embeddings (look at the param counts). That actually preserves the original surya tokenizer behavior, and it was done for a clear reason: to improve memory utilization and accuracy.
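
For anyone wondering why the param counts are the place to look, here is a rough sketch of how vocabulary size drives the embedding and LM head parameter counts. It assumes plain (vocab x hidden) matrices, and the sizes are made up for illustration; they are not the actual Surya or Qwen configs, and `vocab_param_count` is a hypothetical helper.

```python
def vocab_param_count(vocab_size: int, hidden_size: int, tied: bool = False) -> int:
    """Parameters in the token embedding matrix plus the LM head.

    Each is a (vocab_size x hidden_size) matrix. With tied weights the two
    share a single matrix, so it is counted once.
    """
    matrix = vocab_size * hidden_size
    return matrix if tied else 2 * matrix


# Hypothetical sizes for illustration only, not real model configs.
HIDDEN = 1024
large_vocab = vocab_param_count(150_000, HIDDEN)  # general-purpose LLM-sized vocab
small_vocab = vocab_param_count(50_000, HIDDEN)   # smaller OCR-oriented vocab
saved = large_vocab - small_vocab

print(f"large vocab: {large_vocab:,} params")
print(f"small vocab: {small_vocab:,} params")
print(f"saved:       {saved:,} params")
```

Under these assumed numbers, the vocabulary-dependent layers alone differ by a couple hundred million parameters, which is the kind of gap that shows up when comparing param counts between a base model and a variant with a different tokenizer.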

But even if it had been a straight finetune, if it achieves its goals and is useful, why are you against it? I can see from your Hugging Face profile that you've finetuned models yourself.

I'm not against it. I just loved old Surya and was not too happy seeing it transformed into Chandra, but that's my opinion. I like your work!
Yes, I've finetuned many models, including Surya, Chandra and Chandra 2.

P.S. I just got sick of seeing QWEN3 OCRs everywhere 😁
