Post-OCR-Correction: 1 billion words dataset of automated OCR correction by LLM
Congratulations! With all the US/EU big players being more secretive than ever, you're not just bringing good models, but really making an incredible contribution to open research.
And I slightly disagree on one point: Qwen-500m is SOTA. I never thought it would be possible to get results like this from such a small multilingual model on RAG tasks in French.