Update README.md
README.md
CHANGED
@@ -36,6 +36,7 @@ It is a fine-tune of **Qwen 2.5-VL-7B** using ~10 k synthetic doc-to-Reasoning-t
 **NuMarkdown-reasoning** is significantly better than similar size non-reasoning models trained for markdown generation on complex documents, and achieve competitive results against top close sources alternatives.
 
 ### Arena ranking (using trueskill-2 ranking system):
+<p align="center">
 | Rank | Model | μ | σ | μ − 3σ |
 | ---- | --------------------------------------- | ----- | ---- | ------ |
 | 🥇 1 | **gemini-flash-reasoning** | 26.75 | 0.80 | 24.35 |
@@ -45,6 +46,7 @@ It is a fine-tune of **Qwen 2.5-VL-7B** using ~10 k synthetic doc-to-Reasoning-t
 | 5 | **gpt-4o** | 24.48 | 0.80 | 22.08 |
 | 6 | **gemini-flash-w/o\_reasoning** | 24.11 | 0.79 | 21.74 |
 | 7 | **RolmoOCR** | 23.53 | 0.82 | 21.07 |
+</p>
 
 *we plan to realease a markdown arena, similar to llmArena, for complex document to markdown task to help evaluate different document to markdown solution*
 
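
As a quick sanity check on the table touched by this change, the μ − 3σ column can be recomputed from the μ and σ columns. The sketch below assumes that column is the usual conservative rating (mean minus three standard deviations) and uses only the four rows visible in this diff. The names `ROWS`, `conservative_score`, and `check_rows` are illustrative and not part of the README.

```python
# Rows visible in this diff: (rank, model, mu, sigma, reported mu - 3*sigma).
# Ranks 2-4 are not shown in the diff, so they are left out here.
ROWS = [
    (1, "gemini-flash-reasoning", 26.75, 0.80, 24.35),
    (5, "gpt-4o", 24.48, 0.80, 22.08),
    (6, "gemini-flash-w/o_reasoning", 24.11, 0.79, 21.74),
    (7, "RolmoOCR", 23.53, 0.82, 21.07),
]


def conservative_score(mu: float, sigma: float, k: float = 3.0) -> float:
    """Return mu - k*sigma, a conservative estimate that penalizes uncertainty."""
    return mu - k * sigma


def check_rows(rows, tol: float = 0.005):
    """Return the ranks whose reported value differs from mu - 3*sigma by more than tol."""
    return [
        rank
        for rank, _model, mu, sigma, reported in rows
        if abs(conservative_score(mu, sigma) - reported) > tol
    ]


if __name__ == "__main__":
    print("mismatched ranks:", check_rows(ROWS))
```

With the values shown in the diff, no rows are flagged: each reported μ − 3σ matches its μ and σ to two decimal places.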