TongkunGuan committed c2717f7 (verified) · 1 Parent(s): aeb3911

Update README.md

Files changed (1): README.md (+10 −3)
README.md CHANGED
@@ -5,6 +5,9 @@ base_model: TokenOCR
 base_model_relation: finetune
 ---
 
+# A Token-level Text Image Foundation Model for Document Understanding
+
+
 [\[📂 GitHub\]](https://github.com/Token-family/TokenOCR) [\[📖 Paper\]]() [\[🆕 Blog\]]() [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[🚀 Quick Start\]](#quick-start)
 
 <div align="center">
@@ -22,7 +25,8 @@ we seamlessly replace previous VFMs with TokenOCR to construct a document-level
 
 # Token Family
 
-## TokenIT
+<!-- ## TokenIT -->
+<h2 style="color: #4CAF50;">TokenIT</h2>
 
 In the following picture, we provide an overview of the self-constructed token-level **TokenIT** dataset, comprising 20 million images and 1.8 billion
 text-mask pairs.
@@ -50,7 +54,9 @@ The comparisons with other visual foundation models:
 | **TokenOCR** | **token-level** | **TokenIT** | **20M** | **1.8B** |
 
 
-## TokenOCR
+<!-- ## TokenOCR
+-->
+<h2 style="color: #4CAF50;">TokenOCR</h2>
 
 ### Model Architecture
 
@@ -136,7 +142,8 @@ Please refer to our technical report for more details.
 
 <!-- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/650d4a36cbd0c7d550d3b41b/IbLZ0CxCxDkTaHAMe7M0Q.png)
 -->
-## TokenVL
+<!-- ## TokenVL -->
+<h2 style="color: #4CAF50;">TokenVL</h2>
 
 we employ the TokenOCR as the visual foundation model and further develop an MLLM, named TokenVL, tailored for document understanding.
 Following the previous training paradigm, TokenVL also includes two stages:
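
The recurring edit in this commit — replacing a markdown `## Heading` with a colored `<h2>` tag while preserving the original heading as an HTML comment — can be sketched as a small substitution script. This is a sketch, not part of the commit; the function name is hypothetical, and the color value `#4CAF50` is taken from the diff above.

```python
import re

def colorize_headings(markdown: str, color: str = "#4CAF50") -> str:
    """Replace '## Title' headings with inline-styled <h2> tags,
    keeping the original markdown heading as an HTML comment,
    mirroring the pattern used in this commit."""
    # Match only level-2 headings: '### ...' has '#' (not a space)
    # after the leading '##', so it is left untouched.
    pattern = re.compile(r"^## (.+)$", re.MULTILINE)
    return pattern.sub(
        lambda m: (
            f"<!-- ## {m.group(1)} -->\n"
            f'<h2 style="color: {color};">{m.group(1)}</h2>'
        ),
        markdown,
    )
```

Applied to a line like `## TokenIT`, this produces the commented-out heading followed by the styled `<h2>` tag, matching the added lines in the diff.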