Update README.md
README.md CHANGED
```diff
@@ -5,6 +5,9 @@ base_model: TokenOCR
 base_model_relation: finetune
 ---
 
+# A Token-level Text Image Foundation Model for Document Understanding
+
+
 [\[π GitHub\]](https://github.com/Token-family/TokenOCR)    [\[π Paper\]]() [\[π Blog\]]()    [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL)    [\[π Quick Start\]](#quick-start)  
 
 <div align="center">
@@ -22,7 +25,8 @@ we seamlessly replace previous VFMs with TokenOCR to construct a document-level
 
 # Token Family
 
-## TokenIT
+<!-- ## TokenIT -->
+<h2 style="color: #4CAF50;">TokenIT</h2>
 
 In the following picture, we provide an overview of the self-constructed token-level **TokenIT** dataset, comprising 20 million images and 1.8 billion
 text-mask pairs. 
@@ -50,7 +54,9 @@ The comparisons with other visual foundation models:
 | **TokenOCR**           | **token-level** | **TokenIT**  | **20M**    | **1.8B**   |
 
 
-## TokenOCR
+<!-- ## TokenOCR
+ -->
+<h2 style="color: #4CAF50;">TokenOCR</h2>
 
 ### Model Architecture
 
@@ -136,7 +142,8 @@ Please refer to our technical report for more details.
 
 <!-- 
 -->
-## TokenVL
+<!-- ## TokenVL -->
+<h2 style="color: #4CAF50;">TokenVL</h2>
 
 we employ the TokenOCR as the visual foundation model and further develop an MLLM, named TokenVL, tailored for document understanding. 
 Following the previous training paradigm, TokenVL also includes two stages: 
```
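For context, the `base_model_relation: finetune` line in the first hunk is Hugging Face model-card YAML front matter: it tells the Hub that this repo is a fine-tune of the `base_model` named in the hunk header. A minimal card header consistent with this diff would look like the sketch below (any fields beyond the two shown in the diff, such as `license` or `pipeline_tag`, are illustrative placeholders, not taken from this commit):

```yaml
---
# Hub metadata: this repo derives from the TokenOCR base model
base_model: TokenOCR
base_model_relation: finetune
# illustrative extras a model card commonly carries (not part of this diff)
license: mit
pipeline_tag: image-text-to-text
---
```

The Hub uses `base_model`/`base_model_relation` to link the fine-tuned repo back to its base model on the model page; the rest of the README (everything below the closing `---`) renders as ordinary markdown, which is why the diff can swap `## TokenIT`-style headings for inline `<h2 style="color: #4CAF50;">` tags to color the section titles.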