TongkunGuan committed
Commit 54bf132 · verified · 1 Parent(s): 8d69646

Update README.md

Files changed (1)
  1. README.md +6 -1
README.md CHANGED
@@ -15,7 +15,12 @@ base_model_relation: finetune
 
 ## Introduction
 
-We are excited to announce the release of `InternViT-300M-448px-V2_5`, a significant enhancement built on the foundation of `InternViT-300M-448px`. By employing **ViT incremental learning** with NTP loss (Stage 1.5), the vision encoder has improved its ability to extract visual features, enabling it to capture more comprehensive information. This improvement is particularly noticeable in domains that are underrepresented in large-scale web datasets such as LAION-5B, including multilingual OCR data and mathematical charts, among others.
+We are excited to announce the release of `TokenOCR`, the first token-level visual foundation model specifically tailored for text-image-related tasks,
+designed to support a variety of traditional downstream applications. To facilitate the pretraining of TokenOCR,
+we also devise a high-quality data production pipeline that constructs the first token-level image text dataset,
+\textbf{TokenIT}, comprising 20 million images and 1.8 billion token-mask pairs.
+Furthermore, leveraging this foundation with exceptional image-as-text capability,
+we seamlessly replace previous VFMs with TokenOCR to construct a document-level MLLM, \textbf{TokenVL}, for VQA-based document understanding tasks.
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/o9_FX5D8_NOS1gfnebp5s.png)
 