TongkunGuan/TokenFD


TongkunGuan committed on Feb 21

Commit 54bf132 (verified) · 1 parent: 8d69646

Update README.md

Files changed (1)

README.md +6 -1

README.md CHANGED

@@ -15,7 +15,12 @@ base_model_relation: finetune
 ## Introduction
-We are excited to announce the release of `InternViT-300M-448px-V2_5`, a significant enhancement built on the foundation of `InternViT-300M-448px`. By employing **ViT incremental learning** with NTP loss (Stage 1.5), the vision encoder has improved its ability to extract visual features, enabling it to capture more comprehensive information. This improvement is particularly noticeable in domains that are underrepresented in large-scale web datasets such as LAION-5B, including multilingual OCR data and mathematical charts, among others.
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/o9_FX5D8_NOS1gfnebp5s.png)

 ## Introduction
+We are excited to announce the release of `TokenOCR`, the first token-level visual foundation model specifically tailored for text-image tasks,
+designed to support a variety of traditional downstream applications. To facilitate the pretraining of TokenOCR,
+we also devise a high-quality data production pipeline that constructs the first token-level image-text dataset,
+**TokenIT**, comprising 20 million images and 1.8 billion token-mask pairs.
+Furthermore, building on this foundation and its strong image-as-text capability,
+we replace previous vision foundation models (VFMs) with TokenOCR to construct a document-level MLLM, **TokenVL**, for VQA-based document understanding tasks.
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/o9_FX5D8_NOS1gfnebp5s.png)