InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Paper: arXiv 2312.14238
Note CVPR 2024, Oral
Note Released at 2024.02.21 | 40B parameters | More SFT data and stronger.
Note Released at 2024.02.11 | 40B parameters | scaling up LLM to 34B.
Note Released at 2024.01.24 | 19B parameters | supports Chinese and has stronger OCR
Note Released at 2024.02.11 | Vision Foundation Model | 448 resolution
Note Released at 2024.01.30 | Vision Foundation Model | 448 resolution