luodian committed on
Commit
9b21e6e
1 Parent(s): f5d9401

Add OtterHD model description

Files changed (1)
  1. app.py +2 -0
app.py CHANGED
@@ -109,6 +109,8 @@ title = """
 # OTTER-HD: A High-Resolution Multi-modality Model
 [[Otter Codebase]](https://github.com/Luodian/Otter) [[Paper]]() [[Checkpoints & Benchmarks]](https://huggingface.co/Otter-AI)
 
+**OtterHD** is a multimodal model fine-tuned from [Fuyu-8B](https://huggingface.co/adept/fuyu-8b) to support more fine-grained interpretation of high-resolution visual input *without an explicit vision encoder module*. All image patches are linearly transformed and processed together with the text tokens. We find this design innovative and elegant. Following this approach, we open-sourced the fine-tuning script for Fuyu-8B and improved training throughput by 4-5 times with [Flash-Attention-2](https://github.com/Dao-AILab/flash-attention).
+
 **Tips**:
 - High-res images are large, so transferring them from the HF Space to our backend server may take longer. Please be patient while waiting for the response.
 - The model currently focuses mainly on high-res image input and needs further improvement in (1) hallucination reduction, (2) text formatting control, and other issues you may spot and suggest to us.
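
The added description says that image patches are linearly transformed and processed together with text tokens, with no separate vision encoder. As a rough illustration of that idea only, here is a minimal NumPy sketch. It is not the Fuyu-8B or OtterHD implementation: the function names `patchify` and `embed_inputs`, the toy patch size, the embedding width, and the random weights are all hypothetical and chosen just to keep the example small.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch * patch * C), with patches
    ordered row by row across the image.
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of the patch size")
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (patch_rows, patch_cols, patch, patch, C)
    return grid.reshape(-1, patch * patch * c)


def embed_inputs(image, text_embeddings, weight, bias, patch):
    """Project patches with one linear layer and prepend them to the text embeddings."""
    image_tokens = patchify(image, patch) @ weight + bias  # (num_patches, d_model)
    return np.concatenate([image_tokens, text_embeddings], axis=0)


# Toy, hypothetical sizes: an 8x12 RGB image, 4x4 patches, 8-dim embeddings.
rng = np.random.default_rng(0)
patch, d_model = 4, 8
image = rng.random((8, 12, 3))
text_embeddings = rng.random((5, d_model))       # stand-in for 5 embedded text tokens
weight = rng.random((patch * patch * 3, d_model))
bias = np.zeros(d_model)

sequence = embed_inputs(image, text_embeddings, weight, bias, patch)
print(sequence.shape)  # (11, 8): 6 image patches followed by 5 text tokens
```

In this sketch the only image-specific parameters are `weight` and `bias`; everything after the concatenation would be handled by the same decoder that processes text.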