Xkev and nielsr (HF staff) committed
Commit 0a95410
1 Parent(s): 6b0b9bb

Add link to paper, update pipeline tag (#3)


- Add link to paper, update pipeline tag (541d3a5214c1507f840dcd961361be06d6939a75)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -4,7 +4,7 @@ language:
 - en
 base_model:
 - meta-llama/Llama-3.2-11B-Vision-Instruct
-pipeline_tag: visual-question-answering
+pipeline_tag: image-text-to-text
 library_name: transformers
 ---
 # Model Card for Model ID
@@ -13,6 +13,8 @@ library_name: transformers
 
 Llama-3.2V-11B-cot is the first version of [LLaVA-o1](https://github.com/PKU-YuanGroup/LLaVA-o1), which is a visual language model capable of spontaneous, systematic reasoning.
 
+The model was proposed in [LLaVA-o1: Let Vision Language Models Reason Step-by-Step](https://huggingface.co/papers/2411.10440).
+
 ## Model Details
 
 <!-- Provide a longer summary of what this model is. -->
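
For reference, the updated `pipeline_tag: image-text-to-text` corresponds to loading this checkpoint as a vision-language generation model in transformers. The sketch below is not part of the commit: it assumes the repo id is `Xkev/Llama-3.2V-11B-cot`, follows the standard Llama-3.2-Vision loading path declared by the card's `base_model`, and uses a placeholder image URL.

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "Xkev/Llama-3.2V-11B-cot"  # assumed repo id for this model card

# Load the fine-tuned checkpoint with the same classes used for the
# meta-llama/Llama-3.2-11B-Vision-Instruct base model.
model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Build a chat-style prompt containing one image and one text question.
url = "https://example.com/sample.jpg"  # placeholder image URL
image = Image.open(requests.get(url, stream=True).raw)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe what is happening in this image."},
        ],
    }
]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

# Generate the model's step-by-step answer and decode it to text.
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```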