Update README.md
README.md
CHANGED
@@ -11,9 +11,9 @@ library_name: transformers
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-Llama-3.2V-11B-cot is the first version of [LLaVA-
+Llama-3.2V-11B-cot is the first version of [LLaVA-CoT](https://github.com/PKU-YuanGroup/LLaVA-CoT), which is a visual language model capable of spontaneous, systematic reasoning.
 
-The model was proposed in [LLaVA-
+The model was proposed in [LLaVA-CoT: Let Vision Language Models Reason Step-by-Step](https://huggingface.co/papers/2411.10440).
 
 ## Model Details
 
@@ -61,7 +61,7 @@ You can use the inference code for Llama-3.2-11B-Vision-Instruct.
 
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
-The model is trained on the LLaVA-
+The model is trained on the [LLaVA-CoT-100k dataset](https://huggingface.co/datasets/Xkev/LLaVA-CoT-100k).
 
 ### Training Procedure
 
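The context line of the second hunk notes that the inference code for Llama-3.2-11B-Vision-Instruct can be reused. Below is a minimal sketch of what that looks like with the standard `transformers` Mllama API; the repository id, image URL, and prompt are illustrative assumptions and are not part of this diff.

```python
# Minimal inference sketch, reusing the standard Llama-3.2-11B-Vision-Instruct
# pipeline from transformers (>= 4.45, which provides the Mllama classes).
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "Xkev/Llama-3.2V-11B-cot"  # assumed checkpoint name for illustration

model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder image; any RGB image works here.
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
image = Image.open(requests.get(url, stream=True).raw)

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image. Reason step by step."},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)

# The model generates its structured reasoning followed by the final answer.
output = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(output[0], skip_special_tokens=True))
```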