ClownRat committed
Commit 28913a8
1 Parent(s): bbb42dc

Update config.json

Files changed (1)
  1. config.json +1 -1
config.json CHANGED
@@ -21,7 +21,7 @@
  "mm_vision_select_feature": "patch",
  "mm_vision_select_layer": -2,
  "mm_vision_tower": "openai/clip-vit-large-patch14-336",
- "model_type": "mistral",
+ "model_type": "videollama2_mistral",
  "num_attention_heads": 32,
  "num_frames": 16,
  "num_hidden_layers": 32,
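
Note on the change: loaders typically use the `model_type` string as a lookup key to pick which model class to build, so the value decides whether the multimodal fields (`mm_vision_tower`, `num_frames`) are consumed at all. The sketch below illustrates that dispatch with the standard library only. The `LOADERS` table, its descriptions, and `resolve_loader` are hypothetical names invented for illustration; they are not the actual VideoLLaMA2 or Transformers code.

```python
import json

# Hypothetical registry keyed on model_type. It only mimics the idea of an
# auto-class mapping; the descriptions are assumptions, not real class names.
LOADERS = {
    "mistral": "plain Mistral language model (vision fields ignored)",
    "videollama2_mistral": "VideoLLaMA2 wrapper (vision tower + Mistral)",
}


def resolve_loader(config_text: str) -> str:
    """Return the loader description selected by the config's model_type."""
    config = json.loads(config_text)
    model_type = config["model_type"]
    if model_type not in LOADERS:
        raise KeyError(f"unregistered model_type: {model_type!r}")
    return LOADERS[model_type]


# Trimmed-down stand-ins for config.json before and after this commit.
before = '{"model_type": "mistral", "num_frames": 16}'
after = '{"model_type": "videollama2_mistral", "num_frames": 16}'

print(resolve_loader(before))
print(resolve_loader(after))
```

Under this assumption, the old value would route the checkpoint to a text-only loader, while the new value routes it to the video-language wrapper. A value missing from the registry fails loudly instead of silently falling back.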