Committed by ClownRat
Commit: bab71d8
Parent: a2c9800

Update config.json

Files changed (1)
  1. config.json +1 -1
config.json CHANGED
@@ -21,7 +21,7 @@
   "mm_vision_select_feature": "patch",
   "mm_vision_select_layer": -2,
   "mm_vision_tower": "openai/clip-vit-large-patch14-336",
-  "model_type": "mixtral",
+  "model_type": "videollama2_mixtral",
   "num_attention_heads": 32,
   "num_experts_per_tok": 2,
   "num_frames": 8,
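
The commit message does not state why `model_type` changed. One plausible reading is that loaders which pick a configuration class from the `model_type` string would otherwise resolve this checkpoint to the plain Mixtral class and ignore the `mm_*` and `num_frames` fields. The sketch below illustrates that kind of dispatch with a small, hypothetical registry. The registry, the function name, and the class names are assumptions made for illustration; they are not the Transformers API or the repository's code.

```python
import json

# Hypothetical registry mapping model_type strings to config class names.
# Illustrative only: these names are assumptions, not a real library API.
REGISTRY = {
    "mixtral": "MixtralConfig",
    "videollama2_mixtral": "Videollama2MixtralConfig",
}


def resolve_config_class(config_text: str) -> str:
    """Return the registered class name for the config's model_type."""
    config = json.loads(config_text)
    model_type = config["model_type"]
    if model_type not in REGISTRY:
        raise KeyError(f"unregistered model_type: {model_type!r}")
    return REGISTRY[model_type]


# Trimmed-down versions of config.json before and after the commit.
OLD = '{"model_type": "mixtral", "num_frames": 8}'
NEW = '{"model_type": "videollama2_mixtral", "num_frames": 8}'

print(resolve_config_class(OLD))
print(resolve_config_class(NEW))
```

Under this assumption, the one-line change is what routes the checkpoint to the multimodal class instead of the text-only one.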