lixin4ever committed on
Commit b6fb50e (1 parent: d239839)

Update config.json

Files changed (1)
  1. config.json +1 -1
config.json CHANGED
@@ -22,7 +22,7 @@
   "mm_vision_tower": "openai/clip-vit-large-patch14-336",
   "model_type": "videollama2_qwen2",
   "num_attention_heads": 64,
-  "num_frames": 8,
+  "num_frames": 16,
   "num_hidden_layers": 80,
   "num_key_value_heads": 8,
   "rms_norm_eps": 1e-06,
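
The diff itself does not say how `num_frames` is consumed. A minimal sketch of what the change from 8 to 16 could mean in practice, assuming the field sets how many frames are sampled uniformly from each input video. The helper `sample_frame_indices` and the 240-frame clip length are hypothetical and are not taken from the model's code.

```python
import json


def sample_frame_indices(total_frames: int, num_frames: int) -> list[int]:
    """Return up to num_frames indices spread evenly over a clip.

    Hypothetical helper for illustration; the model's real sampler may differ.
    """
    if total_frames <= 0 or num_frames <= 0:
        return []
    if num_frames >= total_frames:
        # Clip is shorter than the budget: use every frame once.
        return list(range(total_frames))
    if num_frames == 1:
        return [0]
    # Integer arithmetic keeps the first and last frame and avoids float rounding.
    last = total_frames - 1
    return [i * last // (num_frames - 1) for i in range(num_frames)]


# Only the field touched by this commit is reproduced here.
old_cfg = json.loads('{"num_frames": 8}')
new_cfg = json.loads('{"num_frames": 16}')

clip_length = 240  # assumed clip length in frames, for illustration only
old_indices = sample_frame_indices(clip_length, old_cfg["num_frames"])
new_indices = sample_frame_indices(clip_length, new_cfg["num_frames"])

print(len(old_indices), len(new_indices))
```

Under this assumption, doubling `num_frames` halves the gap between sampled frames, at the cost of roughly twice as many visual tokens per video.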