Image-Text-to-Text
Transformers
Safetensors
English
idefics2
pretraining
multimodal
vision
Inference Endpoints
{
  "<end_of_utterance>": 32002,
  "<fake_token_around_image>": 32000,
  "<image>": 32001
}
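The file maps each added special token to its vocabulary id. The ids form a contiguous block starting at 32000, which suggests they are appended after the base vocabulary. The sketch below uses only the standard library to parse the mapping, build the reverse id-to-token lookup used when decoding, and lay out the ids for one image placeholder. `image_placeholder_ids` and its `num_image_tokens` argument are hypothetical. The framing pattern (one `<fake_token_around_image>` on each side of a run of `<image>` tokens) is inferred from the token names. The real number of image tokens depends on the processor configuration and is not stated in this file.

```python
import json

# Contents of the added-tokens file shown above (token string -> vocabulary id).
ADDED_TOKENS_JSON = """
{
  "<end_of_utterance>": 32002,
  "<fake_token_around_image>": 32000,
  "<image>": 32001
}
"""


def load_added_tokens(text: str) -> dict[str, int]:
    """Parse the token -> id mapping from JSON text."""
    return json.loads(text)


def invert(mapping: dict[str, int]) -> dict[int, str]:
    """Build the id -> token lookup used when decoding ids back to text."""
    return {idx: tok for tok, idx in mapping.items()}


def image_placeholder_ids(mapping: dict[str, int], num_image_tokens: int) -> list[int]:
    """Hypothetical helper: ids for one image block.

    Assumes the image is represented as a run of <image> tokens framed by
    <fake_token_around_image> on both sides; num_image_tokens is an
    illustrative parameter, not a value taken from this file.
    """
    fake = mapping["<fake_token_around_image>"]
    img = mapping["<image>"]
    return [fake] + [img] * num_image_tokens + [fake]


if __name__ == "__main__":
    tokens = load_added_tokens(ADDED_TOKENS_JSON)
    ids = sorted(tokens.values())
    # The added ids form one contiguous block starting at the smallest id.
    assert ids == list(range(ids[0], ids[0] + len(ids)))
    print(invert(tokens)[32002])                              # <end_of_utterance>
    print(image_placeholder_ids(tokens, num_image_tokens=3))  # [32000, 32001, 32001, 32001, 32000]
```

In practice these tokens are loaded by the model's tokenizer from the repository rather than parsed by hand. Parsing the file directly is a quick way to check which ids a prompt should contain.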