Pre-training code

#47
by bilibraker - opened

Thank you for the awesome model!
Is there a plan to release a pre-training script similar to LLaVA?

HuggingFaceM4 org

Thanks!
We will not open the codebase, as it is now complex and constantly changing, but it closely follows the implementation in Transformers, and we use DeepSpeed ZeRO-3 for model parallelization.
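
For readers looking for a starting point, here is a minimal sketch of what a DeepSpeed ZeRO stage 3 configuration can look like when used through the Transformers integration. It is not the configuration used to train this model: the helper names `build_zero3_config` and `write_config`, the file name, and every value below are assumptions chosen for illustration. The `"auto"` entries rely on the Transformers `Trainer` filling them in from its own arguments.

```python
import json
from pathlib import Path


def build_zero3_config() -> dict:
    """Return an illustrative ZeRO-3 config (assumed values, not the authors')."""
    return {
        # Mixed precision; bf16 is an assumption, fp16 is the alternative.
        "bf16": {"enabled": True},
        "zero_optimization": {
            # Stage 3 partitions parameters, gradients and optimizer states
            # across the data-parallel workers.
            "stage": 3,
            "overlap_comm": True,
            "contiguous_gradients": True,
            # Gather full 16-bit weights on save so the checkpoint is loadable
            # without DeepSpeed.
            "stage3_gather_16bit_weights_on_model_save": True,
        },
        # "auto" lets the Transformers Trainer fill these from TrainingArguments.
        "gradient_accumulation_steps": "auto",
        "gradient_clipping": "auto",
        "train_batch_size": "auto",
        "train_micro_batch_size_per_gpu": "auto",
    }


def write_config(path: str) -> Path:
    """Serialize the config to a JSON file and return its path."""
    out = Path(path)
    out.write_text(json.dumps(build_zero3_config(), indent=2))
    return out


if __name__ == "__main__":
    # Hypothetical file name, used only for this example.
    print(write_config("ds_zero3.json"))
```

The resulting JSON file can be passed to `TrainingArguments(deepspeed="ds_zero3.json")` and the script started with the `deepspeed` or `accelerate` launcher. Batch size, precision, and any offloading would need tuning for the actual hardware.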

HugoLaurencon changed discussion status to closed
