Cerule - A Tiny Mighty Vision Model

Based on Google's Gemma-2b + SigLIP

 ██████╗███████╗██████╗ ██╗   ██╗██╗     ███████╗
██╔════╝██╔════╝██╔══██╗██║   ██║██║     ██╔════╝
██║     █████╗  ██████╔╝██║   ██║██║     █████╗
██║     ██╔══╝  ██╔══██╗██║   ██║██║     ██╔══╝
╚██████╗███████╗██║  ██║╚██████╔╝███████╗███████╗
 ╚═════╝╚══════╝╚═╝  ╚═╝ ╚═════╝ ╚══════╝╚══════╝

We train and release "Cerule", a tiny yet powerful Vision Language Model based on Google's newly released Gemma-2b and Google's SigLIP.
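As a rough illustration of how a language model and a vision encoder are commonly combined in LLaVA-style and Bunny-style models, the sketch below projects image patch features to the language model's embedding width and splices them into the text sequence. This is a toy in pure Python under stated assumptions: the sizes, the single linear layer (real projectors are often small MLPs), and every function name here are hypothetical and are not taken from the Cerule code.

```python
import random


def project(patch_features, weight, bias):
    """Linear projection of each vision feature vector to the LM embedding width.

    weight holds one row per output dimension, each row as long as a patch vector.
    """
    return [
        [sum(x * w for x, w in zip(vec, row)) + b for row, b in zip(weight, bias)]
        for vec in patch_features
    ]


def splice_image_tokens(text_embeds, image_embeds, image_pos):
    """Insert the projected image embeddings into the text sequence at image_pos."""
    return text_embeds[:image_pos] + image_embeds + text_embeds[image_pos:]


# Toy sizes (hypothetical, far smaller than any real encoder or language model).
D_VISION, D_LM, N_PATCHES, N_TEXT = 4, 6, 3, 5

rng = random.Random(0)
patches = [[rng.uniform(-1, 1) for _ in range(D_VISION)] for _ in range(N_PATCHES)]
weight = [[rng.uniform(-1, 1) for _ in range(D_VISION)] for _ in range(D_LM)]
bias = [0.0] * D_LM
text = [[rng.uniform(-1, 1) for _ in range(D_LM)] for _ in range(N_TEXT)]

image_embeds = project(patches, weight, bias)
sequence = splice_image_tokens(text, image_embeds, image_pos=1)

# The language model would then consume one sequence of N_TEXT + N_PATCHES vectors.
print(len(sequence), len(sequence[0]))  # 8 6
```

The point of the projection step is only that image and text vectors must share one width before the language model can attend over them together.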

- Pretraining stage: 650K images (a LAION subset)
- Finetuning stage: 695K images (SVIT-mix-665K - Bunny mix modified by BAAI)

The training setup was 4x A100 80GB GPUs; pretraining took ~6 hours and finetuning took ~13 hours. We modify and adapt the training code from Bunny.
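To put these figures in perspective, the short calculation below converts them into GPU-hours and an approximate per-GPU throughput. It is a back-of-the-envelope sketch: the hour counts are the approximate values quoted above, and the throughput assumes a single pass over each dataset, which this card does not state.

```python
# Approximate figures quoted above: 4 GPUs, ~6 h pretraining, ~13 h finetuning.
NUM_GPUS = 4
STAGES = {
    "pretrain": {"hours": 6, "images": 650_000},
    "finetune": {"hours": 13, "images": 695_000},
}


def gpu_hours(hours, num_gpus=NUM_GPUS):
    """Wall-clock hours multiplied by the number of GPUs."""
    return hours * num_gpus


def images_per_gpu_second(images, hours, num_gpus=NUM_GPUS):
    """Rough throughput, ASSUMING exactly one pass over the dataset."""
    return images / (hours * 3600 * num_gpus)


for name, stage in STAGES.items():
    rate = images_per_gpu_second(stage["images"], stage["hours"])
    print(f"{name}: {gpu_hours(stage['hours'])} GPU-hours, ~{rate:.1f} images/s per GPU")

total = sum(gpu_hours(stage["hours"]) for stage in STAGES.values())
print(f"total: {total} GPU-hours")  # total: 76 GPU-hours
```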


Examples

Image: astronaut
Prompt: Describe the image
Response: The image is a playful and surreal depiction of a man in a space suit, sitting on a chair and holding a green beer bottle. The man is wearing a white space suit, complete with a helmet and gloves. His feet are clad in black and white shoes, and he is placed on a sandy surface. The background features a large, blue planet, with a moon and a star visible in the sky.

Image: mario
Prompt: Who are the characters in the image?
Response: The image features three characters, two of them are Mario and Luigi, and the third one is Yoshi.
Prompt: Describe the actions of the characters
Response: The Mario and Luigi characters are holding their arms out, as if they are waving. Yoshi is standing on its own, with its arms folded.

Image: extreme_ironing
Prompt: What's funny about this image?
Response: The image is quite humorous as it depicts a man ironing clothes on the back of a yellow taxi cab. This is not a typical sight you'd expect to see in everyday life.

Training:

The training code is released: https://github.com/tensoic/Cerule

Inference:

Clone the following repo and follow its instructions for CLI-based inference: https://github.com/tensoic/Cerule

License

The model is subject to the Gemma terms of use (the base model's license), and the underlying datasets (LAION and SVIT) are subject to their respective licenses. All code is licensed under Apache 2.0.

Acknowledgments

We sincerely thank the amazing teams at Google, LLaVA, and BAAI, without whom this project would not have been possible!

Model size: 2.91B params (Safetensors, tensor type BF16)