- [Conceptual 12M Dataset](https://github.com/google-research-datasets/conceptual-12m)
- [VQA v2 Dataset](https://visualqa.org/challenge.html)
- [Hybrid CLIP Example](https://github.com/huggingface/transformers/blob/master/src/transformers/models/clip/modeling_flax_clip.py)
- [VisualBERT Modeling File](https://github.com/huggingface/transformers/blob/master/src/transformers/models/visual_bert/modeling_visual_bert.py)
- [BERT Modeling File](https://github.com/huggingface/transformers/blob/master/src/transformers/models/bert/modeling_flax_bert.py)
- [CLIP Modeling File](https://github.com/huggingface/transformers/blob/master/src/transformers/models/clip/modeling_flax_clip.py)
- [Summarization Training Script](https://github.com/huggingface/transformers/blob/master/examples/flax/summarization/run_summarization_flax.py)
- [MLM Training Script](https://github.com/huggingface/transformers/blob/2df63282e010ac518683252d8ddba21e58d2faf3/examples/flax/language-modeling/run_mlm_flax.py)