---
license:
- mit
language:
- en
library_name: open_clip
tags:
- zero-shot-image-classification
- clip
- biology
- CV
- images
- animals
- species
- taxonomy
- rare species
- endangered species
- evolutionary biology
- multimodal
- knowledge-guided
datasets:
- iNat21
---

# Model Card for BioCLIP

BioCLIP is a foundation model for the tree of life, built using the CLIP architecture as a vision model for general organismal biology.
This model is trained on [iNat21](https://github.com/visipedia/inat_comp/tree/master/2021) only, unlike the original [BioCLIP](https://huggingface.co/imageomics/bioclip), which is trained on [TreeOfLife-10M](https://huggingface.co/datasets/imageomics/TreeOfLife-10M).
More information can be found in the [BioCLIP](https://huggingface.co/imageomics/bioclip) model card.

## How to Get Started with the Model

BioCLIP can be used with the `open_clip` library:

```py
import open_clip

# Load the model weights and image transforms from the Hugging Face Hub.
model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms('hf-hub:imageomics/bioclip-vit-b-16-inat-only')
# Load the matching text tokenizer.
tokenizer = open_clip.get_tokenizer('hf-hub:imageomics/bioclip-vit-b-16-inat-only')
```

## Training Details

### Compute Infrastructure

Training was performed on 4 NVIDIA A100-80GB GPUs distributed over 1 node on [OSC's](https://www.osc.edu/) Ascend HPC Cluster with a global batch size of 16,384 for 2 days.

Based on the Machine Learning Impact calculator presented in Lacoste et al. (2019), that corresponds to 33.16 kg of CO2 eq., or 134 km driven by an average ICE car.

### Training Data

This model was trained on [iNat21](https://github.com/visipedia/inat_comp/tree/master/2021), which is a compilation of images matched to [Linnaean taxonomic rank](https://www.britannica.com/science/taxonomy/The-objectives-of-biological-classification) from kingdom through species. The images are also matched with the common (vernacular) name of their subject where available.

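To make the labeling described above concrete, here is a minimal sketch of how the Linnaean ranks and an optional common name might be combined into a text string for a CLIP-style text encoder. The `TaxonLabel` class, the helper functions, and the caption template are all hypothetical illustrations; this card does not state the exact text format used during training.

```python
from dataclasses import dataclass
from typing import Optional

# Linnaean ranks in order from broadest to most specific.
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


@dataclass
class TaxonLabel:
    """Hypothetical container for one image's label (not part of iNat21 or open_clip)."""
    taxonomy: dict  # maps rank name -> taxon name
    common_name: Optional[str] = None


def taxonomic_name(label: TaxonLabel) -> str:
    """Join the available ranks, kingdom first, skipping any that are missing."""
    return " ".join(label.taxonomy[r] for r in RANKS if label.taxonomy.get(r))


def caption(label: TaxonLabel) -> str:
    """Build a caption string; the template is an assumption for illustration."""
    text = f"a photo of {taxonomic_name(label)}"
    if label.common_name:
        text += f" with common name {label.common_name}"
    return text + "."


example = TaxonLabel(
    taxonomy={
        "kingdom": "Animalia", "phylum": "Chordata", "class": "Aves",
        "order": "Passeriformes", "family": "Corvidae",
        "genus": "Pica", "species": "hudsonia",
    },
    common_name="black-billed magpie",
)
print(caption(example))
```

Strings built this way could then be passed as a list to the `tokenizer` loaded in the getting-started snippet, for example `tokenizer([caption(example)])`, to produce text inputs for the model.
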
### Training Hyperparameters

- **Training regime:** Unlike [BioCLIP](https://huggingface.co/imageomics/bioclip), this model is trained with a batch size of 16K. We pick epoch 65, which has the lowest loss on the validation set (~5% of training samples), for downstream task evaluation.

### Summary

BioCLIP outperforms general-domain baselines by 10% on average.

### Model Examination

We encourage readers to see Section 4.6 of [our paper](https://doi.org/10.48550/arXiv.2311.18803).
In short, the iNat21-only BioCLIP forms representations that align more closely with the taxonomic hierarchy than those of general-domain baselines like CLIP or OpenCLIP.

## Citation

**BibTeX:**

```
@software{bioclip2023,
  author = {Samuel Stevens and Jiaman Wu and Matthew J. Thompson and Elizabeth G. Campolongo and Chan Hee Song and David Edward Carlyn and Li Dong and Wasila M. Dahdul and Charles Stewart and Tanya Berger-Wolf and Wei-Lun Chao and Yu Su},
  doi = {10.57967/hf/1511},
  month = nov,
  title = {BioCLIP},
  version = {v0.1},
  year = {2023}
}
```

Please also cite our paper:

```
@article{stevens2023bioclip,
  title = {BIOCLIP: A Vision Foundation Model for the Tree of Life},
  author = {Samuel Stevens and Jiaman Wu and Matthew J Thompson and Elizabeth G Campolongo and Chan Hee Song and David Edward Carlyn and Li Dong and Wasila M Dahdul and Charles Stewart and Tanya Berger-Wolf and Wei-Lun Chao and Yu Su},
  year = {2023},
  eprint = {2311.18803},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV}
}
```

Please also consider citing OpenCLIP and iNat21:

```
@software{ilharco_gabriel_2021_5143773,
  author={Ilharco, Gabriel and Wortsman, Mitchell and Wightman, Ross and Gordon, Cade and Carlini, Nicholas and Taori, Rohan and Dave, Achal and Shankar, Vaishaal and Namkoong, Hongseok and Miller, John and Hajishirzi, Hannaneh and Farhadi, Ali and Schmidt, Ludwig},
  title={OpenCLIP},
  year={2021},
  doi={10.5281/zenodo.5143773},
}
```

```
@misc{inat2021,
  author={Van Horn, Grant and Mac Aodha, Oisin},
  title={iNat Challenge 2021 - FGVC8},
  publisher={Kaggle},
  year={2021},
  url={https://kaggle.com/competitions/inaturalist-2021}
}
```

## Acknowledgements

The authors would like to thank Josef Uyeda, Jim Balhoff, Dan Rubenstein, Hank Bart, Hilmar Lapp, Sara Beery, and colleagues from the Imageomics Institute and the OSU NLP group for their valuable feedback. We also thank the BIOSCAN-1M team and the iNaturalist team for making their data available and easy to use, and Jennifer Hammack at EOL for her invaluable help in accessing EOL’s images.

The [Imageomics Institute](https://imageomics.org) is funded by the US National Science Foundation's Harnessing the Data Revolution (HDR) program under [Award #2118240](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2118240) (Imageomics: A New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

## Model Card Authors

Elizabeth G. Campolongo, Samuel Stevens, and Jiaman Wu

## Model Card Contact

[stevens.994@osu.edu](mailto:stevens.994@osu.edu)