penfever committed
Commit 60dca27 · verified · Parent: fa91078

Update README.md

Files changed (1):
  1. README.md +21 -21
README.md CHANGED
@@ -29,7 +29,7 @@ pipeline_tag: zero-shot-image-classification
 ---
 
 
- # Model Card for ArborCLIP
 
 <!-- Banner links -->
 <div style="text-align:center;">
@@ -45,7 +45,7 @@ pipeline_tag: zero-shot-image-classification
 </div>
 
 
- ARBORCLIP is a new suite of vision-language foundation models for biodiversity. These CLIP-style foundation models were trained on [ARBORETUM-40M](https://baskargroup.github.io/Arboretum/), which is a large-scale dataset of 40 million images of 33K species of plants and animals. The models are evaluated on zero-shot image classification tasks.
 
 - **Model type:** Vision Transformer (ViT-B/16, ViT-L/14)
 - **License:** MIT
@@ -56,23 +56,23 @@ These models were developed for the benefit of the AI community as an open-sourc
 
 ### Model Description
 
- ArborCLIP is based on OpenAI's [CLIP](https://openai.com/research/clip) model.
 The models were trained on [ARBORETUM-40M](https://baskargroup.github.io/Arboretum/) for the following configurations:
 
- - **ARBORCLIP-O:** Trained a ViT-B/16 backbone initialized from the [OpenCLIP's](https://github.com/mlfoundations/open_clip) checkpoint. The training was conducted for 40 epochs.
- - **ARBORCLIP-B:** Trained a ViT-B/16 backbone initialized from the [BioCLIP's](https://github.com/Imageomics/BioCLIP) checkpoint. The training was conducted for 8 epochs.
- - **ARBORCLIP-M:** Trained a ViT-L/14 backbone initialized from the [MetaCLIP's](https://github.com/facebookresearch/MetaCLIP) checkpoint. The training was conducted for 12 epochs.
 
 
 To access the checkpoints of the above models, go to the `Files and versions` tab and download the weights. These weights can be directly used for zero-shot classification and finetuning. The filenames correspond to the specific model weights:
- - **ARBORCLIP-O:** - `arborclip-vit-b-16-from-openai-epoch-40.pt`,
- - **ARBORCLIP-B:** - `arborclip-vit-b-16-from-bioclip-epoch-8.pt`
- - **ARBORCLIP-M** - `arborclip-vit-l-14-from-metaclip-epoch-12.pt`
 
 ### Model Training
- **See the [Model Training](https://github.com/baskargroup/Arboretum?tab=readme-ov-file#model-training) section on the [Github](https://github.com/baskargroup/Arboretum) for examples of how to use ArborCLIP models in zero-shot image classification tasks.**
 
- We train three models using a modified version of the [BioCLIP / OpenCLIP](https://github.com/Imageomics/bioclip/tree/main/src/training) codebase. Each model is trained on Arboretum-40M, on 2 nodes, 8xH100 GPUs, on NYU's [Greene](https://sites.google.com/nyu.edu/nyu-hpc/hpc-systems/greene) high-performance compute cluster. We publicly release all code needed to reproduce our results on the [Github](https://github.com/baskargroup/Arboretum) page.
 
 We optimize our hyperparameters prior to training with [Ray](https://docs.ray.io/en/latest/index.html). Our standard training parameters are as follows:
 
@@ -107,7 +107,7 @@ For validating the zero-shot accuracy of our trained models and comparing to oth
 
 #### Pre-Run
 
- After cloning the [Github](https://github.com/baskargroup/Arboretum) repository and navigating to the `Arboretum/model_validation` directory, we recommend installing all the project requirements into a conda container; `pip install -r requirements.txt`. Also, before executing a command in VLHub, please add `Arboretum/model_validation/src` to your PYTHONPATH.
 
 ```bash
 export PYTHONPATH="$PYTHONPATH:$PWD/src";
@@ -115,28 +115,28 @@ export PYTHONPATH="$PYTHONPATH:$PWD/src";
 
 #### Base Command
 
- A basic Arboretum model evaluation command can be launched as follows. This example would evaluate a CLIP-ResNet50 checkpoint whose weights resided at the path designated via the `--resume` flag on the ImageNet validation set, and would report the results to Weights and Biases.
 
 ```bash
 python src/training/main.py --batch-size=32 --workers=8 --imagenet-val "/imagenet/val/" --model="resnet50" --zeroshot-frequency=1 --image-size=224 --resume "/PATH/TO/WEIGHTS.pth" --report-to wandb
 ```
 
 ### Training Dataset
- - **Dataset Repository:** [Arboretum](https://github.com/baskargroup/Arboretum)
- - **Dataset Paper:** Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity ([arXiv](https://arxiv.org/abs/2406.17720))
- - **HF Dataset card:** [Arboretum](https://huggingface.co/datasets/ChihHsuan-Yang/Arboretum)
 
 
 ### Model Limitations
- All the `ArborCLIP` models were evaluated on the challenging [CONFOUNDING-SPECIES](https://arxiv.org/abs/2306.02507) benchmark. However, all the models performed at or below random chance. This could be an interesting avenue for follow-up work and further expand the models capabilities.
 
 In general, we found that models trained on web-scraped data performed better with common
 names, whereas models trained on specialist datasets performed better when using scientific names.
 Additionally, models trained on web-scraped data excel at classifying at the highest taxonomic
 level (kingdom), while models begin to benefit from specialist datasets like [ARBORETUM-40M](https://baskargroup.github.io/Arboretum/) and
- [Tree-of-Life-10M](https://huggingface.co/datasets/imageomics/TreeOfLife-10M) at the lower taxonomic levels (order and species). From a practical standpoint, `ArborCLIP` is highly accurate at the species level, and higher-level taxa can be deterministically derived from lower ones.
 
- Addressing these limitations will further enhance the applicability of models like `ArborCLIP` in real-world biodiversity monitoring tasks.
 
 ### Acknowledgements
 This work was supported by the AI Research Institutes program of the NSF and USDA-NIFA under [AI Institute for Resilient Agriculture](https://aiira.iastate.edu/), Award No. 2021-67021-35329. This was also
@@ -150,7 +150,7 @@ expertise.
 <h2 class="title">Citation</h2>
 If you find the models and datasets useful in your research, please consider citing our paper:
 <pre><code>@misc{yang2024arboretumlargemultimodaldataset,
- title={Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity},
  author={Chih-Hsuan Yang, Benjamin Feuer, Zaki Jubery, Zi K. Deng, Andre Nakkab,
  Md Zahid Hasan, Shivani Chiranjeevi, Kelly Marshall, Nirmal Baishnab, Asheesh K Singh,
  Arti Singh, Soumik Sarkar, Nirav Merchant, Chinmay Hegde, Baskar Ganapathysubramanian},
@@ -166,4 +166,4 @@ expertise.
 
 ---
 
- For more details and access to the Arboretum dataset, please visit the [Project Page](https://baskargroup.github.io/Arboretum/).
 
@@ -29,7 +29,7 @@ pipeline_tag: zero-shot-image-classification
 ---
 
 
+ # Model Card for BioTrove
 
 <!-- Banner links -->
 <div style="text-align:center;">
 
@@ -45,7 +45,7 @@ pipeline_tag: zero-shot-image-classification
 </div>
 
 
+ BIOTROVE is a new suite of vision-language foundation models for biodiversity. These CLIP-style foundation models were trained on [ARBORETUM-40M](https://baskargroup.github.io/Arboretum/), a large-scale dataset of 40 million images covering 33K species of plants and animals. The models are evaluated on zero-shot image classification tasks.
 
 - **Model type:** Vision Transformer (ViT-B/16, ViT-L/14)
 - **License:** MIT
 
@@ -56,23 +56,23 @@ These models were developed for the benefit of the AI community as an open-sourc
 
 ### Model Description
 
+ BioTrove is based on OpenAI's [CLIP](https://openai.com/research/clip) model.
 The models were trained on [ARBORETUM-40M](https://baskargroup.github.io/Arboretum/) for the following configurations:
 
+ - **BIOTROVE-O:** A ViT-B/16 backbone initialized from the [OpenCLIP](https://github.com/mlfoundations/open_clip) checkpoint and trained for 40 epochs.
+ - **BIOTROVE-B:** A ViT-B/16 backbone initialized from the [BioCLIP](https://github.com/Imageomics/BioCLIP) checkpoint and trained for 8 epochs.
+ - **BIOTROVE-M:** A ViT-L/14 backbone initialized from the [MetaCLIP](https://github.com/facebookresearch/MetaCLIP) checkpoint and trained for 12 epochs.
 
 
 To access the checkpoints of the above models, go to the `Files and versions` tab and download the weights. These weights can be directly used for zero-shot classification and finetuning. The filenames correspond to the specific model weights:
+ - **BIOTROVE-O:** `BIOTROVE-vit-b-16-from-openai-epoch-40.pt`
+ - **BIOTROVE-B:** `BIOTROVE-vit-b-16-from-bioclip-epoch-8.pt`
+ - **BIOTROVE-M:** `BIOTROVE-vit-l-14-from-metaclip-epoch-12.pt`
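
For a quick sanity check after downloading, the snippet below sketches zero-shot classification with [OpenCLIP](https://github.com/mlfoundations/open_clip). It assumes the released `.pt` files load directly through OpenCLIP's `pretrained=` argument; the image path and candidate species are illustrative placeholders.

```python
import torch
from PIL import Image
import open_clip

# Assumption: the checkpoint is OpenCLIP-compatible and the backbone matches
# the filename (ViT-B/16 for BIOTROVE-O). Paths and labels are placeholders.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16", pretrained="BIOTROVE-vit-b-16-from-openai-epoch-40.pt"
)
tokenizer = open_clip.get_tokenizer("ViT-B-16")
model.eval()

labels = ["Danaus plexippus", "Quercus alba", "Apis mellifera"]  # example species
image = preprocess(Image.open("example.jpg")).unsqueeze(0)       # example image
text = tokenizer([f"a photo of {name}" for name in labels])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(labels, probs.squeeze(0).tolist())))
```

For the ViT-L/14 checkpoint (`BIOTROVE-vit-l-14-from-metaclip-epoch-12.pt`), swap the model name and tokenizer to `ViT-L-14`.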
 
 ### Model Training
+ **See the [Model Training](https://github.com/baskargroup/Arboretum?tab=readme-ov-file#model-training) section of the [GitHub](https://github.com/baskargroup/Arboretum) repository for examples of how to use BioTrove models in zero-shot image classification tasks.**
 
+ We train three models using a modified version of the [BioCLIP / OpenCLIP](https://github.com/Imageomics/bioclip/tree/main/src/training) codebase. Each model is trained on BioTrove-40M on 2 nodes with 8xH100 GPUs on NYU's [Greene](https://sites.google.com/nyu.edu/nyu-hpc/hpc-systems/greene) high-performance computing cluster. We publicly release all code needed to reproduce our results on the [GitHub](https://github.com/baskargroup/Arboretum) page.
 
 We optimize our hyperparameters prior to training with [Ray](https://docs.ray.io/en/latest/index.html). Our standard training parameters are as follows:
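
As a rough illustration of that Ray step, the sketch below wires up a sweep with [Ray Tune](https://docs.ray.io/en/latest/tune/index.html); the trainable, metric, and search space are assumptions for illustration, not the released search configuration.

```python
from ray import tune

def train_biotrove(config):
    # Stand-in for one training run of the modified OpenCLIP codebase; in practice
    # this would launch training with config["lr"] / config["warmup"] and return
    # zero-shot validation accuracy. The formula below is a dummy objective.
    val_acc = 1.0 - abs(config["lr"] - 5e-4)
    return {"val_acc": val_acc}

tuner = tune.Tuner(
    train_biotrove,
    param_space={
        "lr": tune.loguniform(1e-5, 1e-3),
        "warmup": tune.choice([500, 1000, 2000]),
    },
    tune_config=tune.TuneConfig(metric="val_acc", mode="max", num_samples=8),
)
results = tuner.fit()
print(results.get_best_result().config)
```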
 
@@ -107,7 +107,7 @@ For validating the zero-shot accuracy of our trained models and comparing to oth
 
 #### Pre-Run
 
+ After cloning the [GitHub](https://github.com/baskargroup/Arboretum) repository and navigating to the `BioTrove/model_validation` directory, we recommend installing all the project requirements into a conda environment with `pip install -r requirements.txt`. Also, before executing a command in VLHub, please add `BioTrove/model_validation/src` to your `PYTHONPATH`:
 
 ```bash
 export PYTHONPATH="$PYTHONPATH:$PWD/src";
 
@@ -115,28 +115,28 @@ export PYTHONPATH="$PYTHONPATH:$PWD/src";
 
 #### Base Command
 
+ A basic BioTrove model evaluation command can be launched as follows. This example evaluates a CLIP-ResNet50 checkpoint, whose weights reside at the path passed to the `--resume` flag, on the ImageNet validation set, and reports the results to Weights & Biases.
 
 ```bash
 python src/training/main.py --batch-size=32 --workers=8 --imagenet-val "/imagenet/val/" --model="resnet50" --zeroshot-frequency=1 --image-size=224 --resume "/PATH/TO/WEIGHTS.pth" --report-to wandb
 ```
 
 ### Training Dataset
+ - **Dataset Repository:** [BioTrove](https://github.com/baskargroup/Arboretum)
+ - **Dataset Paper:** BioTrove: A Large Multimodal Dataset Enabling AI for Biodiversity ([arXiv](https://arxiv.org/abs/2406.17720))
+ - **HF Dataset card:** [BioTrove](https://huggingface.co/datasets/ChihHsuan-Yang/Arboretum)
 
 
 ### Model Limitations
+ All the `BioTrove` models were evaluated on the challenging [CONFOUNDING-SPECIES](https://arxiv.org/abs/2306.02507) benchmark. However, all the models performed at or below random chance. This is an interesting avenue for follow-up work that could further expand the models' capabilities.
 
 In general, we found that models trained on web-scraped data performed better with common
 names, whereas models trained on specialist datasets performed better when using scientific names.
 Additionally, models trained on web-scraped data excel at classifying at the highest taxonomic
 level (kingdom), while models begin to benefit from specialist datasets like [ARBORETUM-40M](https://baskargroup.github.io/Arboretum/) and
+ [Tree-of-Life-10M](https://huggingface.co/datasets/imageomics/TreeOfLife-10M) at the lower taxonomic levels (order and species). From a practical standpoint, `BioTrove` is highly accurate at the species level, and higher-level taxa can be deterministically derived from lower ones.
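
To make that practical point concrete, a species-level prediction can be rolled up to any higher rank with a simple lookup; the two lineages below are illustrative examples rather than part of the released metadata.

```python
# Illustrative only: derive higher taxonomic ranks from a species-level prediction.
SPECIES_TO_LINEAGE = {
    "Danaus plexippus": {"genus": "Danaus", "family": "Nymphalidae",
                         "order": "Lepidoptera", "kingdom": "Animalia"},
    "Quercus alba": {"genus": "Quercus", "family": "Fagaceae",
                     "order": "Fagales", "kingdom": "Plantae"},
}

def rollup(species: str, rank: str) -> str:
    """Map a predicted species name to the requested higher rank."""
    return SPECIES_TO_LINEAGE[species][rank]

print(rollup("Danaus plexippus", "order"))  # -> Lepidoptera
```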
 
+ Addressing these limitations will further enhance the applicability of models like `BioTrove` in real-world biodiversity monitoring tasks.
 
 ### Acknowledgements
 This work was supported by the AI Research Institutes program of the NSF and USDA-NIFA under [AI Institute for Resilient Agriculture](https://aiira.iastate.edu/), Award No. 2021-67021-35329. This was also
 
@@ -150,7 +150,7 @@ expertise.
 <h2 class="title">Citation</h2>
 If you find the models and datasets useful in your research, please consider citing our paper:
 <pre><code>@misc{yang2024arboretumlargemultimodaldataset,
+ title={BioTrove: A Large Multimodal Dataset Enabling AI for Biodiversity},
  author={Chih-Hsuan Yang, Benjamin Feuer, Zaki Jubery, Zi K. Deng, Andre Nakkab,
  Md Zahid Hasan, Shivani Chiranjeevi, Kelly Marshall, Nirmal Baishnab, Asheesh K Singh,
  Arti Singh, Soumik Sarkar, Nirav Merchant, Chinmay Hegde, Baskar Ganapathysubramanian},
 
@@ -166,4 +166,4 @@ expertise.
 
 ---
 
+ For more details and access to the BioTrove dataset, please visit the [Project Page](https://baskargroup.github.io/Arboretum/).