Update README.md (#2) by mwmathis
README.md CHANGED
@@ -1,25 +1,23 @@
|
|
1 |
---
|
2 |
-
license: apache-2.0
|
3 |
tags:
|
4 |
- computer_vision
|
5 |
- pose_estimation
|
|
|
|
|
6 |
---
|
7 |
|
8 |
-
|
9 |
|
|
|
10 |
|
11 |
-
-
|
12 |
-
|
13 |
-
please contact EPFL-TTO (https://tto.epfl.ch/) for a full commercial license.
|
14 |
|
15 |
-
|
16 |
|
17 |
-
|
18 |
-
|
19 |
-
|
20 |
-
This model was trained a dataset called "Quadrupred-40K." It was trained in Tensorflow 2 within the [DeepLabCut framework](www.deeplabcut.org).
|
21 |
-
Full training details can be found in Ye et al. 2023, but in brief, this was trained with **DLCRNet** as introduced in [Lauer et al 2022 Nature Methods](https://www.nature.com/articles/s41592-022-01443-0).
|
22 |
-
You can use this model simply with our light-weight loading package called [DLCLibrary](https://github.com/DeepLabCut/DLClibrary). Here is an example useage:
|
23 |
|
24 |
```python
|
25 |
from pathlib import Path
|
@@ -31,60 +29,128 @@ model_dir.mkdir()
|
|
31 |
download_huggingface_model("superanimal_quadruped", model_dir)
|
32 |
```
|
33 |
|
34 |
-
|
35 |
|
36 |
The model was trained jointly on the following datasets:
|
37 |
|
38 |
-
- **AwA-Pose** Quadruped dataset, see full details at (
|
39 |
-
- **AnimalPose** See full details at (
|
40 |
-
- **AcinoSet** See full details at (
|
41 |
-
- **Horse-30** Horse-30 dataset, benchmark task is called Horse-10; See full details at (
|
42 |
-
- **StanfordDogs** See full details at (
|
43 |
-
- **AP-10K** See full details at (
|
44 |
- **iRodent** We utilized the iNaturalist API functions for scraping observations
|
45 |
-
with the taxon ID of Suborder Myomorpha (
|
46 |
ones with photos under the CC BY-NC creative license. The most common types of rodents from the collected observations are
|
47 |
Muskrat (Ondatra zibethicus), Brown Rat (Rattus norvegicus), House Mouse (Mus musculus), Black Rat (Rattus rattus), Hispid
|
48 |
Cotton Rat (Sigmodon hispidus), Meadow Vole (Microtus pennsylvanicus), Bank Vole (Clethrionomys glareolus), Deer Mouse
|
49 |
(Peromyscus maniculatus), White-footed Mouse (Peromyscus leucopus), Striped Field Mouse (Apodemus agrarius). We then
|
50 |
generated segmentation masks over target animals in the data by processing the media through an algorithm we designed that
|
51 |
-
uses a Mask Region Based Convolutional Neural Networks(Mask R-CNN) (
|
52 |
-
pretrained on the COCO datasets (
|
53 |
-
segmentation masks.
|
54 |
-
|
55 |
-
Here is an image with the keypoint guide, the distribution of images per dataset, and examples from the datasets inferenced with a model trained with less data for benchmarking as in Ye et al 2023.
|
56 |
-
Thereby note that performance of this model we are releasing has comporable or higher performance.
|
57 |
-
|
58 |
-
Please note that each dataest was labeled by separate labs & seperate individuals, therefore while we map names
|
59 |
-
to a unified pose vocabulary, there will be annotator bias in keypoint placement (See Ye et al. 2023 for our Supplementary Note on annotator bias).
|
60 |
-
You will also note the dataset is highly diverse across species, but collectively has more representation of domesticated animals like dogs, cats, horses, and cattle.
|
61 |
-
We recommend if performance is not as good as you need it to be, first try video adaptation (see Ye et al. 2023),
|
62 |
-
or fine-tune these weights with your own labeling.
|
63 |
|
|
|
64 |
<p align="center">
|
65 |
<img src="https://images.squarespace-cdn.com/content/v1/57f6d51c9f74566f55ecf271/1690988780004-AG00N6OU1R21MZ0AU9RE/modelcard-SAQ.png?format=1500w" width="95%">
|
66 |
</p>
|
67 |
|
68 |
|
69 |
-
|
70 |
-
|
71 |
2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 9497–9506, 2019.
|
72 |
-
|
73 |
A 3d pose estimation dataset and baseline models for cheetahs in the wild. 2021 IEEE International Conference on Robotics and Automation
|
74 |
(ICRA), pages 13901–13908, 2021.
|
75 |
-
|
76 |
boosts out-of-domain robustness for pose estimation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision,
|
77 |
pages 1859–1868, 2021.
|
78 |
-
|
79 |
on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, June 2011.
|
80 |
-
|
81 |
animals from video. In Asian Conference on Computer Vision, pages 3–19. Springer, 2018.
|
82 |
-
|
83 |
Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.
|
84 |
-
|
85 |
-
|
86 |
vision, pages 2961–2969, 2017.
|
87 |
-
|
88 |
-
|
89 |
-
and C. Lawrence Zitnick. Microsoft COCO: common objects in context. CoRR, abs/1405.0312, 2014
|
90 |
-
|
|
|
1 |
---
|
|
|
2 |
tags:
|
3 |
- computer_vision
|
4 |
- pose_estimation
|
5 |
+
- animal_pose_estimation
|
6 |
+
- deeplabcut
|
7 |
---
|
8 |
|
9 |
+
# MODEL CARD:
|
10 |
|
11 |
+
## Model Details
|
12 |
|
13 |
+
• SuperAnimal-Quadruped model developed by the [M.W.Mathis Lab](http://www.mackenziemathislab.org/) in 2023, trained to predict quadruped pose from images.
|
14 |
+
Please see [Shaokai Ye et al. 2023](https://arxiv.org/abs/2203.07436) for details.
|
|
|
15 |
|
16 |
+
• The model is an HRNet-w32 trained on our Quadruped-80K dataset.
|
17 |
|
18 |
+
• It was trained within the DeepLabCut framework. Full training details can be found in Ye et al. 2023.
|
19 |
+
You can use this model simply with our light-weight loading package called [DLCLibrary](https://github.com/DeepLabCut/DLClibrary).
|
20 |
+
Here is an example usage:
|
21 |
|
22 |
```python
|
23 |
from pathlib import Path
|
|
|
29 |
download_huggingface_model("superanimal_quadruped", model_dir)
|
30 |
```
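
The diff view elides the middle lines of the snippet above. As a minimal sketch of a complete download step, assuming the `dlclibrary` package is installed and exposes `download_huggingface_model` as used above, one could wrap the call as follows. The wrapper name `fetch_superanimal_quadruped` and the default folder name are hypothetical and not part of the README.

```python
from pathlib import Path


def fetch_superanimal_quadruped(target="superanimal_quadruped_model"):
    """Download the SuperAnimal-Quadruped weights into `target` and return the folder.

    `target` is a hypothetical folder name chosen for this sketch.
    """
    # Imported lazily so this module can be loaded even when dlclibrary is absent.
    from dlclibrary import download_huggingface_model

    model_dir = Path(target)
    # Create the folder (and any parents) without failing if it already exists.
    model_dir.mkdir(parents=True, exist_ok=True)
    # Same call as in the model card: model name first, destination folder second.
    download_huggingface_model("superanimal_quadruped", model_dir)
    return model_dir
```

Calling `fetch_superanimal_quadruped()` needs network access; the returned `Path` points at the folder holding the downloaded files.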
|
31 |
|
32 |
+
## Intended Use
|
33 |
+
• Intended to be used for pose estimation of quadrupeds in images taken from a side view. The model serves as a better starting
|
34 |
+
point than ImageNet weights in downstream datasets such as AP-10K.
|
35 |
+
|
36 |
+
• Intended for academic and research professionals working in fields related to animal behavior, such as neuroscience
|
37 |
+
and ecology.
|
38 |
+
|
39 |
+
• Not suitable as a zero-shot model for applications that require high keypoint precision, but can be fine-tuned with
|
40 |
+
minimal data to reach human-level accuracy. Also not suitable for videos that look dramatically different from those
|
41 |
+
we show in the paper.
|
42 |
+
|
43 |
+
## Factors
|
44 |
+
|
45 |
+
• Based on the known robustness issues of neural networks, the relevant factors include the lighting, contrast and
|
46 |
+
resolution of the video frames. The presence of objects might also cause false detections and erroneous keypoints.
|
47 |
+
When two or more animals are extremely close, it could cause the top-down detectors to detect only one animal,
|
48 |
+
if used without further fine-tuning or without a method such as BUCTD (Zhou et al. 2023 ICCV).
|
49 |
+
|
50 |
+
## Metrics
|
51 |
+
• Mean Average Precision (mAP)
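
The card names mAP without spelling out the computation. The sketch below assumes the COCO-style convention, in which a predicted pose is scored against ground truth by object keypoint similarity (OKS) and average precision is averaged over OKS thresholds from 0.50 to 0.95. The function names, the per-keypoint `sigmas`, and the assumption that detections arrive already matched to ground truth and sorted by descending confidence are illustrative; they are not taken from the paper's evaluation code.

```python
import math


def object_keypoint_similarity(pred, truth, visible, area, sigmas):
    """COCO-style OKS between one predicted pose and one ground-truth pose.

    pred, truth: lists of (x, y) keypoints; visible: list of booleans;
    area: ground-truth object area in pixels; sigmas: per-keypoint falloff
    constants (hypothetical values must be supplied by the caller).
    """
    total, count = 0.0, 0
    for (px, py), (tx, ty), is_visible, sigma in zip(pred, truth, visible, sigmas):
        if not is_visible:
            continue  # unlabeled keypoints do not contribute
        squared_distance = (px - tx) ** 2 + (py - ty) ** 2
        scale = (2.0 * sigma) ** 2
        total += math.exp(-squared_distance / (2.0 * area * scale))
        count += 1
    return total / count if count else 0.0


def average_precision(is_true_positive, num_truth):
    """All-point interpolated AP for detections sorted by descending confidence."""
    if num_truth == 0:
        return 0.0
    precisions, recalls = [], []
    true_positives = 0
    for rank, hit in enumerate(is_true_positive, start=1):
        true_positives += 1 if hit else 0
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_truth)
    # Replace each precision by the best precision at equal or higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(matched_oks, num_truth):
    """Mean of AP over OKS thresholds 0.50, 0.55, ..., 0.95.

    matched_oks: OKS of each detection with its matched ground truth
    (0.0 for unmatched detections), sorted by descending confidence.
    """
    thresholds = [0.50 + 0.05 * i for i in range(10)]
    aps = [
        average_precision([oks >= t for oks in matched_oks], num_truth)
        for t in thresholds
    ]
    return sum(aps) / len(aps)
```

This simplified version skips the detection-to-ground-truth matching step and uses all-point interpolation rather than a fixed recall grid, so its numbers will not exactly match a reference evaluator.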
|
52 |
+
|
53 |
+
## Evaluation Data
|
54 |
+
• In the paper we benchmark on AP-10K, AnimalPose, Horse-10, and iRodent using a leave-one-out strategy. Here,
|
55 |
+
we provide the model that has been trained on all datasets (see below), therefore it should be considered "fine-tuned"
|
56 |
+
on all animal training data listed below. This model is meant for production and evaluation in downstream scientific
|
57 |
+
applications.
|
58 |
+
|
59 |
+
## Training Data:
|
60 |
|
61 |
The model was trained jointly on the following datasets:
|
62 |
|
63 |
+
- **AwA-Pose** Quadruped dataset, see full details at (1).
|
64 |
+
- **AnimalPose** See full details at (2).
|
65 |
+
- **AcinoSet** See full details at (3).
|
66 |
+
- **Horse-30** Horse-30 dataset, benchmark task is called Horse-10; See full details at (4).
|
67 |
+
- **StanfordDogs** See full details at (5, 6).
|
68 |
+
- **AP-10K** See full details at (7).
|
69 |
- **iRodent** We utilized the iNaturalist API functions for scraping observations
|
70 |
+
with the taxon ID of Suborder Myomorpha (8). The functions allowed us to filter the large number of observations down to the
|
71 |
ones with photos under the CC BY-NC creative license. The most common types of rodents from the collected observations are
|
72 |
Muskrat (Ondatra zibethicus), Brown Rat (Rattus norvegicus), House Mouse (Mus musculus), Black Rat (Rattus rattus), Hispid
|
73 |
Cotton Rat (Sigmodon hispidus), Meadow Vole (Microtus pennsylvanicus), Bank Vole (Clethrionomys glareolus), Deer Mouse
|
74 |
(Peromyscus maniculatus), White-footed Mouse (Peromyscus leucopus), Striped Field Mouse (Apodemus agrarius). We then
|
75 |
generated segmentation masks over target animals in the data by processing the media through an algorithm we designed that
|
76 |
+
uses a Mask Region-Based Convolutional Neural Network (Mask R-CNN) (9) model with a ResNet-50-FPN backbone (10),
|
77 |
+
pretrained on the COCO dataset (11). The 443 processed images were then manually labeled with both pose annotations and
|
78 |
+
segmentation masks. iRodent data is banked at https://zenodo.org/record/8250392.
|
79 |
|
80 |
+
Here is an image with the keypoint guide:
|
81 |
<p align="center">
|
82 |
<img src="https://images.squarespace-cdn.com/content/v1/57f6d51c9f74566f55ecf271/1690988780004-AG00N6OU1R21MZ0AU9RE/modelcard-SAQ.png?format=1500w" width="95%">
|
83 |
</p>
|
84 |
|
85 |
|
86 |
+
## Ethical Considerations
|
87 |
+
|
88 |
+
• No experimental data was collected for this model; all datasets used are cited.
|
89 |
+
|
90 |
+
## Caveats and Recommendations
|
91 |
+
|
92 |
+
• The model may have reduced accuracy in scenarios with extremely varied lighting conditions or atypical animal
|
93 |
+
characteristics not well-represented in the training data.
|
94 |
+
|
95 |
+
• Please note that each dataset was labeled by separate labs and separate individuals, therefore while we map names to a
|
96 |
+
unified pose vocabulary, there will be annotator bias in keypoint placement (See Ye et al. 2023 for our Supplementary
|
97 |
+
Note on annotator bias).
|
98 |
+
|
99 |
+
• Note the dataset is highly diverse across species, but collectively has more
|
100 |
+
representation of domesticated animals like dogs, cats, horses, and cattle.
|
101 |
+
|
102 |
+
• If performance is not as
|
103 |
+
good as you need it to be, we recommend first trying video adaptation (see Ye et al. 2023) or fine-tuning these weights with your own
|
104 |
+
labeling.
|
105 |
+
|
106 |
+
## License
|
107 |
+
|
108 |
+
Modified MIT.
|
109 |
+
|
110 |
+
Copyright 2023 by Mackenzie Mathis, Shaokai Ye, and contributors.
|
111 |
+
|
112 |
+
Permission is hereby granted to you (hereafter "LICENSEE") a fully-paid, non-exclusive,
|
113 |
+
and non-transferable license for academic, non-commercial purposes only (hereafter “LICENSE”)
|
114 |
+
to use the "MODEL" weights (hereafter "MODEL"), subject to the following conditions:
|
115 |
+
|
116 |
+
The above copyright notice and this permission notice shall be included in all copies or substantial
|
117 |
+
portions of the Software:
|
118 |
+
|
119 |
+
This software may not be used to harm any animal deliberately.
|
120 |
+
|
121 |
+
LICENSEE acknowledges that the MODEL is a research tool.
|
122 |
+
THE MODEL IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
|
123 |
+
BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
|
124 |
+
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
|
125 |
+
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE MODEL
|
126 |
+
OR THE USE OR OTHER DEALINGS IN THE MODEL.
|
127 |
+
|
128 |
+
If this license is not appropriate for your application, please contact Prof. Mackenzie W. Mathis
|
129 |
+
(mackenzie@post.harvard.edu) and/or the TTO office at EPFL (tto@epfl.ch) for a commercial use license.
|
130 |
+
|
131 |
+
Please cite **Ye et al. 2023** if you use this model in your work: https://arxiv.org/abs/2203.07436v2.
|
132 |
+
|
133 |
+
|
134 |
+
## References
|
135 |
+
|
136 |
+
1. Prianka Banik, Lin Li, and Xishuang Dong. A novel dataset for keypoint detection of quadruped animals from images. ArXiv, abs/2108.13958, 2021.
|
137 |
+
2. Jinkun Cao, Hongyang Tang, Haoshu Fang, Xiaoyong Shen, Cewu Lu, and Yu-Wing Tai. Cross-domain adaptation for animal pose estimation.
|
138 |
2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 9497–9506, 2019.
|
139 |
+
3. Daniel Joska, Liam Clark, Naoya Muramatsu, Ricardo Jericevich, Fred Nicolls, Alexander Mathis, Mackenzie W. Mathis, and Amir Patel. Acinoset:
|
140 |
A 3d pose estimation dataset and baseline models for cheetahs in the wild. 2021 IEEE International Conference on Robotics and Automation
|
141 |
(ICRA), pages 13901–13908, 2021.
|
142 |
+
4. Alexander Mathis, Thomas Biasi, Steffen Schneider, Mert Yuksekgonul, Byron Rogers, Matthias Bethge, and Mackenzie W Mathis. Pretraining
|
143 |
boosts out-of-domain robustness for pose estimation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision,
|
144 |
pages 1859–1868, 2021.
|
145 |
+
5. Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao, and Li Fei-Fei. Novel dataset for fine-grained image categorization. In First Workshop
|
146 |
on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, June 2011.
|
147 |
+
6. Benjamin Biggs, Thomas Roddick, Andrew Fitzgibbon, and Roberto Cipolla. Creatures great and SMAL: Recovering the shape and motion of
|
148 |
animals from video. In Asian Conference on Computer Vision, pages 3–19. Springer, 2018.
|
149 |
+
7. Hang Yu, Yufei Xu, Jing Zhang, Wei Zhao, Ziyu Guan, and Dacheng Tao. Ap-10k: A benchmark for animal pose estimation in the wild. In Thirty-fifth
|
150 |
Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.
|
151 |
+
8. iNaturalist. GBIF Occurrence Download. https://doi.org/10.15468/dl.p7nbxt. iNaturalist, July 2020.
|
152 |
+
9. Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask r-cnn. In Proceedings of the IEEE international conference on computer
|
153 |
vision, pages 2961–2969, 2017.
|
154 |
+
10. Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection, 2016.
|
155 |
+
11. Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár,
|
156 |
+
and C. Lawrence Zitnick. Microsoft COCO: common objects in context. CoRR, abs/1405.0312, 2014.
|
|