Commit 9429ac5
Parent(s): ca46bec

Update README.md

README.md CHANGED
@@ -21,7 +21,7 @@ widget:
 example_title: Palace
 ---
 
-# The fine-tuned ViT model that beats [Google's base model](https://huggingface.co/google/vit-base-patch16-224)
+# The fine-tuned ViT model that beats [Google's base model](https://huggingface.co/google/vit-base-patch16-224) and OpenAI's GPT4
 
 Image-classification model that identifies which city map is illustrated from an image input.
 
@@ -65,40 +65,20 @@ This [Google's ViT-base-patch16-224](https://huggingface.co/google/vit-base-patc
 
 ## Training procedure
 
-### Preprocessing
 
-
+## Training evaluation results
 
-
+The quality of the training was evaluated with the training dataset and resulted in the following metrics: \
 
-
+{'eval_loss': 1.3691096305847168,
+'eval_accuracy': 0.6666666666666666,
+'eval_runtime': 13.0277,
+'eval_samples_per_second': 4.606,
+'eval_steps_per_second': 0.154,
+'epoch': 2.82}
 
-The model was trained on TPUv3 hardware (8 cores). All model variants are trained with a batch size of 4096 and learning rate warmup of 10k steps. For ImageNet, the authors found it beneficial to additionally apply gradient clipping at global norm 1. Training resolution is 224.
 
-##
+## Model Card Authors
 
-
-
-### BibTeX entry and citation info
-
-```bibtex
-@misc{wu2020visual,
-title={Visual Transformers: Token-based Image Representation and Processing for Computer Vision},
-author={Bichen Wu and Chenfeng Xu and Xiaoliang Dai and Alvin Wan and Peizhao Zhang and Zhicheng Yan and Masayoshi Tomizuka and Joseph Gonzalez and Kurt Keutzer and Peter Vajda},
-year={2020},
-eprint={2006.03677},
-archivePrefix={arXiv},
-primaryClass={cs.CV}
-}
-```
-
-```bibtex
-@inproceedings{deng2009imagenet,
-title={Imagenet: A large-scale hierarchical image database},
-author={Deng, Jia and Dong, Wei and Socher, Richard and Li, Li-Jia and Li, Kai and Fei-Fei, Li},
-booktitle={2009 IEEE conference on computer vision and pattern recognition},
-pages={248--255},
-year={2009},
-organization={Ieee}
-}
-```
+STEM.AI: stem.ai.mtl@gmail.com\
+[William Harbec](https://www.linkedin.com/in/william-harbec-56a262248/)
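The metrics dict added to the README in this commit can be sanity-checked with simple arithmetic. The sketch below assumes, which the commit does not state, that the values come from Hugging Face `Trainer.evaluate()`, whose speed metrics report throughput as a count divided by `eval_runtime`. `implied_counts` is a hypothetical helper written for illustration. Under that assumption, the reported rates imply an evaluation pass of roughly 60 images over 2 steps, about 40 of them classified correctly (on what the commit text says is the training dataset).

```python
# Hypothetical sanity check on the metrics committed to README.md.
# Assumption: throughput values were computed as count / eval_runtime
# (as in Hugging Face Trainer's speed metrics), then rounded.
metrics = {
    "eval_loss": 1.3691096305847168,
    "eval_accuracy": 0.6666666666666666,
    "eval_runtime": 13.0277,
    "eval_samples_per_second": 4.606,
    "eval_steps_per_second": 0.154,
    "epoch": 2.82,
}


def implied_counts(m: dict) -> dict:
    """Back out sample, step and correct-prediction counts from rounded rates."""
    # samples ~= runtime * samples/sec; steps ~= runtime * steps/sec
    samples = round(m["eval_runtime"] * m["eval_samples_per_second"])
    steps = round(m["eval_runtime"] * m["eval_steps_per_second"])
    # accuracy is correct / samples, so multiply back to get a count
    correct = round(m["eval_accuracy"] * samples)
    return {"samples": samples, "steps": steps, "correct": correct}


if __name__ == "__main__":
    print(implied_counts(metrics))  # {'samples': 60, 'steps': 2, 'correct': 40}
```

Because the rates are rounded to three decimals, the recovered counts are estimates; rounding to the nearest integer is what makes them usable.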