STEM-AI-mtl committed
Commit 9429ac5
1 Parent(s): ca46bec

Update README.md

Files changed (1)
  1. README.md +12 -32
README.md CHANGED
@@ -21,7 +21,7 @@ widget:
   example_title: Palace
 ---
 
- # The fine-tuned ViT model that beats [Google's base model](https://huggingface.co/google/vit-base-patch16-224)
+ # The fine-tuned ViT model that beats [Google's base model](https://huggingface.co/google/vit-base-patch16-224) and OpenAI's GPT-4
 
 Image-classification model that identifies which city's map is depicted in an input image.
@@ -65,40 +65,20 @@ This [Google's ViT-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224)
 
 ## Training procedure
 
- ### Preprocessing
 
- The exact details of preprocessing of images during training/validation can be found [here](https://github.com/google-research/vision_transformer/blob/master/vit_jax/input_pipeline.py).
 
- Images are resized/rescaled to the same resolution (224x224) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5).
 
- ### Pretraining
 
- The model was trained on TPUv3 hardware (8 cores). All model variants are trained with a batch size of 4096 and learning rate warmup of 10k steps. For ImageNet, the authors found it beneficial to additionally apply gradient clipping at global norm 1. Training resolution is 224.
 
- ## Evaluation results
 
- For evaluation results on several image classification benchmarks, we refer to tables 2 and 5 of the original paper. Note that for fine-tuning, the best results are obtained with a higher resolution (384x384). Of course, increasing the model size will result in better performance.
-
- ### BibTeX entry and citation info
-
- ```bibtex
- @misc{wu2020visual,
-   title={Visual Transformers: Token-based Image Representation and Processing for Computer Vision},
-   author={Bichen Wu and Chenfeng Xu and Xiaoliang Dai and Alvin Wan and Peizhao Zhang and Zhicheng Yan and Masayoshi Tomizuka and Joseph Gonzalez and Kurt Keutzer and Peter Vajda},
-   year={2020},
-   eprint={2006.03677},
-   archivePrefix={arXiv},
-   primaryClass={cs.CV}
- }
- ```
-
- ```bibtex
- @inproceedings{deng2009imagenet,
-   title={Imagenet: A large-scale hierarchical image database},
-   author={Deng, Jia and Dong, Wei and Socher, Richard and Li, Li-Jia and Li, Kai and Fei-Fei, Li},
-   booktitle={2009 IEEE Conference on Computer Vision and Pattern Recognition},
-   pages={248--255},
-   year={2009},
-   organization={IEEE}
- }
- ```
 
+ ## Training evaluation results
 
+ The quality of the training was evaluated on the training dataset, yielding the following metrics: \
 
+ {'eval_loss': 1.3691096305847168,
+ 'eval_accuracy': 0.6666666666666666,
+ 'eval_runtime': 13.0277,
+ 'eval_samples_per_second': 4.606,
+ 'eval_steps_per_second': 0.154,
+ 'epoch': 2.82}
 
+ ## Model Card Authors
 
+ STEM.AI: stem.ai.mtl@gmail.com\
+ [William Harbec](https://www.linkedin.com/in/william-harbec-56a262248/)
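
For reference, a minimal inference sketch for a fine-tuned ViT image classifier like the one this README describes, assuming the `transformers` library. The checkpoint id below is a placeholder, since the actual repo id is not shown on this page.

```python
# Minimal inference sketch for a fine-tuned ViT image classifier.
# "STEM-AI-mtl/vit-city-maps" is a placeholder id, not the confirmed repo.
from transformers import pipeline

classifier = pipeline(
    "image-classification",
    model="STEM-AI-mtl/vit-city-maps",  # placeholder checkpoint id
)

# Input can be a local path, a URL, or a PIL image of a city map.
for pred in classifier("city_map.png"):
    print(f"{pred['label']}: {pred['score']:.3f}")
```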
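The removed `Preprocessing` section above describes the standard ViT input pipeline: resize to 224x224 and normalize each RGB channel with mean 0.5 and standard deviation 0.5. The model card links Google's JAX input pipeline for the exact details; the `torchvision` transform below is a sketch of one plausible equivalent, not the pipeline the authors used.

```python
# Sketch of the preprocessing the removed section described: resize to
# 224x224, then normalize each RGB channel with mean 0.5 and std 0.5.
from PIL import Image
from torchvision import transforms

vit_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),                      # scales pixels to [0, 1]
    transforms.Normalize(mean=[0.5, 0.5, 0.5],  # then maps them to [-1, 1]
                         std=[0.5, 0.5, 0.5]),
])

image = Image.open("city_map.png").convert("RGB")
pixel_values = vit_transform(image).unsqueeze(0)  # shape: (1, 3, 224, 224)
```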
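The metrics dictionary in the new `Training evaluation results` section has the shape returned by `Trainer.evaluate()` from `transformers`. Below is a hedged sketch of how such a dictionary is typically produced; the accuracy metric and the dataset wiring are assumptions, not taken from this commit.

```python
# Sketch of producing a metrics dict like the one in the README above.
# The metric choice and dataset wiring are assumptions, not from the commit.
import numpy as np
import evaluate
from transformers import Trainer, TrainingArguments

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # Convert logits to class predictions and score them against the labels.
    logits, labels = eval_pred
    return accuracy.compute(predictions=np.argmax(logits, axis=-1),
                            references=labels)

def evaluate_on_split(model, dataset):
    """Evaluate `model` on `dataset`; the README used the training split."""
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="eval_out"),
        eval_dataset=dataset,
        compute_metrics=compute_metrics,
    )
    # Returns keys such as eval_loss, eval_accuracy, eval_runtime,
    # eval_samples_per_second and eval_steps_per_second.
    return trainer.evaluate()
```

Note that metrics computed on the training split, as reported above, measure fit rather than generalization; a held-out split is the more common choice.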