arnabdhar committed on
Commit
e70b3ca
1 Parent(s): 8868ce7

Update README.md

Files changed (1)
  1. README.md +37 -6
README.md CHANGED
@@ -14,6 +14,13 @@ metrics:
  model-index:
  - name: Swin-V2-base-Food
  results: []
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -31,18 +38,42 @@ It achieves the following results on the evaluation set:

  ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

  ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:
@@ -76,4 +107,4 @@ The following hyperparameters were used during training:
  - Transformers 4.35.2
  - Pytorch 2.1.0+cu121
  - Datasets 2.15.0
- - Tokenizers 0.15.0
 
  model-index:
  - name: Swin-V2-base-Food
  results: []
+ datasets:
+ - ItsNotRohit/Food121-224
+ - food101
+ language:
+ - en
+ library_name: transformers
+ pipeline_tag: image-classification
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 

  ## Model description

+ Swin v2 is a Transformer-based vision model that achieves strong accuracy on image classification tasks. Its main strengths are:

+ - __Hierarchical architecture__: Captures features at multiple scales, much as CNNs do.
+ - __Shifted windows__: Improves information flow between windows while reducing computational cost.
+ - __Large model capacity__: Supports accurate predictions that generalize well.

+ Swin v2 set new records on ImageNet while needing about 40x less data and training time than similar models. It is also versatile, handling a variety of vision tasks and large input images.

+ The model was fine-tuned on 120 categories of food images.

+ To use the model, run the following code snippet:
+
+ ```python
+ from transformers import pipeline
+ from PIL import Image
+
+ # initialize the image classification pipeline
+ classifier = pipeline("image-classification", "arnabdhar/Swin-V2-base-Food")
+
+ # run inference (image_path is the path to your own image file)
+ image = Image.open(image_path)
+ results = classifier(image)
+ ```
+
+ ## Intended uses
+
+ The model can be used for the following tasks:
+
+ - __Food Image Classification__: Classify food images with the Transformers `pipeline` API.
+ - __Base Model for Fine-Tuning__: Use this model as a starting point and fine-tune it on your own custom dataset.
+

  ## Training procedure

+ Fine-tuning was done on Google Colab with an NVIDIA T4 GPU (15 GB of VRAM). The model was trained for 20,000 steps, which took ~5.5 hours including periodic evaluation.
+
  ### Training hyperparameters

  The following hyperparameters were used during training:
 
  - Transformers 4.35.2
  - Pytorch 2.1.0+cu121
  - Datasets 2.15.0
+ - Tokenizers 0.15.0
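
The usage snippet added in this commit stops at `results = classifier(image)` and does not show how to read the output. As a minimal sketch, assuming the usual return shape of the Transformers `image-classification` pipeline (a list of dicts with `label` and `score` keys), the helper below picks the highest-scoring prediction. The function name `best_prediction`, the `min_score` threshold, and the sample labels and scores are hypothetical and not part of the commit.

```python
from typing import Optional


def best_prediction(results: list[dict], min_score: float = 0.5) -> Optional[dict]:
    """Return the highest-scoring prediction, or None if it scores below min_score."""
    if not results:
        return None
    top = max(results, key=lambda r: r["score"])
    return top if top["score"] >= min_score else None


# Hypothetical output shaped like what classifier(image) returns;
# the labels and scores here are made up for illustration.
sample_results = [
    {"label": "pizza", "score": 0.91},
    {"label": "lasagna", "score": 0.05},
    {"label": "omelette", "score": 0.02},
]

top = best_prediction(sample_results)
if top is not None:
    print(f"{top['label']} ({top['score']:.2f})")
```

In real use, `sample_results` would be replaced by the `results` value from the README snippet. The threshold is a design choice: returning `None` for low-confidence predictions lets the caller decide how to handle images that may not show one of the trained food categories.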