tarekziade committed on
Commit
66e7bff
1 Parent(s): 7756182

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +47 -48
README.md CHANGED
@@ -1,63 +1,63 @@
 ---
 tags:
-- image-to-text
-- image-captioning
+- image-to-text
+- image-captioning
 license: apache-2.0
 metrics:
-- rouge
+- rouge
 datasets:
-- nlphuji/flickr30k
+- nlphuji/flickr30k
 widget:
-- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg
-  example_title: Savanna
-- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg
-  example_title: Football Match
-- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/airport.jpg
-  example_title: Airport
-base_model:
-- google/vit-base-patch16-224-in21k
+- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg
+  example_title: Savanna
+- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg
+  example_title: Football Match
+- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/airport.jpg
+  example_title: Airport
+base_model:
+- google/vit-base-patch16-224-in21k
 
 model-index:
-- name: mozilla/distilvit
-  results:
-  - task:
-      type: image-to-text
-      name: Image To Text
-    dataset:
-      name: nlphuji/flickr30k
-      type: nlphuji/flickr30k
-    metrics:
-    - name: ROUGE-1
-      type: rouge
-      value: 43.006
-      verified: true
-    - name: ROUGE-2
-      type: rouge
-      value: 16.9939
-      verified: true
-    - name: ROUGE-L
-      type: rouge
-      value: 38.8923
-      verified: true
-    - name: ROUGE-LSUM
-      type: rouge
-      value: 38.8877
-      verified: true
-    - name: loss
-      type: loss
-      value: 0.19939416646957397
-    - name: gen_len
-      type: gen_len
-      value: 11.327256736227712
-      verified: true
+- name: mozilla/distilvit
+  results:
+  - task:
+      type: image-to-text
+      name: Image To Text
+    dataset:
+      name: nlphuji/flickr30k
+      type: nlphuji/flickr30k
+    metrics:
+    - name: ROUGE-1
+      type: rouge
+      value: 43.006
+      verified: true
+    - name: ROUGE-2
+      type: rouge
+      value: 16.9939
+      verified: true
+    - name: ROUGE-L
+      type: rouge
+      value: 38.8923
+      verified: true
+    - name: ROUGE-LSUM
+      type: rouge
+      value: 38.8877
+      verified: true
+    - name: loss
+      type: loss
+      value: 0.19939416646957397
+    - name: gen_len
+      type: gen_len
+      value: 11.327256736227712
+      verified: true
 ---
 
 # distilvit
 
-This model is a work in progress. Fine-tuned version of those base models:
+This model is a work in progress. Fine-tuned version of those base models:
 
-- a VIT model for the image encoder: https://huggingface.co/google/vit-base-patch16-224-in21k
-- a Distilled GPT-2 model for the text decoder: https://huggingface.co/distilbert/distilgpt2
+- a VIT model for the image encoder: https://huggingface.co/google/vit-base-patch16-224-in21k
+- a Distilled GPT-2 model for the text decoder: https://huggingface.co/distilbert/distilgpt2
 
 This model was trained on:
 
@@ -73,7 +73,6 @@ It was then further fine-tuned on :
 
 You can find the code used to create the model here: https://github.com/mozilla/distilvit
 
-
 ### Framework versions
 
 - Transformers 4.40.2
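
The README in this diff describes an image-captioning checkpoint that pairs a ViT image encoder with a distilled GPT-2 text decoder and is tagged `image-to-text`. As a rough illustration of what that metadata implies for a user, the sketch below wraps the Transformers `pipeline` API in two small helpers. It is not taken from the commit: the helper names `build_captioner` and `caption` are hypothetical, and it assumes the `mozilla/distilvit` checkpoint loads through the generic `image-to-text` pipeline in the Transformers release listed above.

```python
from typing import Callable, Dict, List

# Model name as it appears in the `model-index` block of the README.
MODEL_ID = "mozilla/distilvit"

# One of the widget sample images listed in the README front matter.
SAMPLE_URL = (
    "https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg"
)


def build_captioner():
    """Create an image-to-text pipeline for MODEL_ID.

    Assumption: the checkpoint is loadable via the generic pipeline API.
    The import is done lazily because `transformers` is a heavy dependency
    and the weights are downloaded on first use.
    """
    from transformers import pipeline

    return pipeline("image-to-text", model=MODEL_ID)


def caption(image: str, captioner: Callable[[str], List[Dict[str, str]]]) -> str:
    """Return the first generated caption for an image path or URL.

    `captioner` is any callable with the image-to-text pipeline's output
    shape: a list of dicts, each holding a "generated_text" string.
    """
    outputs = captioner(image)
    return outputs[0]["generated_text"].strip()


# Example call (needs network access plus the `transformers` and `Pillow`
# packages, so it is left commented out here):
#   print(caption(SAMPLE_URL, build_captioner()))
```

Passing the captioner in as an argument keeps `caption` easy to exercise with a stub, which is convenient when the real weights are not available locally.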