johko committed on
Commit 6865b8c
1 Parent(s): 319dcd9

Update README.md

Files changed (1)
  1. README.md +14 -2
README.md CHANGED
@@ -1,12 +1,20 @@
 ---
 license: apache-2.0
+language:
+- en
+pipeline_tag: image-to-text
+datasets:
+- MS-COCO
+- Flickr30k
+tags:
+- Image Captioning
 ---
 
 # CapDec - NoiseLevel: 0.015
 
-This is are model weights originally provided by the authors of the paper [Text-Only Training for Image Captioning using Noise-Injected CLIP](https://arxiv.org/pdf/2211.00575.pdf).
+These are model weights originally provided by the authors of the paper [Text-Only Training for Image Captioning using Noise-Injected CLIP](https://arxiv.org/pdf/2211.00575.pdf).
 
-Their method aims to train CLIP with only text samples. Therefore they are injecting zero-mean Gaussian Noise with a standard-deviation(STD) of into the text embeddings before decoding.
+Their method trains an image captioning model on top of CLIP using only text samples. To do so, they inject zero-mean Gaussian noise into the CLIP text embeddings before decoding.
 
 In their words:
 *Specifically, we assume that the visual embedding corresponding to a text embedding
@@ -20,3 +28,7 @@ The "Noise Level" of 0.015 is equivalent to the Noise Variance which is the squa
 The reported metrics are results of a model with a Noise Variance of 0.016, which the authors unfortunately do not provide in their repository.
 This model with a Noise Variance 0.015 is the closest available pre-trained model to their best model.
 
+
+## Performance
+The authors do not explicitly report the performance for this noise level, but it can be estimated from the following figure in the original paper:
+
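The noise injection described in this README can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the CLIP text embeddings are already available as a NumPy array of shape `(batch, dim)`, uses a 512-dimensional stand-in (an assumption, e.g. a ViT-B/32 backbone), and the helpers `noise_std` and `inject_noise` are hypothetical names. Because the "Noise Level" of this model is the noise variance, the sampler's standard deviation is its square root, about 0.1225 for a variance of 0.015.

```python
import numpy as np

# Noise Level as used in this model card: the VARIANCE of the Gaussian noise.
NOISE_VARIANCE = 0.015


def noise_std(variance: float) -> float:
    """Convert a noise variance into the standard deviation used for sampling."""
    return float(np.sqrt(variance))


def inject_noise(text_embeddings: np.ndarray, variance: float,
                 rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: add zero-mean Gaussian noise to (batch, dim) embeddings.

    Sketch of the training-time idea only; the authors' implementation may
    differ (for example in normalization or where exactly the noise is added).
    Returns a new array and leaves the input unchanged.
    """
    std = noise_std(variance)
    noise = rng.normal(loc=0.0, scale=std, size=text_embeddings.shape)
    return text_embeddings + noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-in for CLIP text embeddings; 512 dims is an assumption.
    emb = rng.standard_normal((4, 512))
    noisy = inject_noise(emb, NOISE_VARIANCE, rng)
    print(f"std for variance {NOISE_VARIANCE}: {noise_std(NOISE_VARIANCE):.4f}")
```

During training, the decoder would see `noisy` instead of `emb`, which reflects the paper's assumption that an image's CLIP embedding lies in a small neighborhood of the matching text embedding.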