littleworth committed
Commit b66c778
1 Parent(s): 117d1f4

Update README.md

Files changed (1): README.md (+7 -4)
README.md CHANGED
@@ -15,10 +15,13 @@ This model card describes the distilled version of ProtGPT2, referred to as `pro
  **Dataset Used:**
  - The model was distilled using a subset of the evaluation dataset provided by `nferruz/UR50_2021_04`.

- **Loss Formulation:**
- - **Soft Loss:** \( L_{soft} = \text{KL}(\text{softmax}(\frac{s}{T}), \text{softmax}(\frac{t}{T})) \)
- - **Hard Loss:** \( L_{hard} = -\sum_{i} y_i \log(\text{softmax}(s_i)) \)
- - **Combined Loss:** \( L = \alpha L_{hard} + (1 - \alpha) L_{soft} \)
+ <strong>Loss Formulation:</strong>
+ <ul>
+ <li><strong>Soft Loss:</strong> <span>&#x2112;<sub>soft</sub> = KL(softmax(s/T), softmax(t/T))</span></li>
+ <li><strong>Hard Loss:</strong> <span>&#x2112;<sub>hard</sub> = -∑<sub>i</sub> y<sub>i</sub> log(softmax(s<sub>i</sub>))</span></li>
+ <li><strong>Combined Loss:</strong> <span>&#x2112; = α &#x2112;<sub>hard</sub> + (1 - α) &#x2112;<sub>soft</sub></span></li>
+ </ul>
+

 ### Performance
 The distilled model, `protgpt2-distilled-tiny`, exhibits a significant improvement in inference speed—up to 6 times faster than the pretrained version—while maintaining comparable perplexities.
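The combined distillation loss in the diff above can be sketched in plain NumPy. This is an illustrative reconstruction of the stated formulas only, not the model card's actual training code; the function name, signature, and the defaults `T=2.0` and `alpha=0.5` are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """L = alpha * L_hard + (1 - alpha) * L_soft, per the README's formulation.

    Hypothetical helper for illustration; T and alpha defaults are assumed.
    """
    eps = 1e-12  # guard against log(0)
    # Soft loss: KL(softmax(s/T) || softmax(t/T)), argument order as in the README
    p_s = softmax(student_logits / T)
    p_t = softmax(teacher_logits / T)
    l_soft = np.sum(p_s * (np.log(p_s + eps) - np.log(p_t + eps)), axis=-1).mean()
    # Hard loss: cross-entropy of the student's (untempered) predictions vs. labels
    log_p = np.log(softmax(student_logits) + eps)
    l_hard = -log_p[np.arange(len(labels)), labels].mean()
    return alpha * l_hard + (1 - alpha) * l_soft
```

With identical student and teacher logits the KL term vanishes, so setting `alpha=0` drives the combined loss to zero, which is a quick sanity check on the implementation.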