PeymanHosseini committed
Commit
824a209
1 Parent(s): 5c02206

Update README.md


Adds Figures to the model card

Files changed (1):
  1. README.md +11 -0
README.md CHANGED

````diff
@@ -23,6 +23,11 @@ This is Hummingbird 0.0, a 1B proof-of-concept causal language model based on **
 
 This version of Hummingbird is only meant to demonstrate Efficient Attention for use in causal language modelling. It has been trained on only 15 Billion tokens and is not safeguarded. Therefore, we do not recommend using it as a chatbot.
 
+<div align="center">
+<img src="figs/Hummingbird.jpg" width="400"/>
+</div>
+
+
 ## Model Details
 
 The model consists of 1.1 Billion parameters with the following specifications:
@@ -36,6 +41,12 @@ The model consists of 1.1 Billion parameters with the following specifications:
 
 The Attention Mechanism used is based on our newly proposed Efficient Attention from our paper, *You Need to Pay Better Attention: Rethinking the Mathematics of Attention Mechanism* ([arXiv:2403.01643](https://arxiv.org/abs/2403.01643)). We have chosen the number of heads to be 1 as an interesting case study since all current LMs use multiple heads.
 
+The loss plot below illustrates the model's performance during training. For comparison, when trained on 15 billion tokens, Hummingbird achieves a slightly lower loss than TinyLlama, a model of similar size.
+<div align="center">
+<img src="figs/history.png" width="700"/>
+</div>
+
+
 If you use Efficient Attention or Hummingbird, please cite our paper:
 
 ```
````
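The README text in the diff above highlights the single-head case study. As context only, the following is a minimal NumPy sketch of *standard* single-head causal scaled dot-product attention — the baseline that Efficient Attention (arXiv:2403.01643) reformulates. It does not reproduce the paper's parameter-reducing changes, so treat it as an illustration of the one-head setting rather than Hummingbird's actual layer; all variable names here are hypothetical.

```python
# Minimal single-head causal scaled dot-product attention in NumPy.
# Illustrative baseline only; the Efficient Attention variant used by
# Hummingbird (arXiv:2403.01643) modifies this standard formulation.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def single_head_attention(X, Wq, Wk, Wv, causal=True):
    """X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_head)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = (Q @ K.T) / np.sqrt(K.shape[-1])
    if causal:
        # Mask future positions, as required for causal language modelling.
        mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
d_model, d_head, seq_len = 8, 4, 5
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out = single_head_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 4)
```

With the causal mask, the first position can attend only to itself, so its output equals its own value vector — a quick sanity check that the masking is correct.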