## Training and evaluation data

We conducted two sets of evaluations: automatic evaluation on downstream NLP tasks and human evaluation on user-oriented instructions. For more details, please refer to our [paper]().

## Models

You can download the LaMini model series as follows. Note that not all models perform equally well; models marked with ✩ achieve the best overall performance for their size/architecture. More details can be found in our paper.
<table>
<caption>
LaMini Language Models collection.
</caption>
<thead>
<tr>
<th>Name</th>
<th>Architecture</th>
<th>Initialization</th>
</tr>
</thead>
<tbody>
<tr>
<td>LaMini-T5-61M</td>
<td>encoder-decoder</td>
<td>T5-small</td>
</tr>
<tr>
<td>LaMini-T5-223M</td>
<td>encoder-decoder</td>
<td>T5-base</td>
</tr>
<tr>
<td>LaMini-T5-738M</td>
<td>encoder-decoder</td>
<td>T5-large</td>
</tr>
<tr>
<td>LaMini-Flan-T5-77M</td>
<td>encoder-decoder</td>
<td>Flan-T5-small</td>
</tr>
<tr>
<td>LaMini-Flan-T5-248M</td>
<td>encoder-decoder</td>
<td>Flan-T5-base</td>
</tr>
<tr>
<td>LaMini-Flan-T5-783M</td>
<td>encoder-decoder</td>
<td>Flan-T5-large</td>
</tr>
<tr>
<td>LaMini-Cb-111M</td>
<td>decoder-only</td>
<td>Cerebras-GPT-111M</td>
</tr>
<tr>
<td>LaMini-Cb-256M</td>
<td>decoder-only</td>
<td>Cerebras-GPT-256M</td>
</tr>
<tr>
<td>LaMini-Cb-590M</td>
<td>decoder-only</td>
<td>Cerebras-GPT-590M</td>
</tr>
<tr>
<td>LaMini-Cb-1.3B</td>
<td>decoder-only</td>
<td>Cerebras-GPT-1.3B</td>
</tr>
<tr>
<td>LaMini-GPT-124M</td>
<td>decoder-only</td>
<td>GPT-2</td>
</tr>
<tr>
<td>LaMini-GPT-774M</td>
<td>decoder-only</td>
<td>GPT-2 large</td>
</tr>
<tr>
<td>LaMini-GPT-1.5B</td>
<td>decoder-only</td>
<td>GPT-2 xl</td>
</tr>
</tbody>
</table>
## Use
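Assuming the models in the table above are published on the Hugging Face Hub (the `MBZUAI/LaMini-Flan-T5-248M` model ID below is an assumption; substitute any model from the collection), a minimal sketch of running an instruction through one of the encoder-decoder models with the `transformers` pipeline:

```python
# Minimal sketch: run an instruction through a LaMini encoder-decoder model.
# The default model ID is an assumption; swap in any model from the table above.
from transformers import pipeline

def generate(instruction: str, model_id: str = "MBZUAI/LaMini-Flan-T5-248M") -> str:
    """Load the model and return its response to a single instruction."""
    generator = pipeline("text2text-generation", model=model_id)
    return generator(instruction, max_length=512)[0]["generated_text"]

# Example usage (downloads the model weights on first call):
# print(generate("Please explain what a language model is."))
```

For the decoder-only models (LaMini-GPT, LaMini-Cb), use the `text-generation` pipeline instead of `text2text-generation`.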