xueyunlong committed
Commit 9736f90 · verified · 1 Parent(s): 73b031e

Update README.md

Files changed (1):
  1. README.md +65 -16
README.md CHANGED
@@ -4,7 +4,7 @@ tags:
   - biology
  ---
  <div align="center">
- <img src="https://cdn-uploads.huggingface.co/production/uploads/65a9e8563b9e1f0f308378b7/H2qI2OOSl-KqOlg01fRGR.png" width="80%" />
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/65a9e8563b9e1f0f308378b7/H2qI2OOSl-KqOlg01fRGR.png" width="50%" />
  </div>

  # OneGenomeRice (OGR)
@@ -15,18 +15,67 @@ For instructions, details, and examples, see the project repository [OGR GitHub]

  The table below summarizes training scale and key hyperparameters.

- | Model Specification | OGR |
- | --- | --- |
- | **Model Scale** | |
- | Total Parameters | 1.25B |
- | Activated Parameters | 0.33B |
- | **Architecture** | |
- | Architecture | MoE |
- | Number of Experts | 8 |
- | Selected Experts per Token | 2 |
- | Number of Layers | 12 |
- | Attention Hidden Dimension | 1024 |
- | Number of Attention Heads | 16 (GQA, 8 KV groups) |
- | MoE Hidden Dimension (per Expert) | 4096 |
- | Vocabulary Size | 128 (padded) |
- | Context Length | up to 1M |
+ <div align="center">
+
+ <table>
+ <thead>
+ <tr>
+ <th align="center"><strong>Model Specification</strong></th>
+ <th align="center"><strong>OneGenomeRice (OGR)</strong></th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td align="center" colspan="2"><strong>Model Scale</strong></td>
+ </tr>
+ <tr>
+ <td align="center">Total Parameters</td>
+ <td align="center">1.25B</td>
+ </tr>
+ <tr>
+ <td align="center">Activated Parameters</td>
+ <td align="center">0.33B</td>
+ </tr>
+ <tr>
+ <td align="center" colspan="2"><strong>Architecture</strong></td>
+ </tr>
+ <tr>
+ <td align="center">Architecture</td>
+ <td align="center">MoE</td>
+ </tr>
+ <tr>
+ <td align="center">Number of Experts</td>
+ <td align="center">8</td>
+ </tr>
+ <tr>
+ <td align="center">Selected Experts per Token</td>
+ <td align="center">2</td>
+ </tr>
+ <tr>
+ <td align="center">Number of Layers</td>
+ <td align="center">12</td>
+ </tr>
+ <tr>
+ <td align="center">Attention Hidden Dimension</td>
+ <td align="center">1024</td>
+ </tr>
+ <tr>
+ <td align="center">Number of Attention Heads</td>
+ <td align="center">16 (GQA, 8 KV groups)</td>
+ </tr>
+ <tr>
+ <td align="center">MoE Hidden Dimension (per Expert)</td>
+ <td align="center">4096</td>
+ </tr>
+ <tr>
+ <td align="center">Vocabulary Size</td>
+ <td align="center">128 (padded)</td>
+ </tr>
+ <tr>
+ <td align="center">Context Length</td>
+ <td align="center">up to 1Mb</td>
+ </tr>
+ </tbody>
+ </table>
+
+ </div>
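
As a quick sanity check on the spec table added in this commit, the sketch below estimates total versus activated parameters from the listed hyperparameters. It assumes a gated (SwiGLU-style) three-matrix FFN per expert, a head dimension of 1024 / 16 = 64, and negligible embedding and router weights; none of these assumptions are stated in the README, so treat the result as a rough consistency check rather than the exact accounting.

```python
# Back-of-envelope parameter count for the OGR spec table above.
# Assumptions (not from the README): gated three-matrix expert FFNs,
# head_dim = 64, and embeddings/routers negligible at vocab size 128.

d_model = 1024            # attention hidden dimension
d_ff = 4096               # MoE hidden dimension per expert
n_layers = 12
n_experts = 8
experts_per_token = 2
n_kv_groups = 8
head_dim = d_model // 16  # 16 attention heads

# Gated FFN: gate, up, and down projections -> 3 * d_model * d_ff per expert.
expert_params = 3 * d_model * d_ff

# GQA attention: full-width Q and O projections, narrower shared K/V projections.
attn_params = 2 * d_model * d_model + 2 * d_model * n_kv_groups * head_dim

total = n_layers * (n_experts * expert_params + attn_params)
activated = n_layers * (experts_per_token * expert_params + attn_params)

print(f"total     ~ {total / 1e9:.2f}B")      # ~1.25B, matching the table
print(f"activated ~ {activated / 1e9:.2f}B")  # ~0.34B, close to the reported 0.33B
```

Under these assumptions the totals land at roughly 1.25B and 0.34B parameters, which is consistent with the 1.25B total / 0.33B activated figures in the table: only 2 of the 8 experts run per token, so most expert weights are inactive on any given forward pass.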