Update README.md
README.md — CHANGED

@@ -8,17 +8,19 @@ tags:
 - storywriting
 ---
 
+<!-- header start -->
 <div style="width: 100%;">
 <img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
 </div>
 <div style="display: flex; justify-content: space-between; width: 100%;">
 <div style="display: flex; flex-direction: column; align-items: flex-start;">
-<p><a href="https://discord.gg/
+<p><a href="https://discord.gg/Jq4vkcDakD">Chat & support: my new Discord server</a></p>
 </div>
 <div style="display: flex; flex-direction: column; align-items: flex-end;">
-<p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? Patreon
+<p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
 </div>
 </div>
+<!-- header end -->
 
 # Elinas' Chronos 13B GGML
 
@@ -71,17 +73,30 @@ If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument
 
 Further instructions here: [text-generation-webui/docs/llama.cpp-models.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md).
 
-
+<!-- footer start -->
+## Discord
 
-
+For further support, and discussions on these models and AI in general, join us at:
 
-
+[TheBloke AI's Discord server](https://discord.gg/Jq4vkcDakD)
 
-
+## Thanks, and how to contribute.
 
-
+Thanks to the [chirper.ai](https://chirper.ai) team!
+
+I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.
+
+If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.
+
+Donaters will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.
+
+* Patreon: https://patreon.com/TheBlokeAI
 * Ko-Fi: https://ko-fi.com/TheBlokeAI
-
+
+**Patreon special mentions**: Aemon Algiz, Dmitriy Samsonov, Nathan LeClaire, Trenton Dambrowitz, Mano Prime, David Flickinger, vamX, Nikolai Manek, senxiiz, Khalefa Al-Ahmad, Illia Dulskyi, Jonathan Leane, Talal Aujan, V. Lukas, Joseph William Delisle, Pyrater, Oscar Rangel, Lone Striker, Luke Pendergrass, Eugene Pentland, Sebastain Graf, Johann-Peter Hartman.
+
+Thank you to all my generous patrons and donaters!
+<!-- footer end -->
 
 # Original model card: Chronos 13B
 
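The hunk above refers to swapping llama.cpp's one-shot `-p <PROMPT>` argument for interactive flags to get a chat-style conversation. A minimal sketch, assuming a built llama.cpp `main` binary and a downloaded Chronos GGML file — the model path and filename below are assumptions; substitute whichever quantisation you actually fetched:

```shell
# Sketch only: MODEL path/filename are assumptions, adjust to your download.
MODEL=./models/chronos-13b.ggmlv3.q4_0.bin

# One-shot generation uses -p "PROMPT"; for a chat-style conversation,
# replace it with -i -ins (llama.cpp's interactive instruct mode).
./main -m "$MODEL" --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -i -ins
```

This is a command-line fragment, not runnable without the model weights; `-c` sets the context size and `-n -1` lets generation continue until the model stops or you interrupt it.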
@@ -172,11 +187,11 @@ Hyperparameters for the model architecture
 </tr>
 <tr>
 <th>Number of parameters</th><th>dimension</th><th>n heads</th><th>n layers</th><th>Learn rate</th><th>Batch size</th><th>n tokens</th>
 </tr>
 </thead>
 <tbody>
 <tr>
 <th>7B</th> <th>4096</th> <th>32</th> <th>32</th> <th>3.0E-04</th><th>4M</th><th>1T</th>
 </tr>
 <tr>
 <th>13B</th><th>5120</th><th>40</th><th>40</th><th>3.0E-04</th><th>4M</th><th>1T</th>
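As a rough sanity check on the hyperparameter table (my own cross-check, not part of the model card): a decoder-only transformer has approximately `12 * n_layers * d_model**2` parameters, counting attention and MLP blocks and ignoring embeddings and norms. Plugging in the table's rows lands near each model's nominal size:

```python
# Rule-of-thumb parameter estimate for a decoder-only transformer:
# ~12 * n_layers * d_model^2 (attention + MLP with the standard 4*d hidden
# width; embeddings and layer norms are ignored).
def approx_params(n_layers: int, d_model: int) -> int:
    return 12 * n_layers * d_model ** 2

for name, layers, dim in [("7B", 32, 4096), ("13B", 40, 5120), ("65B", 80, 8192)]:
    print(f"{name}: ~{approx_params(layers, dim) / 1e9:.1f}B")
```

For the 13B row this gives 12 × 40 × 5120² ≈ 12.6B, consistent with the nominal 13B once embeddings are added back.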
@@ -186,13 +201,13 @@ Hyperparameters for the model architecture
 </tr>
 <tr>
 <th>65B</th><th>8192</th><th>64</th><th>80</th><th>1.5E-04</th><th>4M</th><th>1.4T</th>
 </tr>
 </tbody>
 </table>
 
 *Table 1 - Summary of LLaMA Model Hyperparameters*
 
 We present our results on eight standard common sense reasoning benchmarks in the table below.
 <table>
 <thead>
 <tr>
@@ -200,23 +215,23 @@ We present our results on eight standard common sense reasoning benchmarks in th
 </tr>
 <tr>
 <th>Number of parameters</th> <th>BoolQ</th><th>PIQA</th><th>SIQA</th><th>HellaSwag</th><th>WinoGrande</th><th>ARC-e</th><th>ARC-c</th><th>OBQA</th><th>COPA</th>
 </tr>
 </thead>
 <tbody>
 <tr><th>7B</th><th>76.5</th><th>79.8</th><th>48.9</th><th>76.1</th><th>70.1</th><th>76.7</th><th>47.6</th><th>57.2</th><th>93</th></tr>
 <tr><th>13B</th><th>78.1</th><th>80.1</th><th>50.4</th><th>79.2</th><th>73</th><th>78.1</th><th>52.7</th><th>56.4</th><th>94</th></tr>
 <tr><th>33B</th><th>83.1</th><th>82.3</th><th>50.4</th><th>82.8</th><th>76</th><th>81.4</th><th>57.8</th><th>58.6</th><th>92</th></tr>
 <tr><th>65B</th><th>85.3</th><th>82.8</th><th>52.3</th><th>84.2</th><th>77</th><th>81.5</th><th>56</th><th>60.2</th><th>94</th></tr>
 </tbody>
 </table>
 *Table 2 - Summary of LLaMA Model Performance on Reasoning tasks*
 
 We present our results on bias in the table below. Note that a lower value is better, indicating lower bias.
 
 | No | Category | FAIR LLM |
@@ -250,4 +265,4 @@ We filtered the data from the Web based on its proximity to Wikipedia text and r
 Risks and harms of large language models include the generation of harmful, offensive or biased content. These models are often prone to generating incorrect information, sometimes referred to as hallucinations. We do not expect our model to be an exception in this regard.
 
 **Use cases**
 LLaMA is a foundational model, and as such, it should not be used for downstream applications without further investigation and mitigations of risks. These risks and potential fraught use cases include, but are not limited to: generation of misinformation and generation of harmful, biased or offensive content.