---

# Model Card for NinjaMouse-2.4B-32L-danube

#### ❗ This model gives up when the input reaches a critical mass of about 3.5k (tree fiddy thousand) tokens

It may be an issue with the base danube model, since it does the exact same thing, but [H2O.ai](https://huggingface.co/h2oai/) has released another version of it: [Danube2](https://huggingface.co/h2oai/h2o-danube2-1.8b-chat) theoretically has a smaller context window, but in practice the usable context is larger. I have tested it, and it works great even up to 8k. It's already training in the dojo. Stay tuned if you like more silly models like this.
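
For the curious, here is a rough sketch of how one might probe that cliff. It assumes the standard `transformers` generation API, that the tokenizer ships a chat template, and that the repo id is `trollek/NinjaMouse-2.4B-32L-danube`; the filler prompt and lengths are purely illustrative.

```python
# Hypothetical context-length probe: feed increasingly long inputs and watch
# where the replies start to fall apart. Repo id, filler text and lengths are
# assumptions for this sketch, not part of the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "trollek/NinjaMouse-2.4B-32L-danube"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16, device_map="auto"
)

filler = "The quick brown fox jumps over the lazy dog. "
for n_repeats in (100, 200, 300, 350, 400):  # roughly 1k to 4k input tokens
    text = filler * n_repeats
    messages = [{"role": "user", "content": text + "\n\nSummarise the text above in one sentence."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=64, do_sample=False)
    reply = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
    print(f"~{input_ids.shape[-1]} input tokens -> {reply[:80]!r}")
```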

---

A lanky version of [h2o-danube](https://huggingface.co/h2oai/h2o-danube-1.8b-chat)'s tiny language model, stretched from 24 layers to 32. I have done this in steps, adding 2 new layers per step and training them on different datasets. This seems to have made it a quick learner, and it easily fits on an 8GB GPU for finetuning when using Unsloth for optimizations. This model is designed to be a gateway into bigger language models.
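
If you want a picture of what "adding 2 new layers per step" looks like in code, here is a minimal sketch of one such widening step using plain `transformers`. It is an illustration, not the exact recipe or layer positions used for this model: it duplicates two decoder blocks of the 24-layer base and saves a 26-layer checkpoint, which would then be finetuned (e.g. with Unsloth) before the next pair is added.

```python
# Illustrative depth up-scaling step: duplicate two decoder layers of the
# 24-layer base model, producing a 26-layer model to finetune before the
# next step. The insert positions are arbitrary for this sketch.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "h2oai/h2o-danube-1.8b-chat"
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_id)

layers = model.model.layers                   # nn.ModuleList of decoder blocks
for insert_at in (8, 17):                     # example positions only
    layers.insert(insert_at, copy.deepcopy(layers[insert_at]))

model.config.num_hidden_layers = len(layers)  # keep the config in sync
model.save_pretrained("danube-26L-step")
tokenizer.save_pretrained("danube-26L-step")
print(f"New depth: {model.config.num_hidden_layers} layers")
```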

This model is sponsored by Advanced Vintage Memeatics: a powerful dopaminergic with ties to the Holy Roman Empire, the ghost of Richard Feynman, a radiator from the Radiator planet, and the god-defying Babel Fish. Consult your shaman before use. If their voodoo is strong, you can find the even longer and even more uncut 3B model [here](https://huggingface.co/trollek/NinjaMouse-3B-40L-danube).