eryk-mazus committed
Commit • 2b84ea6
1 Parent(s): bd8ead3
Update README.md
README.md CHANGED
@@ -23,6 +23,8 @@ widget:
 
 The training took 425 GPU hours on a single 8 x RTX 4090 machine with DeepSpeed ZeRO-2.
 
+Context size: 2,048 tokens.
+
 ## Notes
 
 This base model was initially developed as a foundation for **instruction tuning, which is currently underway**. Nevertheless, I'm sharing it with the community now, because I recognize the potential value in its blend of relatively strong performance and an efficient bilingual tokenizer.
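For readers unfamiliar with the setup mentioned above: DeepSpeed ZeRO-2 training is typically enabled through a JSON config file passed to the launcher. A minimal sketch is shown below — all values are illustrative assumptions, not the settings actually used for this model:

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```

Stage 2 partitions optimizer states and gradients across the 8 GPUs, which is what makes pretraining feasible on a single 8 x RTX 4090 node.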