DarwinAnim8or committed
Commit de4059b • 1 Parent(s): 754751b
Update README.md

README.md CHANGED
@@ -5,7 +5,7 @@ datasets:
 ---
 
 # Bamboo 400M
-This is a WIP model trained only on public domain (CC0) datasets, primarily in the English language.
+This is a WIP foundational (aka base) model trained only on public domain (CC0) datasets, primarily in the English language.
 Further training is planned & ongoing, but currently no multi-language datasets are in use or planned; though this may change in the future and the current datasets *can* contain languages other than English.
 
 ## License
@@ -14,7 +14,7 @@ Though the training data of this model is CC0, the model itself is not. The mode
 ## Planned updates
 As mentioned, a few updates are planned:
 * Further training on more CC0 data, this model's weights will be updated as we pretrain on more of the listed datasets.
-* Experiment with
+* Experiment with extending the context length using YaRN to 32k tokens.
 * Fine-tuning the resulting model for instruct, code and storywriting. These will then be combined using MergeKit to create a MoE model.
 * Release a GGUF version and an extended context version of the base model
 
@@ -27,7 +27,7 @@ This table tracks the performance of our model on various tasks over time.
 | 2024-07-27 | acc | 27.40% ± 0.92% | 25.52% ± 0.44% | 52.71% ± 3.01% | 39.52% ± 1.11% | 36.29% |
 
 ## Legend
-- Date: The date of
+- Date: The date of the model that the evaluation was run on. Pretraining is ongoing and tests are re-run with that date's model.
 - Metric: The evaluation metric used (acc = accuracy)
 - Task columns: Results for each task in the format "Percentage ± Standard Error"
 
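Since this commit reframes Bamboo 400M explicitly as a foundational (base) model, a minimal usage sketch is included below for readers of the card. It assumes the checkpoint is published as a standard transformers causal-LM repo; the repo id `DarwinAnim8or/Bamboo-400M` is a guessed placeholder, not a confirmed hub path.

```python
# Minimal sketch: load the base model with Hugging Face transformers.
# The repo id is an assumed placeholder -- substitute the real hub path.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "DarwinAnim8or/Bamboo-400M"  # hypothetical identifier

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# A base (non-instruct) model is a plain text continuer, so prompt with prose
# rather than chat-style instructions.
inputs = tokenizer("Bamboo is a fast-growing plant because", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```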
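The new bullet about extending the context length with YaRN could, for a RoPE-based architecture, be prototyped through the rope scaling hooks in recent transformers releases. The sketch below is an assumption-heavy outline: it presumes the model uses rotary position embeddings and an architecture covered by transformers' built-in "yarn" rope scaling, and both the repo id and the original context length are placeholders.

```python
# Sketch only: YaRN-style RoPE scaling to a 32k context via the model config.
# Assumes a RoPE-based architecture that transformers supports for rope_type="yarn".
from transformers import AutoConfig, AutoModelForCausalLM

repo_id = "DarwinAnim8or/Bamboo-400M"  # hypothetical identifier
target_context = 32_768

config = AutoConfig.from_pretrained(repo_id)
original_context = config.max_position_embeddings  # e.g. 2048; assumed, not confirmed

config.rope_scaling = {
    "rope_type": "yarn",
    "factor": target_context / original_context,
    "original_max_position_embeddings": original_context,
}
config.max_position_embeddings = target_context

model = AutoModelForCausalLM.from_pretrained(repo_id, config=config)
# Additional training on long sequences is normally still needed before the
# extended context is genuinely usable.
```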
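For the plan to combine the instruct, code and storywriting fine-tunes with MergeKit into a MoE model, the usual route is MergeKit's mergekit-moe tool, which is driven by a YAML config. The snippet below only sketches that shape from Python: the expert model names are hypothetical placeholders, and the exact config schema and CLI usage should be verified against the MergeKit documentation rather than taken from here.

```python
# Assumption-heavy sketch of a mergekit-moe style config; all model names are
# placeholders and the schema should be checked against MergeKit's own docs.
from pathlib import Path

moe_config = """\
base_model: DarwinAnim8or/Bamboo-400M                # hypothetical base checkpoint
gate_mode: hidden                                    # route experts via hidden states
dtype: float16
experts:
  - source_model: DarwinAnim8or/Bamboo-400M-instruct # placeholder fine-tune
    positive_prompts: ["Answer the following question:"]
  - source_model: DarwinAnim8or/Bamboo-400M-code     # placeholder fine-tune
    positive_prompts: ["Write a Python function that"]
  - source_model: DarwinAnim8or/Bamboo-400M-story    # placeholder fine-tune
    positive_prompts: ["Once upon a time"]
"""

Path("bamboo-moe.yaml").write_text(moe_config)
# Then, roughly: mergekit-moe bamboo-moe.yaml ./bamboo-moe  (verify current CLI flags)
```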
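One small consistency check on the evaluation row shown above: the final column matches the unweighted mean of the four task accuracies. The standard-error helper is only an illustration of one common convention (binomial error over n samples); it is an assumption about, not a description of, how the harness computes the reported "±" values.

```python
# Arithmetic check on the 2024-07-27 results row.
from math import sqrt

task_acc = [27.40, 25.52, 52.71, 39.52]  # per-task accuracy, in percent
average = sum(task_acc) / len(task_acc)
print(f"{average:.4f}%")  # 36.2875% -> reported as 36.29%

def binomial_stderr(acc_percent: float, n_samples: int) -> float:
    """Standard error of a proportion, in percent, over n_samples items.
    Illustrative assumption only; the eval harness may compute '±' differently."""
    p = acc_percent / 100.0
    return 100.0 * sqrt(p * (1.0 - p) / n_samples)
```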