Update README.md
README.md CHANGED
@@ -4,7 +4,7 @@ license: apache-2.0
 # Introduction
 CSMPT7b is a large Czech language model continuously pretrained for 272b training steps from the English [MPT7b](https://huggingface.co/mosaicml/mpt-7b) model. The model was pretrained on the ~67b-token [Large Czech Collection](https://huggingface.co/datasets/BUT-FIT/but_lcc) with a Czech tokenizer obtained using our vocabulary swap method (see below).
 
-#
+# Evaluation
 Dev eval at CS-HellaSwag (automatically translated HellaSwag benchmark).
 | Model | CS-HellaSwag Accuracy |
 |---------------|----------------|
@@ -19,7 +19,7 @@ Dev eval at CS-HellaSwag (automatically translated HellaSwag benchmark).
 However, we ran validation on CS-HellaSwag over the course of training, and after 100k steps the improvements, if any, were very noisy.
 The improvement over mistral7b is not significant.
 
-
+We will release more evaluations together with our benchmark **BenCzechMark** soon (see the release plan!).
 
 ## Loss
 We encountered loss spikes during training. As the model always recovered and our budget for training the 7b model was very constrained, we kept training. We had observed such loss spikes before in our ablations. In these ablations (with GPT-2 small), we found these to be
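
The introduction references a vocabulary swap used to move from the English MPT7b tokenizer to a Czech one; the authors' actual method is described further down the README. Purely as an illustrative sketch (not the authors' implementation), a common baseline for such a swap is to copy embedding vectors for tokens shared by the two vocabularies and mean-initialize the rest. The repo id for the Czech tokenizer below is an assumption.

```python
# Illustrative sketch only -- NOT the authors' vocabulary swap method.
# Copies embeddings for tokens present in both vocabularies; the remaining
# rows are initialized with the mean of the old embedding matrix.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

old_tok = AutoTokenizer.from_pretrained("mosaicml/mpt-7b")
new_tok = AutoTokenizer.from_pretrained("BUT-FIT/csmpt7b")  # assumed repo id for the Czech tokenizer

model = AutoModelForCausalLM.from_pretrained("mosaicml/mpt-7b", trust_remote_code=True)
old_emb = model.get_input_embeddings().weight.data

# Start every new row from the mean of the old embeddings, then overwrite
# rows whose token string also exists in the old vocabulary.
new_emb = old_emb.mean(dim=0, keepdim=True).repeat(len(new_tok), 1)
old_vocab = old_tok.get_vocab()
for token, new_id in new_tok.get_vocab().items():
    old_id = old_vocab.get(token)
    if old_id is not None:
        new_emb[new_id] = old_emb[old_id]

model.resize_token_embeddings(len(new_tok))
model.get_input_embeddings().weight.data.copy_(new_emb)
```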
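
For the CS-HellaSwag figures in the table, accuracy on HellaSwag-style tasks is typically measured by scoring each candidate ending with the model's log-likelihood given the context and picking the highest-scoring one. The sketch below shows that generic scoring scheme under an assumed model id and simplified tokenization; it is not the evaluation harness behind the reported numbers.

```python
# Generic likelihood-based multiple-choice scoring (HellaSwag-style).
# Not the authors' evaluation code; model id and simplifications are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("BUT-FIT/csmpt7b")
model = AutoModelForCausalLM.from_pretrained(
    "BUT-FIT/csmpt7b", torch_dtype=torch.bfloat16, trust_remote_code=True
).eval()

def ending_logprob(context: str, ending: str) -> float:
    """Sum of log-probabilities of the ending tokens, conditioned on the context.

    Simplification: context and context+ending are tokenized separately,
    ignoring possible token merges at the boundary.
    """
    ctx_len = tok(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(context + ending, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)  # row i predicts token i+1
    total = 0.0
    for pos in range(ctx_len, full_ids.shape[1]):
        total += logprobs[pos - 1, full_ids[0, pos]].item()
    return total

def predict(context: str, endings: list[str]) -> int:
    """Index of the ending the model assigns the highest log-likelihood."""
    return max(range(len(endings)), key=lambda i: ending_logprob(context, endings[i]))
```

Evaluation harnesses such as lm-evaluation-harness implement essentially this scheme, usually with additional normalizations (for example, length-normalized scores).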