mfajcik committed · Commit 2f52d85 · verified · 1 Parent(s): f9cf8ee

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -4,7 +4,7 @@ license: apache-2.0
  # Introduction
  CSMPT7b is a large Czech language model continuously pretrained for 272b training steps from English [MPT7b](https://huggingface.co/mosaicml/mpt-7b) model. Model was pretrained on ~67b token [Large Czech Collection](https://huggingface.co/datasets/BUT-FIT/but_lcc) using Czech tokenizer, obtained using our vocabulary swap method (see below).
  
- # Eval
+ # Evaluation
  Dev eval at CS-HellaSwag (automatically translated HellaSwag benchmark).
  | Model | CS-HellaSwag Accuracy |
  |---------------|----------------|
@@ -19,7 +19,7 @@ Dev eval at CS-HellaSwag (automatically translated HellaSwag benchmark).
  However, we ran validation over the course of training on CS-HellaSwag, and after 100k steps, the improvements were very noisy if any.
  The improvement over mistral7b is not significant.
  
- <TBD> More evaluation details teaser.
+ We will release more evaluations together with our benchmark **BenCzechMark** soon (see release plan!).
  
  ## Loss
  We encountered loss spikes during training. As the model always recovered, and our budget for training 7b model was very constrained, we kept on training. We observed such loss spikes before in our ablations. In these ablations (with GPT-2 small), we found these to be