Text Generation
Transformers
Safetensors
Czech
mpt
custom_code
text-generation-inference
Inference Endpoints
mfajcik committed
Commit 5c1b8e7
1 Parent(s): 983aff7

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -72,7 +72,7 @@ Figure 4: Test perplexity over the course of training for vocabulary swap (swapp
 
  We also verify that finetuning from English to Czech is beneficial for the MPT-7B model, compared to training a new model from scratch, at least over the first 10K steps. The training also seems to be more stable (note the yellow spike around 10K steps).
  <img src="figures/csmpt_tllama_test.png" width="900"/>
- Figure 5: Test cross-entropy over the course of training on CSMPT7B (yellow-red). Comparison with TinyLLAMA (blue-green). Our method (red & green curves) vs. TinyLLAMA training from scratch (yellow & blue curves).
+ Figure 5: Test cross-entropy over the course of training on CSMPT7B (yellow-red). Comparison with TinyLLAMA (blue-green). Our method (red & green curves) vs. training from scratch (yellow & blue curves).
 
  The vocabulary swap was done the same way as for our [Czech-GPT-2](https://huggingface.co/BUT-FIT/Czech-GPT-2-XL-133k) model (see it for a comprehensive description).
  For CSMPT7B, we managed to align 4,177 English tokens with corresponding Czech tokens.
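
Figure 4 reports test perplexity while Figure 5 reports test cross-entropy; the two figures show the same quantity on different scales, since perplexity is the exponential of the average per-token cross-entropy:

$$\mathrm{PPL} = \exp(H), \qquad H = -\frac{1}{N}\sum_{i=1}^{N}\log p_\theta\!\left(x_i \mid x_{<i}\right)$$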
 
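The alignment code itself is not part of this commit. As a minimal sketch of the vocabulary-swap initialization described above (not the authors' actual code), the snippet below copies pretrained embeddings for aligned tokens into a resized embedding matrix. It assumes string-identical tokens as the alignment criterion, `mosaicml/mpt-7b` as the source model, and `BUT-FIT/csmpt7b` as the repository holding the Czech tokenizer; all of these are illustrative assumptions.

```python
# Hedged sketch of vocabulary-swap initialization (not the authors' code).
# Assumption: tokens whose string form is identical in both vocabularies
# are treated as "aligned"; all other rows keep their fresh random init.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

old_tok = AutoTokenizer.from_pretrained("mosaicml/mpt-7b")   # English vocabulary
new_tok = AutoTokenizer.from_pretrained("BUT-FIT/csmpt7b")   # Czech vocabulary (assumed repo id)

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b", trust_remote_code=True  # MPT ships custom modeling code
)
old_emb = model.get_input_embeddings().weight.detach().clone()

# Resize the embedding matrix to the Czech vocabulary; rows for new ids
# are randomly initialized by transformers.
model.resize_token_embeddings(len(new_tok))
new_emb = model.get_input_embeddings().weight

old_vocab, new_vocab = old_tok.get_vocab(), new_tok.get_vocab()
aligned = {new_id: old_vocab[tok]          # new Czech id -> old English id
           for tok, new_id in new_vocab.items()
           if tok in old_vocab}

with torch.no_grad():
    for new_id, old_id in aligned.items():
        new_emb[new_id] = old_emb[old_id]  # reuse the pretrained embedding

print(f"aligned {len(aligned)} tokens")    # the README reports 4,177 for CSMPT7B
```

Copying embeddings only for aligned tokens is what makes the swap cheap: the reused rows give the Czech model a warm start, while the rest of the vocabulary is learned during finetuning.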