---
license: apache-2.0
---
# Introduction
# Eval
Dev evaluation on CS-HellaSwag (an automatically translated version of the HellaSwag benchmark):
| Model | Accuracy |
|---------------|----------------|
| mistral7b | 0.4992 |
| csmpt-130k steps | __0.5004__ |
| csmpt-100k steps | 0.4959 |
| csmpt-75k steps | 0.4895 |
| csmpt-50k steps | 0.4755 |
| csmpt-26.5k steps | 0.4524 |
However, we ran validation on CS-HellaSwag over the course of training, and after 100k steps the improvements, if any, were very noisy.
The improvement over mistral7b is not significant.
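For reference, HellaSwag-style accuracy is typically computed by scoring each candidate ending with its length-normalized log-likelihood given the context and picking the highest-scoring one. The snippet below is a minimal sketch of that protocol, not our exact evaluation pipeline; it assumes a `model` and `tokenizer` loaded as shown in the Usage section further down.

```python
# Minimal sketch of HellaSwag-style multiple-choice scoring (illustrative only,
# not our exact evaluation pipeline). Assumes `model` and `tokenizer` are
# already loaded on 'cuda:0' as in the Usage section.
import torch
import torch.nn.functional as F

@torch.no_grad()
def ending_logprob(model, tokenizer, context, ending, device='cuda:0'):
    ctx_len = tokenizer(context, return_tensors='pt').input_ids.shape[1]
    full_ids = tokenizer(context + ending, return_tensors='pt').input_ids.to(device)
    logits = model(full_ids).logits                      # [1, seq_len, vocab_size]
    # Positions ctx_len-1 .. seq_len-2 predict the ending tokens.
    log_probs = F.log_softmax(logits[0, ctx_len - 1:-1], dim=-1)
    ending_ids = full_ids[0, ctx_len:]
    token_lp = log_probs.gather(1, ending_ids.unsqueeze(1)).squeeze(1)
    return token_lp.mean().item()                        # length-normalized score

def predict_ending(model, tokenizer, context, endings):
    scores = [ending_logprob(model, tokenizer, context, e) for e in endings]
    return max(range(len(endings)), key=scores.__getitem__)
```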
## Loss
tbd.
## Training Method
tbd.
# Usage
## How to Setup Environment
```bash
pip install transformers==4.37.2 torch==2.1.2 einops==0.7.0
# Be sure to install the right flash-attn wheel: we use torch compiled with CUDA 12.1, no C++11 ABI, Python 3.9, Linux x86_64.
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.3/flash_attn-2.5.3+cu122torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
```
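As an optional sanity check (a suggestion on our part, not a required step), you can verify that torch sees your GPU and that the flash-attn wheel imports cleanly:

```python
# Optional sanity check: confirm torch/CUDA and flash-attn are installed as expected.
import torch
import flash_attn

print('torch', torch.__version__, 'cuda', torch.version.cuda,
      'gpu available:', torch.cuda.is_available())
print('flash-attn', flash_attn.__version__)
```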
## Running the Code
```python
import torch
import transformers
from transformers import pipeline
name = 'BUT-FIT/csmpt7b'
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.init_device = 'cuda:0' # For fast initialization directly on GPU!
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # load model weights in bfloat16
    trust_remote_code=True
)
tokenizer = transformers.AutoTokenizer.from_pretrained(name, trust_remote_code=True)
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')

with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe('Nejznámějším českým spisovatelem ',  # "The most famous Czech writer is ..."
             max_new_tokens=100,
             top_p=0.95,
             repetition_penalty=1.0,
             do_sample=True,
             use_cache=True))
```
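If you prefer not to use the `pipeline` wrapper, an equivalent call through `model.generate` looks roughly like the following. This is a sketch mirroring the sampling parameters above, not a prescribed recipe.

```python
# Sketch: direct generation without the pipeline wrapper, using the same
# sampling parameters as the pipeline example above.
inputs = tokenizer('Nejznámějším českým spisovatelem ', return_tensors='pt').to('cuda:0')
with torch.autocast('cuda', dtype=torch.bfloat16):
    output_ids = model.generate(**inputs,
                                max_new_tokens=100,
                                top_p=0.95,
                                repetition_penalty=1.0,
                                do_sample=True,
                                use_cache=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```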
# Training Data
We release most of our training data here \[TBD MDocekal.\].
# Our Release Plan
| Stage | Description | Date |
|---------------|----------------|----------------|
| 1 | 'Best' model + training data | 11.03.2024 |
| 2 | All checkpoints + training code | |
| 3 | __Benczechmark__, a collection of Czech datasets for few-shot LLM evaluation. **Get in touch if you want to contribute!** | |
| 4 | Preprint publication | |
## Getting in Touch
For further questions, email `martin.fajcik@vut.cz`.
# Disclaimer
This is a probabilistic model, and the authors are not responsible for its outputs. Use at your own risk.
# Acknowledgement
This work was supported by the NAKI III program of the Ministry of Culture of the Czech Republic, project semANT
("Sémantický průzkumník textového kulturního dědictví", "Semantic Explorer of Textual Cultural Heritage"), grant no. `DH23P03OVV060`, and
by the Ministry of Education, Youth and Sports of the Czech Republic through e-INFRA CZ (ID: `90254`).