---
license: apache-2.0
---
### Eval
Dev-set evaluation on CS-HellaSwag (an automatically translated version of the HellaSwag benchmark). The csmpt rows are checkpoints of the same training run, taken after the indicated number of training steps.
| Model | Accuracy |
|---------------|------------|
| mistral7b     | 0.4992     |
| csmpt-130k    | __0.5004__ |
| csmpt-100k    | 0.4959     |
| csmpt-75k     | 0.4895     |
| csmpt-50k     | 0.4755     |
| csmpt-26.5k   | 0.4524     |


We also ran validation on CS-HellaSwag over the course of training; after 100k steps, the improvements, if any, were very noisy, and the final improvement over mistral7b is not statistically significant.
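To make the significance claim concrete, here is a rough two-proportion z-test on the csmpt-130k vs. mistral7b gap (a minimal sketch; `n = 10_000` is a hypothetical dev-set size for illustration, not a number from this card):

```python
# Rough two-proportion z-test for the 0.5004 vs. 0.4992 accuracy gap.
# NOTE: n = 10_000 is an assumed dev-set size, for illustration only.
from math import sqrt

p1, p2, n = 0.5004, 0.4992, 10_000
p = (p1 + p2) / 2                 # pooled accuracy under the null hypothesis
se = sqrt(2 * p * (1 - p) / n)    # standard error of the accuracy difference
z = (p1 - p2) / se
print(f"z = {z:.2f}")             # ~0.17, far below the 1.96 threshold for p < 0.05
```

Even with ten thousand examples, a 0.12-point accuracy gap is nowhere near significance.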


### How to set up the environment
```bash
pip install transformers==4.37.2 torch==2.1.2 einops==0.7.0

# Be sure to install the matching flash-attn wheel; we use torch compiled with CUDA 12.1,
# no ABI, Python 3.9, and the Linux x86_64 architecture.
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.3/flash_attn-2.5.3+cu122torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
```
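
A quick sanity check that the environment matches (a minimal sketch; assumes a CUDA-capable GPU and the versions pinned above):

```python
# Verify the pinned versions and GPU support (expected values in comments).
import torch
import flash_attn

print(torch.__version__)               # expected: 2.1.2
print(flash_attn.__version__)          # expected: 2.5.3
print(torch.cuda.is_available())       # flash-attn requires a CUDA GPU
print(torch.cuda.is_bf16_supported())  # the example below loads weights in bfloat16
```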

### How to use in transformers
```python
import torch
import transformers
from transformers import pipeline

name = 'BUT-FIT/csmpt7b'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'flash'
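# (Note: if flash-attn is unavailable, MPT-style configs also accept 'torch'
#  here as a slower fallback attention implementation.)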
config.init_device = 'cuda:0'  # For fast initialization directly on GPU!
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # Load model weights in bfloat16
    trust_remote_code=True
)

tokenizer = transformers.AutoTokenizer.from_pretrained(name, trust_remote_code=True)

pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')

with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe('Nejznámějším českým spisovatelem ',
             max_new_tokens=100,
             top_p=0.95,
             repetition_penalty=1.0,
             do_sample=True,
             use_cache=True))

``` 
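
If you prefer calling `generate` directly rather than going through the pipeline, here is a minimal sketch reusing the `model` and `tokenizer` loaded above:

```python
# Alternative to the pipeline: call generate() directly,
# reusing the model and tokenizer from the example above.
inputs = tokenizer('Nejznámějším českým spisovatelem ', return_tensors='pt').to('cuda:0')

with torch.autocast('cuda', dtype=torch.bfloat16):
    output_ids = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        top_p=0.95,
        use_cache=True,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```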


### Our Release Plan
| Stage | Description | Date |
|-------|-------------|------|
| 1 | 'Best' model + training data | 11.03.2024 |
| 2 | All checkpoints + training code | |
| 3 | __Benczechmark__, a collection of Czech datasets for few-shot LLM evaluation | |



**Get in touch if you'd like to know more about __Benczechmark__ and contribute!**

## Getting in Touch
For further questions, email `martin.fajcik@vut.cz`.

## Disclaimer
This is a probabilistic model, and the authors are not responsible for its outputs. Use at your own risk.


## Acknowledgement
This work was supported by the NAKI III program of the Ministry of Culture of the Czech Republic, project semANT ("Sémantický průzkumník textového kulturního dědictví" / "Semantic Explorer of Textual Cultural Heritage"), grant no. `DH23P03OVV060`, and
by the Ministry of Education, Youth and Sports of the Czech Republic through e-INFRA CZ (ID: `90254`).