---
license: apache-2.0
---

# Introduction

# Eval

Dev-set evaluation on CS-HellaSwag (automatically translated HellaSwag benchmark).

| Model              | CS-HellaSwag Accuracy |
|--------------------|-----------------------|
| mistral7b          | 0.4992                |
| csmpt-130k steps   | __0.5004__            |
| csmpt-100k steps   | 0.4959                |
| csmpt-75k steps    | 0.4895                |
| csmpt-50k steps    | 0.4755                |
| csmpt-26.5k steps  | 0.4524                |

We also ran validation on CS-HellaSwag over the course of training; after 100k steps, the improvements (if any) were very noisy. The improvement over mistral7b is not significant. A minimal scoring sketch for this kind of evaluation is given in the appendix at the end of this card.

## Loss

tbd.

## Training Method

tbd.

# Usage

## How to Setup Environment

```bash
pip install transformers==4.37.2 torch==2.1.2 einops==0.7.0

# Be sure to install the matching flash-attn wheel; we use torch compiled with CUDA 12.1, no C++11 ABI, Python 3.9, Linux x86_64 architecture.
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.3/flash_attn-2.5.3+cu122torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
```

## Running the Code

```python
import torch
import transformers
from transformers import pipeline

name = 'BUT-FIT/csmpt7b'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.init_device = 'cuda:0'  # For fast initialization directly on GPU!

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # Load model weights in bfloat16
    trust_remote_code=True
)

tokenizer = transformers.AutoTokenizer.from_pretrained(name, trust_remote_code=True)

pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')

with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe('Nejznámějším českým spisovatelem ',
             max_new_tokens=100,
             top_p=0.95,
             repetition_penalty=1.0,
             do_sample=True,
             use_cache=True))
```

# Training Data

We release most of our training data here \[TBD MDocekal.\].

# Our Release Plan

| Stage | Description | Date |
|-------|-------------|------|
| 1 | 'Best' model + training data | 11.03.2024 |
| 2 | All checkpoints + training code | |
| 3 | __Benczechmark__, a collection of Czech datasets for few-shot LLM evaluation. **Get in touch if you want to contribute!** | |
| 4 | Preprint publication | |

## Getting in Touch

For further questions, email `martin.fajcik@vut.cz`.

# Disclaimer

This is a probabilistic model; the authors are not responsible for its outputs. Use at your own risk.

# Acknowledgement

This work was supported by the NAKI III programme of the Ministry of Culture of the Czech Republic, project semANT "Sémantický průzkumník textového kulturního dědictví" (grant no. `DH23P03OVV060`), and by the Ministry of Education, Youth and Sports of the Czech Republic through e-INFRA CZ (ID: `90254`).
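
## Appendix: CS-HellaSwag-style scoring sketch

The sketch below is not the evaluation harness we used; it only illustrates how HellaSwag-style multiple-choice items can be scored with csmpt7b, by picking the ending with the highest log-likelihood given the context. The example `context` and `endings` strings are made up for illustration, and the sketch assumes the tokenization of the context is a prefix of the tokenization of context + ending.

```python
# Minimal, illustrative multiple-choice scoring (NOT our exact evaluation setup):
# choose the ending whose tokens get the highest summed log-probability under the model.
import torch
import transformers

name = 'BUT-FIT/csmpt7b'
tokenizer = transformers.AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = transformers.AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.bfloat16, trust_remote_code=True
).to('cuda:0').eval()  # assumes a single GPU is available


def ending_logprob(context: str, ending: str) -> float:
    """Sum of log-probabilities of the ending tokens, conditioned on the context."""
    ctx_ids = tokenizer(context, return_tensors='pt').input_ids.to(model.device)
    full_ids = tokenizer(context + ending, return_tensors='pt').input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of the logits predicts token i+1 of the input.
    logprobs = torch.log_softmax(logits[0, :-1].float(), dim=-1)
    targets = full_ids[0, 1:]
    n_ctx = ctx_ids.shape[1]
    # Assumes the context tokens form a prefix of the full tokenization,
    # so ending tokens are predicted at positions n_ctx-1 and later.
    return sum(logprobs[i, targets[i]].item() for i in range(n_ctx - 1, targets.shape[0]))


# Illustrative item (not taken from CS-HellaSwag).
context = "Muž si nasadil helmu a sedl na kolo. Poté"
endings = [" vyrazil na projížďku po městě.", " upekl dort."]
scores = [ending_logprob(context, e) for e in endings]
print("Predicted ending:", endings[scores.index(max(scores))])
```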