|
---
license: apache-2.0
---
|
# Introduction
|
|
|
|
|
# Eval |
|
Dev evaluation on CS-HellaSwag (an automatically translated HellaSwag benchmark):
|
| Model              | Accuracy   |
|--------------------|------------|
| mistral7b          | 0.4992     |
| csmpt-130k steps   | __0.5004__ |
| csmpt-100k steps   | 0.4959     |
| csmpt-75k steps    | 0.4895     |
| csmpt-50k steps    | 0.4755     |
| csmpt-26.5k steps  | 0.4524     |
|
|
|
|
|
However, we ran validation on CS-HellaSwag over the course of training, and after 100k steps the improvements, if any, were very noisy. The improvement over mistral7b is not significant.
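For reference, the sketch below shows one common way to score a HellaSwag-style benchmark with a causal LM: each candidate ending is ranked by the log-likelihood the model assigns to it given the context, and the highest-scoring ending is compared to the gold label. The local file name `cs_hellaswag_dev.jsonl` and the `ctx`/`endings`/`label` field names are assumptions (they follow the original HellaSwag format), and this is not necessarily the exact protocol behind the numbers above.

```python
import json

import torch
import transformers

# Hypothetical local copy of the translated benchmark in HellaSwag-style
# JSON-lines format (fields: ctx, endings, label) -- adjust to your data source.
examples = [json.loads(line) for line in open('cs_hellaswag_dev.jsonl', encoding='utf-8')]

name = 'BUT-FIT/csmpt7b'
tokenizer = transformers.AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = transformers.AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.bfloat16, trust_remote_code=True).to('cuda:0').eval()


@torch.no_grad()
def ending_logprob(ctx: str, ending: str) -> float:
    """Sum of log-probabilities the model assigns to the ending tokens given the context."""
    ctx_len = tokenizer(ctx, return_tensors='pt').input_ids.shape[1]
    full_ids = tokenizer(ctx + ending, return_tensors='pt').input_ids.to('cuda:0')
    logits = model(input_ids=full_ids).logits.float()
    # The logits at position t predict the token at position t + 1.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    return sum(logprobs[i, targets[i]].item() for i in range(ctx_len - 1, targets.shape[0]))


correct = 0
for ex in examples:
    scores = [ending_logprob(ex['ctx'], ' ' + e) for e in ex['endings']]
    correct += int(max(range(len(scores)), key=scores.__getitem__) == int(ex['label']))
print(f'CS-HellaSwag accuracy: {correct / len(examples):.4f}')
```

Length-normalizing the ending scores (dividing by the number of ending tokens) is another common scoring variant and can shift results slightly.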
|
|
|
## Loss |
|
tbd. |
|
|
|
|
|
## Training Method |
|
tbd. |
|
|
|
|
|
# Usage |
|
## How to Setup Environment |
|
```bash |
|
pip install transformers==4.37.2 torch==2.1.2 einops==0.7.0

# Be sure to install the right flash-attn build; we use torch compiled with CUDA 12.1, no ABI, Python 3.9, Linux x86_64 architecture.
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.3/flash_attn-2.5.3+cu122torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
|
``` |
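
After installation, a quick sanity check along the following lines can confirm that the CUDA build of torch and the flash-attn wheel import correctly. This is a minimal sketch; the printed versions will depend on your environment.

```python
import torch
import transformers
import einops
import flash_attn

# The flash-attn wheel above is CUDA-specific, so also confirm a GPU is visible.
print('torch', torch.__version__, '| CUDA available:', torch.cuda.is_available())
print('transformers', transformers.__version__)
print('einops', einops.__version__)
print('flash_attn', flash_attn.__version__)
```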
|
|
|
## Running the Code |
|
```python |
|
import torch
import transformers
from transformers import pipeline

name = 'BUT-FIT/csmpt7b'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.init_device = 'cuda:0'  # For fast initialization directly on GPU!
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # Load model weights in bfloat16
    trust_remote_code=True
)

tokenizer = transformers.AutoTokenizer.from_pretrained(name, trust_remote_code=True)

pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')

with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe('Nejznámějším českým spisovatelem ',  # Czech prompt: "The best-known Czech writer "
             max_new_tokens=100,
             top_p=0.95,
             repetition_penalty=1.0,
             do_sample=True,
             use_cache=True))
|
|
|
``` |
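
If you prefer not to go through the `pipeline` wrapper, the same generation can be done with `tokenizer` and `model.generate` directly. This is a minimal sketch assuming `model` and `tokenizer` are already loaded as above; the sampling parameters mirror the pipeline call.

```python
# Direct generation without the pipeline wrapper; reuses `model` and `tokenizer`
# loaded in the snippet above.
inputs = tokenizer('Nejznámějším českým spisovatelem ', return_tensors='pt').to('cuda:0')

with torch.autocast('cuda', dtype=torch.bfloat16):
    output_ids = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        top_p=0.95,
        repetition_penalty=1.0,
        use_cache=True,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```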
|
# Training Data |
|
We release most of our training data here \[TBD MDocekal.\]. |
|
|
|
|
|
# Our Release Plan |
|
| Stage | Description | Date |
|-------|-------------|------|
| 1 | 'Best' model + training data | 11.03.2024 |
| 2 | All checkpoints + training code | |
| 3 | __Benczechmark__, a collection of Czech datasets for few-shot LLM evaluation. **Get in touch if you want to contribute!** | |
| 4 | Preprint publication | |
|
|
|
## Getting in Touch |
|
For further questions, email `martin.fajcik@vut.cz`.
|
|
|
# Disclaimer |
|
This is a probabilistic model, and the authors are not responsible for its outputs. Use at your own risk.
|
|
|
|
|
# Acknowledgement |
|
This work was supported by the NAKI III program of the Ministry of Culture of the Czech Republic, project semANT ("Sémantický průzkumník textového kulturního dědictví", i.e. "Semantic explorer of textual cultural heritage"), grant no. `DH23P03OVV060`, and by the Ministry of Education, Youth and Sports of the Czech Republic through e-INFRA CZ (ID: `90254`).