Quantization made by Richard Erkhov.

llama-600M-rus - GGUF

Model creator: https://huggingface.co/demetera/
Original model: https://huggingface.co/demetera/llama-600M-rus/

Name	Quant method	Size
llama-600M-rus.Q2_K.gguf	Q2_K	0.22GB
llama-600M-rus.IQ3_XS.gguf	IQ3_XS	0.24GB
llama-600M-rus.IQ3_S.gguf	IQ3_S	0.25GB
llama-600M-rus.Q3_K_S.gguf	Q3_K_S	0.25GB
llama-600M-rus.IQ3_M.gguf	IQ3_M	0.26GB
llama-600M-rus.Q3_K.gguf	Q3_K	0.27GB
llama-600M-rus.Q3_K_M.gguf	Q3_K_M	0.27GB
llama-600M-rus.Q3_K_L.gguf	Q3_K_L	0.28GB
llama-600M-rus.IQ4_XS.gguf	IQ4_XS	0.29GB
llama-600M-rus.Q4_0.gguf	Q4_0	0.3GB
llama-600M-rus.IQ4_NL.gguf	IQ4_NL	0.3GB
llama-600M-rus.Q4_K_S.gguf	Q4_K_S	0.31GB
llama-600M-rus.Q4_K.gguf	Q4_K	0.33GB
llama-600M-rus.Q4_K_M.gguf	Q4_K_M	0.33GB
llama-600M-rus.Q4_1.gguf	Q4_1	0.33GB
llama-600M-rus.Q5_0.gguf	Q5_0	0.36GB
llama-600M-rus.Q5_K_S.gguf	Q5_K_S	0.36GB
llama-600M-rus.Q5_K.gguf	Q5_K	0.38GB
llama-600M-rus.Q5_K_M.gguf	Q5_K_M	0.38GB
llama-600M-rus.Q5_1.gguf	Q5_1	0.39GB
llama-600M-rus.Q6_K.gguf	Q6_K	0.45GB
llama-600M-rus.Q8_0.gguf	Q8_0	0.54GB
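Which file to download mostly depends on how much memory you can spare. One rough heuristic, which is an assumption and not guidance from the quantizer, is that fewer bits per weight give a smaller file but more quantization error. Under that heuristic, a simple rule is to take the largest file that fits a size budget. The sketch below encodes the sizes from the table. `QUANT_SIZES_GB` and `pick_quant` are hypothetical names, and ties are broken by table order. File size alone also does not cover runtime memory such as the context (KV) cache.

```python
from typing import Optional

# File sizes in GB, copied from the table above (hypothetical helper data).
QUANT_SIZES_GB = {
    "Q2_K": 0.22, "IQ3_XS": 0.24, "IQ3_S": 0.25, "Q3_K_S": 0.25,
    "IQ3_M": 0.26, "Q3_K": 0.27, "Q3_K_M": 0.27, "Q3_K_L": 0.28,
    "IQ4_XS": 0.29, "Q4_0": 0.30, "IQ4_NL": 0.30, "Q4_K_S": 0.31,
    "Q4_K": 0.33, "Q4_K_M": 0.33, "Q4_1": 0.33, "Q5_0": 0.36,
    "Q5_K_S": 0.36, "Q5_K": 0.38, "Q5_K_M": 0.38, "Q5_1": 0.39,
    "Q6_K": 0.45, "Q8_0": 0.54,
}


def pick_quant(budget_gb: float) -> Optional[str]:
    """Return the filename of the largest quant whose file fits in budget_gb.

    Heuristic only: assumes a larger file means less quantization error.
    Returns None if no file fits.
    """
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= budget_gb}
    if not fitting:
        return None
    # max() keeps the first key with the largest size, i.e. table order on ties.
    best = max(fitting, key=fitting.get)
    return f"llama-600M-rus.{best}.gguf"


print(pick_quant(0.35))
```

For example, `pick_quant(0.35)` returns `llama-600M-rus.Q4_K.gguf`, because Q4_K is listed first among the three 0.33GB files.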

Original model description:

license: mit
language:
- ru
library_name: transformers

llama-600M-rus

A simple, customized, amateur experimental model pretrained from scratch on the text of fiction books (the model is updated regularly).
Its output is amateur but more or less adequate, considering the number of tokens it was trained on.
It can be used as a checkpoint for further training or for experiments.

Simple usage example:

from transformers import LlamaTokenizerFast, LlamaForCausalLM

# Load the model and tokenizer from the Hugging Face Hub
model = LlamaForCausalLM.from_pretrained('demetera/llama-600M-rus')
tokenizer = LlamaTokenizerFast.from_pretrained('demetera/llama-600M-rus')

prompt = "Я вышел на улицу и"  # "I went out into the street and"
inputs = tokenizer(prompt, return_tensors='pt')

# Sample up to 250 new tokens using top-k and nucleus (top-p) sampling
outputs = model.generate(inputs.input_ids, attention_mask=inputs.attention_mask,
                         max_new_tokens=250, do_sample=True, top_k=50, top_p=0.95)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
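With `do_sample=True`, the `top_k` and `top_p` arguments in the call above limit which tokens can be drawn at each step. Below is a simplified pure-Python sketch of that filtering on a toy next-token distribution. It is not the transformers implementation, which works on logit tensors. It assumes the same order of operations: top-k first, then top-p (nucleus) filtering on the renormalized survivors. The token strings, probabilities, and the helper names `filter_top_k_top_p` and `sample_token` are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def filter_top_k_top_p(probs: Dict[str, float], top_k: int, top_p: float) -> Dict[str, float]:
    """Hypothetical helper: apply top-k, then top-p (nucleus) filtering."""
    # Keep only the top_k most likely tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Renormalize the survivors so the nucleus cut sees a proper distribution.
    total = sum(p for _, p in ranked)
    ranked = [(tok, p / total) for tok, p in ranked]

    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept: List[Tuple[str, float]] = []
    cumulative = 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break  # nucleus reached; drop the remaining tail

    # Renormalize again so the kept probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}


def sample_token(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one token from a (filtered) distribution."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Toy next-token distribution (hypothetical values).
toy = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
filtered = filter_top_k_top_p(toy, top_k=3, top_p=0.8)
print(filtered)
print(sample_token(filtered, random.Random(0)))
```

Here `top_k=3` removes `d`, and the nucleus cut then removes `c`. That leaves `a` and `b` with renormalized probabilities of about 0.625 and 0.375. With the card's `top_k=50, top_p=0.95`, the same two cuts apply over the model's whole vocabulary.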