---
license: apache-2.0
datasets:
- IlyaGusev/gazeta
- IlyaGusev/ru_turbo_alpaca_evol_instruct
- IlyaGusev/ru_turbo_alpaca
- IlyaGusev/ru_turbo_saiga
- RussianNLP/russian_super_glue
language:
- ru
pipeline_tag: question-answering
---

The model was fine-tuned with LoRA on parts of the datasets *IlyaGusev/gazeta*, *IlyaGusev/ru_turbo_alpaca_evol_instruct*, *IlyaGusev/ru_turbo_alpaca*, *IlyaGusev/ru_turbo_saiga*, and *RussianNLP/russian_super_glue* (MuSeRC task).

#### Base model
NousResearch/Yarn-Llama-2-7b-64k

#### Requires CUDA > 11.4

### GPU: A100

```python
!pip install peft
!pip install flash-attn --no-build-isolation
!pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary
```

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    'geldarr/saiga-Yarn-Llama-2-7b-64k',
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map={'': 0},
)
tokenizer = AutoTokenizer.from_pretrained('geldarr/saiga-Yarn-Llama-2-7b-64k', use_fast=False)
```

```python
# Saiga conversation format: a system turn, a user turn, and an open "bot" turn
# that the model completes. The system prompt (Russian) reads: "You are Saiga,
# a Russian-language automatic assistant. You talk to people and help them."
# The user turn reads: "Answer the question based only on the text below:",
# followed by the question and the source text (up to 65536 tokens).
big_prompts = '''system\nТы — Сайга, русскоязычный автоматический ассистент. Ты разговариваешь с людьми и помогаешь им.\n
user
Дай ответы на вопрос, основываясь только на тексте ниже:\n
вопрос?
Текст <65536 tokens
bot
'''
```

```python
from transformers import GenerationConfig

gen_config = {
    "pad_token_id": 0,
    "bos_token_id": 1,
    "eos_token_id": 2,
    "temperature": 0.4,
    "top_p": 0.9,
    "top_k": 50,
    "do_sample": True,
    "max_new_tokens": 15360,
    "repetition_penalty": 1.1,
    "no_repeat_ngram_size": 15,
}
generation_config = GenerationConfig.from_dict(gen_config)
```

```python
def generate(model, tokenizer, prompt, generation_config):
    # Tokenize the prompt and move the tensors to the model's device
    data = tokenizer(prompt, return_tensors="pt")
    data = {k: v.to(model.device) for k, v in data.items()}
    output_ids = model.generate(
        **data,
        generation_config=generation_config
    )[0]
    # Drop the prompt tokens and decode only the newly generated part
    output_ids = output_ids[len(data["input_ids"][0]):]
    output = tokenizer.decode(output_ids, skip_special_tokens=True)
    return output.strip()

output = generate(model, tokenizer, big_prompts, generation_config)
print(output)
```
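In the example above, the question and the source text are spliced into the user turn by hand. A small helper can assemble the same system/user/bot layout programmatically; `build_prompt` and its arguments are illustrative, not part of the model repo:

```python
def build_prompt(question: str, text: str) -> str:
    # Mirrors the system/user/bot layout used in big_prompts above.
    system = ("Ты — Сайга, русскоязычный автоматический ассистент. "
              "Ты разговариваешь с людьми и помогаешь им.")
    user = f"Дай ответы на вопрос, основываясь только на тексте ниже:\n{question}\n{text}"
    return f"system\n{system}\nuser\n{user}\nbot\n"

long_document = "..."  # your source text, up to ~64k tokens
question = "О чём этот текст?"  # "What is this text about?"
output = generate(model, tokenizer, build_prompt(question, long_document), generation_config)
print(output)
```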
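With `max_new_tokens` set to 15360, a full answer can take a while even on an A100. If you want to see tokens as they are produced instead of waiting for the whole string, `transformers` provides `TextStreamer`; a minimal sketch:

```python
from transformers import TextStreamer

# Prints decoded tokens to stdout as they are generated;
# skip_prompt=True avoids echoing the (potentially 64k-token) prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

data = tokenizer(big_prompts, return_tensors="pt").to(model.device)
model.generate(**data, generation_config=generation_config, streamer=streamer)
```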
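The card installs `peft` because training used LoRA. Since `geldarr/saiga-Yarn-Llama-2-7b-64k` is loaded directly with `AutoModelForCausalLM` above, the adapter appears to be merged into the checkpoint; but if you have the base model plus a separate LoRA adapter, `peft` can attach it. A sketch, where `path/to/lora-adapter` is a placeholder, not a published repo:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    'NousResearch/Yarn-Llama-2-7b-64k',
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map={'': 0},
)
# 'path/to/lora-adapter' stands in for a local or Hub LoRA checkpoint.
model = PeftModel.from_pretrained(base, 'path/to/lora-adapter')
```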