---
base_model: LeoLM/leo-hessianai-7b
license: cc-by-4.0
datasets:
- caretech-owl/wikiquote-de-quotes
language:
- de
library_name: adapter-transformers
pipeline_tag: text-generation
---

# Model Card for leo-hessianai-7B-ggpq-german-quotes-lora

This model is trained to generate German quotes for a given author. The full model can be tested at [spaces/caretech-owl/quote-generator-de](https://huggingface.co/spaces/caretech-owl/quote-generator-de); here we provide the LoRA adapter files for loading on top of the base model [LeoLM/leo-hessianai-7b](https://huggingface.co/LeoLM/leo-hessianai-7b).

## Model Details

### Model Description

This fine-tuned model has been trained on the [caretech-owl/wikiquote-de-quotes](https://huggingface.co/datasets/caretech-owl/wikiquote-de-quotes) dataset. The model was trained on prompts of the following form:

```python
prompt_format = (
    "<|im_start|>system\n"
    "Dies ist eine Unterhaltung zwischen einem intelligenten, hilfsbereitem "
    "KI-Assistenten und einem Nutzer. "
    "Der Assistent gibt Antworten in Form von Zitaten.<|im_end|>\n"
    "<|im_start|>user\n"
    "Zitiere {author}<|im_end|>\n"
    "<|im_start|>assistant\n"
    "{quote}<|im_end|>\n"
)
```

Here `author` is intended to be provided by the user, and `quote` has the format `quote + " - " + author`. While the model is not able to provide "real" quotes, using authors that are part of the training set together with a low generation temperature results in somewhat realistic quotes that at least sound familiar. A sketch of how dataset rows map onto this prompt format is given at the end of this card.

- **Developed by:** [CareTech OWL](https://www.caretech-owl.de/)
- **Model type:** Llama 2 LoRA adapter
- **Language(s) (NLP):** German
- **License:** [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/)
- **Finetuned from model:** [LeoLM/leo-hessianai-7b](https://huggingface.co/LeoLM/leo-hessianai-7b)

## Uses

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load the base model and its tokenizer.
base_model = AutoModelForCausalLM.from_pretrained("LeoLM/leo-hessianai-7b")
tokenizer = AutoTokenizer.from_pretrained("LeoLM/leo-hessianai-7b", trust_remote_code=False)
tokenizer.pad_token = tokenizer.eos_token

# Load the LoRA adapter on top of the base model and activate it.
base_model.load_adapter(
    "caretech-owl/leo-hessianai-7B-ggpq-german-quotes-lora",
    adapter_name="leo-hessianai-7B-ggpq-german-quotes-lora",
)
base_model.enable_adapters()

text_gen = pipeline(task="text-generation", model=base_model, max_length=200, tokenizer=tokenizer)

system_prompt = (
    "Dies ist eine Unterhaltung zwischen einem intelligenten, hilfsbereitem "
    "KI-Assistenten und einem Nutzer. "
    "Der Assistent gibt Antworten in Form von Zitaten."
)
prompt_format = (
    "<|im_start|>system\n{system_prompt}<|im_end|>\n"
    "<|im_start|>user\nZitiere {prompt}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

def get_quote(author: str, max_length: int = 200):
    query = prompt_format.format(system_prompt=system_prompt, prompt=author)
    output = text_gen(query, do_sample=True, top_p=0.95, max_length=max_length,
                      return_full_text=False, pad_token_id=tokenizer.pad_token_id)
    print(output[0]["generated_text"])

get_quote("Heinrich Heine")
```

## Training procedure

The following GPTQ quantization config was used during training (a hedged sketch of a comparable setup is given at the end of this card):

- quant_method: gptq
- bits: 8
- tokenizer: None
- dataset: None
- group_size: 32
- damp_percent: 0.1
- desc_act: True
- sym: True
- true_sequential: True
- use_cuda_fp16: False
- model_seqlen: None
- block_name_to_quantize: None
- module_name_preceding_first_block: None
- batch_size: 1
- pad_token_id: None
- use_exllama: True
- max_input_length: None
- exllama_config: {'version': }
- cache_block_outputs: True

### Framework versions

- PEFT 0.6.2
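
To make the prompt format above concrete, the sketch below shows one way the dataset rows could be rendered into training strings. This is a minimal illustration, not the original training script: the column names `author` and `quote`, and the `train` split name, are assumptions about the dataset schema rather than facts documented in this card.

```python
# Hedged sketch: rendering dataset rows into the training prompt format.
# The column names "author" and "quote" and the split name are assumptions
# about the schema of caretech-owl/wikiquote-de-quotes.
from datasets import load_dataset

prompt_format = (
    "<|im_start|>system\n"
    "Dies ist eine Unterhaltung zwischen einem intelligenten, hilfsbereitem "
    "KI-Assistenten und einem Nutzer. "
    "Der Assistent gibt Antworten in Form von Zitaten.<|im_end|>\n"
    "<|im_start|>user\nZitiere {author}<|im_end|>\n"
    "<|im_start|>assistant\n{quote}<|im_end|>\n"
)

dataset = load_dataset("caretech-owl/wikiquote-de-quotes", split="train")

def to_training_text(example: dict) -> dict:
    # Per the card, the assistant turn is the quote followed by " - " and the
    # author; this assumes the dataset stores the bare quote text.
    target = example["quote"] + " - " + example["author"]
    return {"text": prompt_format.format(author=example["author"], quote=target)}

train_texts = dataset.map(to_training_text)
print(train_texts[0]["text"])
```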
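
The quantization config above describes an 8-bit GPTQ base model with the LoRA adapter trained on top via PEFT. The sketch below shows one way a comparable setup could be assembled with `transformers` and `peft`. It is a hedged illustration, not the original training code: the quantized checkpoint path and all LoRA hyperparameters (`r`, `lora_alpha`, `lora_dropout`, `target_modules`) are assumptions, as this card does not document them.

```python
# Hedged sketch: attaching a trainable LoRA adapter to a GPTQ-quantized
# Llama-style model. All LoRA hyperparameters below are assumptions, not
# values documented in this card.
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path to an 8-bit GPTQ checkpoint of the base model
# (bits=8, group_size=32, desc_act=True per the config listed above).
quantized_id = "path/to/leo-hessianai-7b-gptq-8bit"

tokenizer = AutoTokenizer.from_pretrained("LeoLM/leo-hessianai-7b")
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(quantized_id, device_map="auto")
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                                  # assumed rank
    lora_alpha=32,                         # assumed scaling
    lora_dropout=0.05,                     # assumed dropout
    target_modules=["q_proj", "v_proj"],   # common choice for Llama-style attention
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

From here, the formatted texts from the previous sketch could be fed to a standard supervised fine-tuning loop; the exact training hyperparameters are not documented in this card.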