---
library_name: transformers
tags:
- deutsch
- german
- seedbox
- mistral
- mixtral
- multilingual
license: apache-2.0
language:
- de
- en
pipeline_tag: text-generation
---

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/645ded34a45b4182d7f5c385/9QywLGTbRrHYSq-m6fQmJ.jpeg)

# KafkaLM-8x7b-German-V0.1

**KafkaLM 8x7b** is a MoE model based on [Mistral AI's Mixtral 8x7b](https://mistral.ai/news/mixtral-of-experts/), finetuned on an ensemble of popular, high-quality open-source instruction sets (translated from English to German).

KafkaLM 8x7b is a [Seedbox](https://huggingface.co/seedboxai) project trained by [Dennis Dickmann](https://huggingface.co/doubledsbv).

**Why Kafka?** The models are proficient yet creative, with a tendency to push linguistic boundaries 😊

## Model Details

The purpose of releasing the **KafkaLM series** is to contribute to the German AI community a set of fine-tuned LLMs that are easy to use in everyday applications across a variety of tasks.

The main goal was to provide LLMs proficient in German, especially for use in German-speaking business contexts where English alone is not sufficient.

### DPO

The model has been aligned with a German-translated and modified version of the UltraFeedback dataset from Hugging Face.

### Dataset

I used an 8k-filtered version of the following dataset: [seedboxai/multitask_german_examples_32k](https://huggingface.co/datasets/seedboxai/multitask_german_examples_32k)

### Inference

Getting started with the model is straightforward:

```python
import torch
import transformers
from transformers import AutoModelForCausalLM

model_id = "seedboxai/KafkaLM-8x7b-German-V0.1"

# load the model in bfloat16 and the matching tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)

pipeline = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    task="text-generation",
    device="cuda",
)

messages = [
    {"role": "system", "content": "Du bist ein hilfreicher KI-Assistent."},  # "You are a helpful AI assistant."
    {"role": "user", "content": "Wer ist eigentlich dieser Kafka?"},  # "Who actually is this Kafka?"
]

# render the chat messages into a single prompt string
prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

outputs = pipeline(
    prompt,
    max_new_tokens=512,  # adjust to your use case
    eos_token_id=pipeline.tokenizer.eos_token_id,  # stop at the end-of-sequence token
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# print only the newly generated text, without the prompt
print(outputs[0]["generated_text"][len(prompt):])
```

If the full bfloat16 checkpoint does not fit into your GPU memory, see the quantized loading sketch at the end of this card.

## Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model.
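
## Quantized Loading (Optional)

Mixtral 8x7b is a large model (roughly 47B parameters), so the bfloat16 checkpoint used in the inference snippet above requires on the order of 90 GB of GPU memory. The sketch below shows one way to load the model in 4-bit precision with bitsandbytes instead. The quantization settings are illustrative assumptions, not part of the official release; adjust them to your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "seedboxai/KafkaLM-8x7b-German-V0.1"

# Illustrative 4-bit NF4 quantization config (requires the `bitsandbytes`
# package); these settings are assumptions, not recommendations from the release.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place the weights on available GPUs
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```

The model and tokenizer loaded this way drop into the inference snippet above in place of the bfloat16 versions; omit the `device` argument to the pipeline in that case, since `device_map="auto"` already places the weights.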