---
tags:
- moe
- frankenmoe
- merge
- mergekit
- fhnw/Llama-3-8B-pineapple-pizza-orpo
- fhnw/Llama-3-8B-pineapple-recipe-sft
base_model:
- fhnw/Llama-3-8B-pineapple-pizza-orpo
- fhnw/Llama-3-8B-pineapple-recipe-sft
---

# Llama-3-pineapple-2x8B

Llama-3-pineapple-2x8B is a Mixture of Experts (MoE) model made from the following models:

* [fhnw/Llama-3-8B-pineapple-pizza-orpo](https://huggingface.co/fhnw/Llama-3-8B-pineapple-pizza-orpo)
* [fhnw/Llama-3-8B-pineapple-recipe-sft](https://huggingface.co/fhnw/Llama-3-8B-pineapple-recipe-sft)

## Configuration

The following [mergekit](https://github.com/arcee-ai/mergekit) MoE configuration was used:

```yaml
base_model: fhnw/Llama-3-8B-pineapple-pizza-orpo
experts:
  - source_model: fhnw/Llama-3-8B-pineapple-pizza-orpo
    positive_prompts: ["assistant", "chat"]
  - source_model: fhnw/Llama-3-8B-pineapple-recipe-sft
    positive_prompts: ["recipe"]
gate_mode: hidden
dtype: float16
```

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "fhnw/Llama-3-pineapple-2x8B"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to(device)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Is pineapple on a pizza a crime?"},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

# Stop on either the model's EOS token or Llama 3's end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, not the prompt.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
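
Because the second expert's gate was initialized with the positive prompt "recipe" (`gate_mode: hidden`), recipe-style requests should lean on `fhnw/Llama-3-8B-pineapple-recipe-sft`. The sketch below sends such a request as a quick routing check; the prompt wording is only an illustrative assumption, and loading and generation are identical to the example above.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "fhnw/Llama-3-pineapple-2x8B"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to(device)

# Recipe-style request; the exact wording is a hypothetical example.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a recipe for a pineapple pizza."},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

# Stop on either the model's EOS token or Llama 3's end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Print only the generated recipe, without the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```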