---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- moe
- merge
- llama-3
---

# Bombus_3x8B

Bombus_3x8B is a Mixture of Experts (MoE) model built from Llama-3 experts.

## Usage

```shell
pip install -qU transformers bitsandbytes accelerate
```

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = "Eurdem/Bombus_3x8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the model in 4-bit via bitsandbytes to reduce memory usage.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

messages = [
    {"role": "system", "content": "You are a helpful chatbot who always responds in a friendly manner."},
    {"role": "user", "content": "Tell me about yourself."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,
    top_p=0.7,
    top_k=500,
)
# Decode only the newly generated tokens, skipping the prompt.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```