---
base_model:
- NousResearch/Hermes-2-Pro-Llama-3-8B
- xtuner/llava-llama-3-8b-v1_1-transformers
tags:
- llama
- instruct
- finetune
- chatml
- DPO
- RLHF
- gpt4
- synthetic data
- distillation
- function calling
- json mode
- llava
- vision
- multimodal
model-index:
- name: Hermes-2-Pro-Llama-3-8B
  results: []
license: apache-2.0
language:
- en
datasets:
- teknium/OpenHermes-2.5
- Lin-Chen/ShareGPT4V
widget:
- example_title: Hermes 2 Pro
  messages:
  - role: system
    content: You are a sentient, superintelligent artificial general intelligence, here to teach and assist me.
  - role: user
    content: Write a short story about Goku discovering kirby has teamed up with Majin Buu to destroy the world.
---

# Nous Hermes 2 Pro + Xtuner Llava v1.1 - Llama 3 8B

Nous Hermes 2 Pro's LLaMA weights + Xtuner Llava's mm_projector & vision_tower weights.

Good QA + Function Calling + JSON Mode + Vision Multimodal

GGUFs:
- Nous Hermes 2 Pro: https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF
- Xtuner LLaVA v1.1: https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-gguf

Test code:
```python
import requests
from PIL import Image

import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "vonjack/Nous-Hermes-2-Pro-Xtuner-LLaVA-v1_1-Llama-3-8B"

# ChatML prompt; <image> marks where the image features are inserted.
prompt = ("<|im_start|>user\n<image>\nWhat are these?<|im_end|>"
          "<|im_start|>assistant\n")
image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"

# Load the model in half precision and move it to GPU 0.
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to(0)

processor = AutoProcessor.from_pretrained(model_id)

# Fetch the test image and build model inputs (keyword arguments avoid
# depending on the processor's positional argument order).
raw_image = Image.open(requests.get(image_file, stream=True).raw)
inputs = processor(text=prompt, images=raw_image, return_tensors='pt').to(0, torch.float16)

# Greedy decoding, up to 200 new tokens.
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0][2:], skip_special_tokens=True))
```

Example:
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6396e4f81dade26da03cdb73/y34Jlh4S72SCEki9v0uPM.png)
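
The prompt in the test code is a hand-written ChatML string. For multi-turn or system-prompted conversations, a small helper can assemble the same layout. The sketch below is only an illustration: `build_chatml_prompt` and `IMAGE_TOKEN` are hypothetical names, not part of `transformers` or this repository. It assumes that `<image>` is the placeholder the processor expects, and that turns are concatenated without a separator, as in the test code.

```python
from typing import Dict, List

# Assumption: "<image>" is the image placeholder token expected by the processor.
IMAGE_TOKEN = "<image>"


def build_chatml_prompt(messages: List[Dict[str, str]],
                        add_generation_prompt: bool = True) -> str:
    """Join role/content messages into a ChatML string (hypothetical helper)."""
    parts = []
    for message in messages:
        # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so generation continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": f"{IMAGE_TOKEN}\nWhat are these?"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

The resulting `prompt` can be passed to the processor in place of the hard-coded string. If the chat template shipped with the tokenizer differs (for example, a newline after each `<|im_end|>`), prefer that template over this helper.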