Create README.md
added link to upstream format and shoutouts
README.md
ADDED
@@ -0,0 +1,128 @@
1 |
+
---
|
2 |
+
base_model: mistralai/Mistral-7B-v0.1
|
3 |
+
tags:
|
4 |
+
- mistral
|
5 |
+
- instruct
|
6 |
+
- finetune
|
7 |
+
- chatml
|
8 |
+
- gpt4
|
9 |
+
- synthetic data
|
10 |
+
- distillation
|
11 |
+
- multimodal
|
12 |
+
- llava
|
13 |
+
model-index:
|
14 |
+
- name: Nous-Hermes-2-Vision
|
15 |
+
results: []
|
16 |
+
license: apache-2.0
|
17 |
+
language:
|
18 |
+
- en
|
19 |
+
---
|
20 |
+
|
21 |
+
GGUF quants by Twobob. Thanks to @jartine and @cmp-nct for the assists.
|
22 |
+
The prompt format is Vicuna.
|
23 |
+
|
24 |
+
Reference: [`conversation.py` in hermes-llava](https://github.com/qnguyen3/hermes-llava/blob/173b4ef441b5371c1e7d99da7a2e7c14c77ad12f/llava/conversation.py#L252)
|
25 |
+
|
26 |
+
# Nous-Hermes-2-Vision - Mistral 7B
|
27 |
+
|
28 |
+
|
29 |
+
![image/png](https://camo.githubusercontent.com/b09dc35a93b4b70748fa4e2f307b011cd3d548369dd926ec9a2d3a51f7b3721e/68747470733a2f2f66696c65732e6f616975736572636f6e74656e742e636f6d2f66696c652d6b4437565358734f5649576472624b3042353662686644363f73653d323032332d31322d3033543137253341333425334135385a2673703d722673763d323032312d30382d30362673723d6226727363633d6d61782d6167652533443331353336303030253243253230696d6d757461626c6526727363643d6174746163686d656e7425334225323066696c656e616d6525334439643530333039622d356236342d343964302d623832362d6165316638366132396661382e77656270267369673d50396973694b4679654a54435a47424b526d45494b3043586e6e55676c6334704a583071312532425478666a34253344)
|
30 |
+
|
31 |
+
*In the tapestry of Greek mythology, Hermes reigns as the eloquent Messenger of the Gods, a deity who deftly bridges the realms through the art of communication. It is in homage to this divine mediator that I name this advanced LLM "Hermes," a system crafted to navigate the complex intricacies of human discourse with celestial finesse.*
|
32 |
+
|
33 |
+
## Model description
|
34 |
+
|
35 |
+
Nous-Hermes-2-Vision stands as a pioneering Vision-Language Model, leveraging advancements from the renowned **OpenHermes-2.5-Mistral-7B** by teknium. This model incorporates two pivotal enhancements, setting it apart as a cutting-edge solution:
|
36 |
+
|
37 |
+
- **SigLIP-400M Integration**: Diverging from traditional approaches that rely on substantial 3B vision encoders, Nous-Hermes-2-Vision harnesses the formidable SigLIP-400M. This strategic choice not only streamlines the model's architecture, making it more lightweight, but also capitalizes on SigLIP's remarkable capabilities. The result? A remarkable boost in performance that defies conventional expectations.
|
38 |
+
|
39 |
+
- **Custom Dataset Enriched with Function Calling**: Our model's training data includes a unique feature – function calling. This distinctive addition transforms Nous-Hermes-2-Vision into a **Vision-Language Action Model**. Developers now have a versatile tool at their disposal, primed for crafting a myriad of ingenious automations.
|
40 |
+
|
41 |
+
This project is led by [qnguyen3](https://twitter.com/stablequan) and [teknium](https://twitter.com/Teknium1).
|
42 |
+
## Training
|
43 |
+
### Dataset
|
44 |
+
- 220K from **LVIS-INSTRUCT4V**
|
45 |
+
- 60K from **ShareGPT4V**
|
46 |
+
- 150K Private **Function Calling Data**
|
47 |
+
- 50K conversations from teknium's **OpenHermes-2.5**
|
48 |
+
|
49 |
+
## Usage
|
50 |
+
### Prompt Format
|
51 |
+
- Like other LLaVA variants, this model uses Vicuna-V1 as its prompt template. Please refer to `conv_llava_v1` in [this file](https://github.com/qnguyen3/hermes-llava/blob/main/llava/conversation.py)
|
52 |
+
- For Gradio UI, please visit this [GitHub Repo](https://github.com/qnguyen3/hermes-llava)
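
As a rough illustration of the Vicuna-V1 layout mentioned above, here is a minimal sketch of a prompt builder. The system text, the `USER`/`ASSISTANT` role names, the `" "` and `"</s>"` separators, and the `<image>` placeholder are assumptions modelled on upstream LLaVA's `conv_llava_v1`; `build_prompt` is a hypothetical helper, so check `conversation.py` in the linked repo before relying on the exact strings.

```python
# Assumed system text (the usual Vicuna-v1 preamble); verify against conv_llava_v1.
SYSTEM = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
)


def build_prompt(turns, system=SYSTEM, sep=" ", sep2="</s>"):
    """Join (role, message) turns in a two-separator Vicuna-style layout.

    A message of None marks the open turn the model should complete.
    """
    seps = [sep, sep2]  # user turns end with sep, assistant turns with sep2
    out = system + seps[0]
    for i, (role, message) in enumerate(turns):
        if message:
            out += f"{role}: {message}{seps[i % 2]}"
        else:
            out += f"{role}:"
    return out


prompt = build_prompt([
    ("USER", "<image>\nWhat is in this picture?"),
    ("ASSISTANT", None),
])
print(prompt)
```

The trailing `ASSISTANT:` leaves the final turn open so that generation continues from there.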
|
53 |
+
|
54 |
+
### Function Calling
|
55 |
+
- For function calling, the message should start with a `<fn_call>` tag. Here is an example:
|
56 |
+
|
57 |
+
```json
|
58 |
+
<fn_call>{
|
59 |
+
"type": "object",
|
60 |
+
"properties": {
|
61 |
+
"bus_colors": {
|
62 |
+
"type": "array",
|
63 |
+
"description": "The colors of the bus in the image.",
|
64 |
+
"items": {
|
65 |
+
"type": "string",
|
66 |
+
"enum": ["red", "blue", "green", "white"]
|
67 |
+
}
|
68 |
+
},
|
69 |
+
"bus_features": {
|
70 |
+
"type": "string",
|
71 |
+
"description": "The features seen on the back of the bus."
|
72 |
+
},
|
73 |
+
"bus_location": {
|
74 |
+
"type": "string",
|
75 |
+
"description": "The location of the bus (driving or pulled off to the side).",
|
76 |
+
"enum": ["driving", "pulled off to the side"]
|
77 |
+
}
|
78 |
+
}
|
79 |
+
}
|
80 |
+
```
|
81 |
+
|
82 |
+
Output:
|
83 |
+
```json
|
84 |
+
{
|
85 |
+
"bus_colors": ["red", "white"],
|
86 |
+
"bus_features": "An advertisement",
|
87 |
+
"bus_location": "driving"
|
88 |
+
}
|
89 |
+
```
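
The request/response pair above could be produced and consumed along these lines. This is a sketch using only Python's standard `json` module; `make_fn_call_message` and `parse_fn_call_output` are hypothetical helper names, and the assumption that the reply contains a single JSON object (possibly surrounded by extra text) is not something the model card guarantees.

```python
import json


def make_fn_call_message(schema: dict) -> str:
    """Prefix a JSON schema with the <fn_call> tag to form the user message."""
    return "<fn_call>" + json.dumps(schema, indent=2)


def parse_fn_call_output(text: str) -> dict:
    """Extract the outermost {...} span from the reply and decode it as JSON."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start:end + 1])


schema = {
    "type": "object",
    "properties": {
        "bus_location": {
            "type": "string",
            "description": "The location of the bus (driving or pulled off to the side).",
            "enum": ["driving", "pulled off to the side"],
        }
    },
}

message = make_fn_call_message(schema)
# Stand-in for a model reply; a real reply would come from the model itself.
reply = '{"bus_location": "driving"}'
result = parse_fn_call_output(reply)
```

Building the message with `json.dumps` rather than by hand keeps the schema valid JSON, which avoids problems such as stray trailing commas.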
|
90 |
+
|
91 |
+
## Example
|
92 |
+
|
93 |
+
### Chat
|
94 |
+
![image/png](https://i.ibb.co/tMg8h2t/Screenshot-from-2023-12-04-00-13-59.png)
|
95 |
+
|
96 |
+
### Function Calling
|
97 |
+
Input image:
|
98 |
+
|
99 |
+
![image/png](https://www.slcmenu.com/wp-content/uploads/2017/11/In-N-Out-Burger-menu-2020-982x1024.jpg)
|
100 |
+
|
101 |
+
Input message:
|
102 |
+
```json
|
103 |
+
<fn_call>{
|
104 |
+
"type": "object",
|
105 |
+
"properties": {
|
106 |
+
"food_list": {
|
107 |
+
"type": "array",
|
108 |
+
"description": "List of all the food",
|
109 |
+
"items": {
|
110 |
+
"type": "string"
|
111 |
+
}
|
112 |
+
}
|
113 |
+
}
|
114 |
+
}
|
115 |
+
```
|
116 |
+
|
117 |
+
Output:
|
118 |
+
```json
|
119 |
+
{
|
120 |
+
"food_list": [
|
121 |
+
"Double Burger",
|
122 |
+
"Cheeseburger",
|
123 |
+
"French Fries",
|
124 |
+
"Shakes",
|
125 |
+
"Coffee"
|
126 |
+
]
|
127 |
+
}
|
128 |
+
```
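
Since the reply is meant to follow the schema sent in the `<fn_call>` message, it can be worth checking before acting on it. The sketch below is a hand-rolled check covering only the schema features used in this README (`object`, `array`, `string`, `enum`); `check` is a hypothetical helper, not a full JSON Schema validator, and treating every listed property as required is an assumption made for this example.

```python
def check(value, schema, path="$"):
    """Return a list of mismatches between value and a small subset of JSON Schema."""
    errors = []
    kind = schema.get("type")
    if kind == "object":
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors += check(value[key], sub, f"{path}.{key}")
            else:
                # Assumption: every declared property is expected in the reply.
                errors.append(f"{path}.{key}: missing")
    elif kind == "array":
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        for i, item in enumerate(value):
            errors += check(item, schema.get("items", {}), f"{path}[{i}]")
    elif kind == "string":
        if not isinstance(value, str):
            return [f"{path}: expected string"]
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} not in enum")
    return errors


food_schema = {
    "type": "object",
    "properties": {
        "food_list": {
            "type": "array",
            "description": "List of all the food",
            "items": {"type": "string"},
        }
    },
}

good = {"food_list": ["Double Burger", "Cheeseburger", "French Fries", "Shakes", "Coffee"]}
bad = {"food_list": ["Double Burger", 3]}

good_errors = check(good, food_schema)
bad_errors = check(bad, food_schema)
```

An empty list means the reply matches the schema as far as this subset goes; anything else names the offending path.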
|