---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
- trl
- sft
base_model: unsloth/mistral-7b-v0.2-bnb-4bit
datasets:
- visheratin/realworldqa
---
# Mistral-RealworldQA-v0.2-7b SFT

<img src="https://i.imgur.com/Pf53ms5.jpeg" width="400"/>

GGUFs can be found [here](https://huggingface.co/InferenceIllusionist/Mistral-RealworldQA-v0.2-7b-SFT-GGUF)


An experiment aimed at reducing hallucinations in [VQA](https://huggingface.co/tasks/visual-question-answering).

The first in a series of experiments centered on fine-tuning for image captioning.

<h1>Release Notes</h1>

* v0.1 - Initial release
* <b>v0.2</b> (Current) - Updated base model to the official Mistral-7b fp16 release; refined dataset and instruction formatting

<h2>Background & Methodology</h2>

The Mistral-7b-v0.2 base model was fine-tuned using the [RealWorldQA dataset](https://huggingface.co/datasets/visheratin/realworldqa), originally provided by the X.Ai team here: https://x.ai/blog/grok-1.5v

<h1>Vision Results</h1>

Example 1
<img src="https://i.imgur.com/E9mS4Xb.jpeg" width="400"/>
Example 2
<img src="https://i.imgur.com/SmTz1Yd.jpeg" width="400"/>

* The experiment yielded a model that provides shorter, less verbose output for questions about pictures
* The likelihood of hallucinations in output has decreased; however, the model can still easily be led into inaccurate answers by the user
* Best suited for captioning use cases that require concise descriptions and low token counts
* This model lacks the conversational prose of Excalibur-7b-DPO and is much "drier" in tone

<b>Requires additional mmproj file. You have two options for vision functionality (available inside this repo):</b>
 1. [Quantized - Limited VRAM Option (197mb)](https://huggingface.co/InferenceIllusionist/Mistral-RealworldQA-v0.2-7b-SFT/resolve/main/mistral-7b-mmproj-v1.5-Q4_1.gguf?download=true)
 2. [Unquantized - Premium Option / Best Quality (596mb)](https://huggingface.co/InferenceIllusionist/Mistral-RealworldQA-v0.2-7b-SFT/resolve/main/mmproj-model-f16.gguf?download=true)

Select the gguf file of your choice in [Koboldcpp](https://github.com/LostRuins/koboldcpp/releases/) as usual, then make sure to choose the mmproj file above in the LLaVA mmproj field of the model submenu:
<img src="https://i.imgur.com/x8vqH29.png" width="425"/>

## Prompt Format
Use the Alpaca prompt format for best results.
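The card names Alpaca but does not restate the template. A minimal sketch of the commonly used Alpaca format follows; the preamble and field names are the standard Alpaca ones, assumed here rather than confirmed by this repo:

```python
# Build an Alpaca-style prompt. This is the widely used Alpaca template;
# the card only names the format, so treat the exact wording below as an
# assumption rather than the verbatim training template.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

def build_prompt(instruction: str) -> str:
    """Fill the Alpaca template with a user instruction."""
    return ALPACA_TEMPLATE.format(instruction=instruction)

print(build_prompt("Describe the image in one sentence."))
```

The model's completion is then generated after the `### Response:` marker.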


## Other info
- **Developed by:** InferenceIllusionist
- **License:** apache-2.0
- **Finetuned from model:** mistral-community/Mistral-7B-v0.2

This Mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)