<img src="https://cdn-uploads.huggingface.co/production/uploads/630fdd96a119d49bc1e770d5/7NJFmljgycOJs7mcO2Cag.png" width="200" style="float:center">
## Model Description

I am attempting to learn about finetuning Qwen 2 VL 7B, and this model was just a result of that.

I ran Hermes 3 8B locally in Aphrodite-Engine and used a Python script to iterate through the LLaVA 150K Instruct dataset, sending the model a request for each sample asking it to rewrite the JSON sample so that the output sounds more energetic. I used a 6-shot prompt, with the bad examples coming from a generic LLM and the good examples coming from [FPHam/Llama-3-8B-Sydney](https://huggingface.co/FPHam/Llama-3-8B-Sydney).

After running through about half of the dataset, I noticed an error in one of my examples; after fixing it and modifying the prompt a bit, generation quality deteriorated and about 30% of the responses I was getting back didn't pass JSON validation. I settled on using the ~60,000 samples that had already been processed fine, and I cleaned up the dataset to fix various errors, such as the presence of non-UTF-8 characters.

The script used for creating the dataset is [here](https://huggingface.co/datasets/adamo1139/misc/blob/main/sydney/sydney_llava_1.py).
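
For illustration, here is a minimal sketch of what the core rewrite loop looks like, assuming Aphrodite-Engine serves Hermes 3 8B through its OpenAI-compatible API (the endpoint, model id, and prompt below are placeholders, not copied from the linked script):

```python
# Minimal sketch of the dataset rewrite loop; endpoint, model id, and prompt
# are illustrative assumptions. See the linked script for the real version.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:2242/v1", api_key="empty")  # adjust to your server

# The real prompt is 6-shot: "bad" rewrites from a generic LLM paired with
# "good" energetic rewrites sampled from FPHam/Llama-3-8B-Sydney.
FEW_SHOT = [
    {"role": "system", "content": "Rewrite the assistant turns of this JSON sample "
                                  "so they sound more energetic. Reply with JSON only."},
    # ... the six user/assistant example pairs go here ...
]

def rewrite_sample(sample: dict) -> dict | None:
    """Send one LLaVA sample to the model; drop replies that fail JSON validation."""
    messages = FEW_SHOT + [{"role": "user", "content": json.dumps(sample, ensure_ascii=False)}]
    reply = client.chat.completions.create(model="hermes-3-8b", messages=messages)
    try:
        return json.loads(reply.choices[0].message.content)
    except json.JSONDecodeError:
        return None
```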
## Inference

I uploaded the inference script [here](https://huggingface.co/datasets/adamo1139/misc/blob/main/sydney/run_qwen_vl.py). It runs inference on both this model and the normal Qwen 2 VL Instruct checkpoint, and it is based on the simple Qwen 2 VL Gradio inference project published [here](https://old.reddit.com/r/LocalLLaMA/comments/1fv892w/simple_gradio_ui_to_run_qwen_2_vl/).

Qwen2 VL doesn't quantize well, so you will need enough VRAM to load the 16-bit checkpoint. I am using a 24GB GPU and still can't load in any image or video I want, since large inputs will OOM.

Inference should work fine on both Windows and Linux. By default the script uses Flash Attention 2; if you don't want to use it, run the script with the flag `--flash-attn2 False`.
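
As a rough sketch of how such a toggle is typically wired up with transformers (the repo id and argument parsing below are assumptions, not copied from the script):

```python
# Rough sketch: load Qwen2 VL in 16-bit with an optional Flash Attention 2 toggle.
import argparse
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

parser = argparse.ArgumentParser()
# `--flash-attn2 False` disables Flash Attention 2, matching the flag described above
parser.add_argument("--flash-attn2", type=lambda v: v.lower() != "false", default=True)
args = parser.parse_args()

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "adamo1139/Qwen2-VL-7B-Sydney",  # placeholder repo id for this model
    torch_dtype=torch.bfloat16,      # 16-bit weights, since quantization hurts Qwen2 VL
    attn_implementation="flash_attention_2" if args.flash_attn2 else "sdpa",
    device_map="auto",
)
# Capping max_pixels reduces the number of vision tokens per image, which helps
# keep large images from OOMing a 24GB card.
processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", max_pixels=1280 * 28 * 28
)
```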
## Technical details

The model was trained in LLaMA-Factory with unsloth on a system with an RTX 3090 Ti, using a context length of 2000, LoRA rank 32, LoRA alpha 32, and a LoRA+ ratio of 4. Training took around 11 hours; bitsandbytes quantization was not used.
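
The exact training config was not published; the settings above would map to roughly these LLaMA-Factory options (key names taken from LLaMA-Factory's LoRA examples, shown here as a Python dict for illustration):

```python
# Sketch of the hyperparameters described above, using assumed LLaMA-Factory
# config keys; the actual config file was not published.
training_config = {
    "model_name_or_path": "Qwen/Qwen2-VL-7B-Instruct",
    "finetuning_type": "lora",
    "lora_rank": 32,
    "lora_alpha": 32,
    "loraplus_lr_ratio": 4.0,  # LoRA+ trains the LoRA B matrices with a higher LR
    "use_unsloth": True,       # unsloth kernels, to fit training on one RTX 3090 Ti
    "cutoff_len": 2000,        # training context length
    # no "quantization_bit" entry: bitsandbytes quantization was not used
}
```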

I am comparing Qwen 2 VL 7B Sydney with Qwen/Qwen2-VL-7B-Instruct.

<img src="https://cdn-uploads.huggingface.co/production/uploads/630fdd96a119d49bc1e770d5/Tfw7rL7NX9OwVXH-Vy5IB.png" style="width: 100%; height: auto;" alt="Image 2" />
<img src="https://cdn-uploads.huggingface.co/production/uploads/630fdd96a119d49bc1e770d5/JqbCDhfYSqddNUaR0VgmW.png" style="width: 100%; height: auto;" alt="Image 3" />
<img src="https://cdn-uploads.huggingface.co/production/uploads/630fdd96a119d49bc1e770d5/Uwp2q7QTjz7nFRcVU3AVG.png" style="width: 100%; height: auto;" alt="Image 4" />
## Prompt template

ChatML, with the system prompt "You are Sydney." The rest of the prompt template is the same as what Qwen2 VL Instruct uses.
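
Concretely, a single-image turn would render like this (standard Qwen2 VL ChatML vision tokens; only the system message differs from the Instruct template):

```python
# Illustration of the resulting prompt string for one image + one question.
prompt = (
    "<|im_start|>system\n"
    "You are Sydney.<|im_end|>\n"
    "<|im_start|>user\n"
    "<|vision_start|><|image_pad|><|vision_end|>Describe this image.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
```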