---
license: openrail
---
**This model is a merge of 66% Wizard Coder and 33% Redmond Hermes Coder (which is itself a Wizard Coder fine-tune):**

https://huggingface.co/NousResearch/Redmond-Hermes-Coder
https://huggingface.co/WizardLM/WizardCoder-15B-V1.0

The merge was done with the most basic value average, roughly as sketched below.
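
A minimal sketch of such a value average, assuming the stated split is applied as a plain 2:1 element-wise weighted mean over all weights (the output path is a placeholder; the exact script used for this model is not published here):

```python
import torch
from transformers import AutoModelForCausalLM

wizard = AutoModelForCausalLM.from_pretrained("WizardLM/WizardCoder-15B-V1.0")
hermes = AutoModelForCausalLM.from_pretrained("NousResearch/Redmond-Hermes-Coder")
hermes_sd = hermes.state_dict()

# Element-wise weighted average of every tensor in the checkpoints
# (assumes both models share the exact same architecture, which they do
# since Redmond Hermes Coder is a Wizard Coder fine-tune).
with torch.no_grad():
    merged = {name: 2 / 3 * p + 1 / 3 * hermes_sd[name]
              for name, p in wizard.state_dict().items()}

wizard.load_state_dict(merged)
wizard.save_pretrained("WizardCoder-Hermes-merge")  # placeholder output path
```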

Quantization and inference use CTranslate2, achieving as much as 37 tokens/s on an RTX 3090 GPU.
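
For reference, a hedged sketch of the conversion step (paths are placeholders; the `ct2-transformers-converter` CLI does the same thing):

```python
from ctranslate2.converters import TransformersConverter

# Convert the merged HF checkpoint to CTranslate2 with int8_float16 quantization.
converter = TransformersConverter("WizardCoder-Hermes-merge")
converter.convert("WizardCoder-Hermes-merge-ct2", quantization="int8_float16")
```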

Inference is done using text-generation-webui:

I added the code from this PR and updated requirements.txt: https://github.com/oobabooga/text-generation-webui/pull/2828

One extra change is needed in the code: `reply = apply_extensions('output', reply)` becomes `reply = apply_extensions('output', reply, state)`.

The idea was to regain some of the coding ability that was lost in the fine-tune while retaining at least basic capabilities to summarize text and work with context. This experiment also focused on using CT2 for its speed.

**I believe the presented approach is the best available compromise between speed, coding accuracy, and a bit of general LLM use.**

**Please note that the CT2 8-bit quant seems to have better HumanEval scores than --load-in-8bit.**


The community now mostly focuses on making non-coding models code, as making coding models more general seems near impossible.
However, my daily use is focused on DevOps questions, summarizing content, and script development. Further development will be around intent analysis for integration with TODO lists and calendars, extracting actions and notes from my voice transcriptions. This model doesn't seem to work well enough on those tasks, so next I will attempt actual fine-tunes of Wizard Coder, or just run two models at the same time. I hope to fit under 24 GB VRAM, which means I will also evaluate 4-bit quantization.

My initial testing checked whether the model finds:

Overflow: `"what is mistake in following C++ code: int a = 1e9+7; int b = 1e9+9; int c = a*b; cout << c;"`

Out of bounds: `"what is bug in the following C++ code: int a = 100; vector <int> b(a); b[a] = 20; cout << b[a] << '\n';"`

and proposes using `docker update` for `"how to stop docker container so it doesnt start every reboot"` (the expected answers are noted below).
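
For reference (these are the expected answers, not model output): in the first snippet `a*b` is computed in `int` while the product is around 10^18, far above `INT_MAX`; in the second, `b[a]` writes one element past the end of the vector; and the Docker question should be answered along the lines of `docker update --restart=no <container_name>`.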



I ran those prompts in a loop with different presets and ended up picking this preset (a sketch of applying it follows):
`['temperature'] = 1.31`
`['top_p'] = 0.29`
`['top_k'] = 72`
`['repetition_penalty'] = 1.09`
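
A minimal sketch of applying this preset when calling CTranslate2 directly, outside text-generation-webui (parameter names follow the `ctranslate2.Generator` API; the model path is a placeholder):

```python
import ctranslate2
from transformers import AutoTokenizer

generator = ctranslate2.Generator("WizardCoder-Hermes-merge-ct2", device="cuda")
tokenizer = AutoTokenizer.from_pretrained("WizardLM/WizardCoder-15B-V1.0")

prompt = "what is mistake in following C++ code: ..."  # any of the prompts above
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

results = generator.generate_batch(
    [tokens],
    sampling_temperature=1.31,
    sampling_topp=0.29,
    sampling_topk=72,
    repetition_penalty=1.09,
    max_length=512,
)
print(tokenizer.decode(results[0].sequences_ids[0]))
```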

        
Testing with the above prompts showed that Hermes Coder CT2 was not able to answer correctly most of the time, while Wizard Coder and this merge did. The merged model seems to retain the ability to use "### Input:" in the prompt and became more sensitive to non-coding instructions (Wizard Coder almost completely disregards them).

At the bottom you can see EvalPlus benchmarks of the three mentioned models; they all performed similarly with the default preset. I'm not sure whether I'm running the benchmark incorrectly or whether these quants don't work properly with the default preset. As I noticed, the custom preset considerably improved the result.

**I would greatly appreciate it if anyone could confirm how good this model is with the proposed preset, as the result really positively surprised me (it seems better than any other Wizard Coder 8-bit quant).**

**CT2 int8_float16 merge, custom preset:**
`Base`
`{'pass@1': 0.47560975609756095}`
`Base + Extra`
`{'pass@1': 0.45121951219512196}`


**For summarization I propose the following prompt:**

`Below is an instruction that describes a task. Write a response that appropriately completes the request.`

`### Instruction:`
`Please provide a concise summary for each topic presented in the input below. Ensure clarity, coherence, and avoid redundant information.`

`### Input:`
`[CONTENT TO SUMMARIZE]`

`### Response:The summary for each topic presented in the input is as follows:`

**Optionally iterate over the output with the following prompt (a sketch chaining both prompts follows it):**

`Below is an instruction that describes a task. Write a response that appropriately completes the request.`

`### Instruction:`
`Rewrite summary from Input. Fix typos, add missing spaces. Ensure clarity, coherence, and remove redundant information.`

`### Input:`
`[OUTPUT FROM PREVIOUS PROMPT]`

`### Response:`
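
A minimal sketch chaining the two prompts, where `generate` stands in for whatever inference call you use (for example the CTranslate2 snippet above) and `content` is the text to summarize; both are hypothetical placeholders:

```python
ALPACA_HEADER = ("Below is an instruction that describes a task. "
                 "Write a response that appropriately completes the request.\n\n")

def build_prompt(instruction: str, content: str, response_prefix: str = "") -> str:
    # Assemble the Alpaca-style template used by both prompts above.
    return (f"{ALPACA_HEADER}### Instruction:\n{instruction}\n\n"
            f"### Input:\n{content}\n\n### Response:{response_prefix}")

# First pass: summarize the content.
summary = generate(build_prompt(
    "Please provide a concise summary for each topic presented in the input "
    "below. Ensure clarity, coherence, and avoid redundant information.",
    content,
    response_prefix="The summary for each topic presented in the input is as follows:",
))

# Second pass: clean up the first summary.
final = generate(build_prompt(
    "Rewrite summary from Input. Fix typos, add missing spaces. "
    "Ensure clarity, coherence, and remove redundant information.",
    summary,
))
```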

**HumanEval** was run using https://github.com/my-other-github-account/llm-humaneval-benchmarks/ and then evaluated with:
`sudo docker run -v $(pwd):/app ganler/evalplus:latest --dataset humaneval --samples results/{model_name}.jsonl`


**Custom preset:**
`['temperature'] = 1.31`
`['top_p'] = 0.29`
`['top_k'] = 72`
`['repetition_penalty'] = 1.09`

**CT2 int8_float16 merge, custom preset:**
`Base`
`{'pass@1': 0.47560975609756095}`
`Base + Extra`
`{'pass@1': 0.45121951219512196}`
**One of the worst reruns:**
`Base`
`{'pass@1': 0.4573170731707317}`
`Base + Extra`
`{'pass@1': 0.4146341463414634}`

**CT2 int8_float16 Wizard Coder:**
`Base`
`{'pass@1': 0.43902439024390244}`
`Base + Extra`
`{'pass@1': 0.3597560975609756}`
**Retry:**
`Base`
`{'pass@1': 0.42073170731707316}`
`Base + Extra`
`{'pass@1': 0.3475609756097561}`

**Full-weight Wizard Coder loaded with --load-in-8bit, custom preset:**
`Base`
`{'pass@1': 0.3475609756097561}`
`Base + Extra`
`{'pass@1': 0.3170731707317073}`

---

**Default llm-humaneval-benchmarks preset:**
`['temperature'] = 1`
`['top_p'] = 1`
`['top_k'] = 0`
`['repetition_penalty'] = 1`

**CT2 int8_float16 merge (this model):**
`Base`
`{'pass@1': 0.4634146341463415}`
`Base + Extra`
`{'pass@1': 0.4024390243902439}`

**CT2 int8_float16 Redmond Hermes Coder:**
`Base`
`{'pass@1': 0.4695121951219512}`
`Base + Extra`
`{'pass@1': 0.4146341463414634}`

**CT2 int8_float16 Wizard Coder:**
`Base`
`{'pass@1': 0.4695121951219512}`
`Base + Extra`
`{'pass@1': 0.3902439024390244}`

**Full-weight Wizard Coder loaded with --load-in-8bit, default preset:**
`Base`
`{'pass@1': 0.43902439024390244}`
`Base + Extra`
`{'pass@1': 0.3719512195121951}`

**Full-weight merged model loaded with --load-in-8bit, default preset:**
`Base`
`{'pass@1': 0.43902439024390244}`
`Base + Extra`
`{'pass@1': 0.3902439024390244}`

**Full-weight Hermes Coder model loaded with --load-in-8bit, default preset:**
`Base`
`{'pass@1': 0.4451219512195122}`
`Base + Extra`
`{'pass@1': 0.4146341463414634}`

--------------