maddes8cht committed
Commit 3980f2f
1 Parent(s): 47a7fa7

"Update README.md"

Files changed (1): README.md added (+780 lines)
 
---
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
license: bigscience-openrail-m
pretrain-datasets:
- books
- arxiv
- c4
- falcon-refinedweb
- wiki
- github-issues
- stack_markdown
- self-made dataset of permissive github code
datasets:
- bigcode/the-stack-dedup
- rombodawg/2XUNCENSORED_MegaCodeTraining188k
- bigcode/commitpackft
metrics:
- code_eval
library_name: transformers
tags:
- code
model-index:
- name: Refact-1.6B
  results:
  - task:
      type: text-generation
    dataset:
      type: openai_humaneval
      name: HumanEval
    metrics:
    - name: pass@1 (T=0.01)
      type: pass@1
      value: 32.0
      verified: false
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 31.5
      verified: false
    - name: pass@10 (T=0.8)
      type: pass@10
      value: 53.0
      verified: false
    - name: pass@100 (T=0.8)
      type: pass@100
      value: 76.9
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Python
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 35.8
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize JavaScript
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 31.6
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Java
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 29.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Go
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: -1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize C++
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 26.3
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Rust
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: -1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Average
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: -1
      verified: false

  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixTests Python
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 18.38
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixTests JavaScript
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 12.28
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixTests Java
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 15.12
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixTests Go
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: -1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixTests C++
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 13.17
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixTests Rust
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 2.8
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixTests Average
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: -1
      verified: false

  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixDocs Python
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 26.92
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixDocs JavaScript
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 26.85
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixDocs Java
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 30.76
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixDocs Go
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: -1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixDocs C++
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 25.94
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixDocs Rust
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 8.44
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixDocs Average
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: -1
      verified: false

  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain Python
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 26.46
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain JavaScript
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 17.86
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain Java
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 20.94
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain Go
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: -1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain C++
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 18.78
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain Rust
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: -1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain Average
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: -1
      verified: false

  - task:
      type: text-generation
    dataset:
      type: mbpp
      name: MBPP
    metrics:
    - name: pass@1 (T=0.01)
      type: pass@1
      value: 31.15
      verified: false
  - task:
      type: text-generation
    dataset:
      type: ds1000
      name: DS-1000 (Overall Completion)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 10.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (C++)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 21.61
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (C#)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 13.91
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (D)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 9.5
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Go)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 53.57
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Java)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 21.58
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Julia)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 13.75
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (JavaScript)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 26.88
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Lua)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 15.26
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (PHP)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 23.04
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Perl)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 12.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Python)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 29.6
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (R)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 13.77
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Ruby)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 12.68
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Racket)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 4.29
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Rust)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 19.54
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Scala)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 18.33
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Bash)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 5.7
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Swift)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 17.68
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (TypeScript)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 25
      verified: false

language:
- en
---
[![banner](https://maddes8cht.github.io/assets/buttons/Huggingface-banner.jpg)]()

I'm constantly enhancing these model descriptions to provide you with the most relevant and comprehensive information.

# Refact-1_6B-fim - GGUF
- Model creator: [smallcloudai](https://huggingface.co/smallcloudai)
- Original model: [Refact-1_6B-fim](https://huggingface.co/smallcloudai/Refact-1_6B-fim)

Refact appears to be an original model with no derivative models so far.
It was [announced](https://refact.ai/blog/2023/applying-recent-innovations-to-train-model/) on the refact.ai website and published on Huggingface.


# About GGUF format

`gguf` is the current file format used by the [`ggml`](https://github.com/ggerganov/ggml) library.
A growing list of software supports it and can therefore use this model.
The core project built on the ggml library is the [llama.cpp](https://github.com/ggerganov/llama.cpp) project by Georgi Gerganov.
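
As a rough sketch of how a GGUF file from this repository can be consumed from Python, here is an example using the `llama-cpp-python` bindings (one option among many llama.cpp-based tools; the quantization filename below is only a placeholder, substitute the file you actually downloaded):

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Placeholder filename: pick the quantization variant you downloaded from this repo.
llm = Llama(model_path="Refact-1_6B-fim-Q5_K_M.gguf", n_ctx=4096)

out = llm("def print_hello_world():", max_tokens=64, temperature=0.2)
print(out["choices"][0]["text"])
```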

# Quantization variants

A range of quantized files is available to cater to your specific needs. Here's how to choose the best option for you:

# Legacy quants

Q4_0, Q4_1, Q5_0, Q5_1 and Q8 are `legacy` quantization types.
Nevertheless, they are fully supported, as there are several circumstances that cause certain models not to be compatible with the modern K-quants.
## Note:
There is now an option to use K-quants even for previously 'incompatible' models, although this involves a fallback solution that makes them not *real* K-quants. More details can be found in the affected model descriptions.
(This mainly refers to Falcon 7b and Starcoder models.)

# K-quants

K-quants are designed around the idea that applying different levels of quantization to specific parts of the model can optimize performance, file size, and memory load.
So, if possible, use K-quants.
With a Q6_K, you'll likely find it hard to discern any quality difference from the original model - ask the model the same question twice and you may see bigger differences between the two answers than between Q6_K and the original.


---

# Original Model Card:
![image/png](https://cdn-uploads.huggingface.co/production/uploads/643a9dd0c5f633a7fa7e804a/HkB0QYV0BbmB3ktMugbZy.png)


# Refact-1.6B

Finally, the model we started training with our [blog post](https://refact.ai/blog/2023/applying-recent-innovations-to-train-model/) is ready 🎉

After fine-tuning on generated data, it beats Replit 3b, Stability Code 3b and many other models. It almost beats
StarCoder, which is ten times its size!

Model | Size | HumanEval pass@1 | HumanEval pass@10 |
----------------------|---------------|--------------------|--------------------|
DeciCoder-1b | 1b | 19.1% | |
<b>Refact-1.6-fim</b> | <b>1.6b</b> | <b>32.0%</b> | <b>53.0%</b> |
StableCode | 3b | 20.2% | 33.8% |
ReplitCode v1 | 3b | 21.9% | |
CodeGen2.5-multi | 7b | 28.4% | 47.5% |
CodeLlama | 7b | 33.5% | 59.6% |
StarCoder | 15b | 33.6% | |

It's likely the best model for practical use in your IDE for code completion because it's smart and fast!
You can start using it right now by downloading the
[Refact plugin](https://refact.ai/). You can host the model yourself, too, using the
[open source docker container](https://github.com/smallcloudai/refact).

It's also multilingual (see MultiPL-HumanEval and other metrics below), and it works as a chat model (see the section below).

# It Works As a Chat

The primary application of this model is code completion (infill) in multiple programming languages,
but it also works quite well as a chat model.

HumanEval results using the instruction-following (chat) format, against models specialized for chat only:

Model | Size | pass@1 | pass@10 |
-----------------------|--------|----------|----------|
<b>Refact-1.6-fim</b> | 1.6b | 38.4% | 55.6% |
StableCode-instruct | 3b | 26.9% | 36.2% |
OctoGeeX | 6b | 44.7% | |
CodeLlama-instruct | 7b | 34.8% | 64.3% |
CodeGen2.5-instruct | 7b | 36.2% | 60.87% |
CodeLlama-instruct | 13b | 42.7% | 71.6% |
StarChat-β | 15b | 33.5% | |
OctoCoder | 15b | 46.2% | |

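For reference, pass@k in tables like these is conventionally computed with the unbiased estimator from the HumanEval paper: generate n samples per problem, count the c correct ones, and estimate the probability that at least one of k drawn samples passes. A minimal sketch of that estimator (my own illustration, not necessarily the exact harness used for these numbers):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k), computed stably."""
    if n - c < k:
        return 1.0
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# e.g. 200 samples with 64 correct gives pass@1 = 0.32; pass@10 is considerably higher
print(pass_at_k(200, 64, 1), pass_at_k(200, 64, 10))
```
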
# Example

Fill-in-the-middle uses special tokens to identify the prefix/middle/suffix parts of the input and output:

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "smallcloudai/Refact-1_6B-fim"
device = "cuda" # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)

prompt = '<fim_prefix>def print_hello_world():\n    """<fim_suffix>\n    print("Hello world!")<fim_middle>'

inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_length=100, temperature=0.2)
print("-"*80)
print(tokenizer.decode(outputs[0]))
```

# Chat Format

The same model works as a chat model (experimental).

```python
prompt_template = "<empty_output>SYSTEM {system}\n" \
                  "<empty_output>USER {query}\n" \
                  "<empty_output>ASSISTANT"
prompt = prompt_template.format(system="You are a programming assistant",
                                query="How do I sort a list in Python?")
```
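
A minimal sketch of generating from this chat prompt, reusing the `tokenizer`, `model`, and `device` from the FIM example above (the generation settings are illustrative, not values recommended by the authors):

```python
# Continues the FIM example above: `tokenizer`, `model`, and `device` are already defined.
inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.2)
print(tokenizer.decode(outputs[0]))
```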

# Architecture

As described in more detail in the blog post, we used:

- [ALiBi](https://arxiv.org/abs/2108.12409) based attention
- [LayerNorm](https://arxiv.org/abs/1607.06450v1) instead of [RMSNorm](https://arxiv.org/pdf/1910.07467.pdf)
- [Multi Query Attention](https://arxiv.org/abs/1911.02150)

We also used LiON, flash attention, and early dropout. None of it is so exotic that you can't run the model -- in fact you can, see the example above.
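
For intuition on the first bullet: ALiBi drops learned positional embeddings and instead adds a per-head linear bias to the attention scores, proportional to the distance between query and key positions. A small illustrative sketch (my own, not this model's actual implementation):

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Geometric per-head slopes from the ALiBi paper (assumes n_heads is a power of two).
    slopes = torch.tensor([2.0 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)])
    pos = torch.arange(seq_len)
    distance = pos[None, :] - pos[:, None]   # distance[q, k] = k - q (<= 0 for causal k <= q)
    return slopes[:, None, None] * distance  # (n_heads, seq_len, seq_len), added to attention logits
```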

# Pretraining

For the base model, we used our own dataset that contains code with permissive licenses only, plus open text datasets.
Filtering was key to this model's success:

- We only used text in English
- Only topics related to computer science
- Applied heavy deduplication

The text-to-code proportion was 50:50, and the model was trained for 1.2T tokens.

We don't release the base model, because its Fill-in-the-Middle (FIM) output tends to repeat itself, so
its practical use is limited. But if you still want it, write us a message on Discord.


# Finetuning

We tested our hypothesis that chat data should boost base model performance in FIM and
regular left-to-right code completion. We found that just 15% of open
[code](https://huggingface.co/datasets/bigcode/commitpackft)
[instruction-following](https://huggingface.co/datasets/rombodawg/2XUNCENSORED_MegaCodeTraining188k) datasets,
which we filtered for quality, improves almost all metrics.

Additionally, to improve FIM, we observed common failure modes and prepared a synthetic dataset based on
[The Stack dedup v1.1](https://huggingface.co/datasets/bigcode/the-stack-dedup) to address them.

There is a distribution shift between typical code on the internet and the code you write in your IDE.
The former is likely finished, so the model tries to come up with a suggestion that makes the code complete.
The code you are working on is likely half-written, and there is no single addition that can repair it
fully.

In practice, the model needs a tendency to stop after a couple of lines are added, and sometimes to write
nothing at all. We found that adding empty completions, single-line completions, and multiline
completions that end with a smaller text indent or at least a newline makes it much more usable. This data
made up the remaining 85% of the finetuning dataset.

The final model is the result of several attempts to make it work as well as possible for code completion,
and to perform well on a wide range of metrics. The best attempt took 40B tokens.

# Limitations and Bias

The Refact-1.6B model was trained on text in English, but it has seen many more languages in
code comments. Its performance on non-English languages is certainly lower.


# Model Stats

- **Architecture:** LLAMA-like model with multi-query attention
- **Objectives:** Fill-in-the-Middle, Chat
- **Tokens context:** 4096
- **Pretraining tokens:** 1.2T
- **Finetuning tokens:** 40B
- **Precision:** bfloat16
- **GPUs:** 64 NVidia A5000
- **Training time:** 28 days


# License

The model is licensed under the BigScience OpenRAIL-M v1 license agreement.


# Citation

If you are using this model, please give a link to this page.

***End of original Model File***
---


## Please consider supporting my work
**Coming Soon:** I'm in the process of launching a sponsorship/crowdfunding campaign for my work. I'm evaluating Kickstarter, Patreon, and the new GitHub Sponsors platform, and I am hoping for support and contributions toward the continued availability of these kinds of models. Your support will enable me to provide even more valuable resources and maintain the models you rely on. Your patience and ongoing support are greatly appreciated as I work to make this page an even more valuable resource for the community.

<center>

[![GitHub](https://maddes8cht.github.io/assets/buttons/github-io-button.png)](https://maddes8cht.github.io)
[![Stack Exchange](https://stackexchange.com/users/flair/26485911.png)](https://stackexchange.com/users/26485911)
[![GitHub](https://maddes8cht.github.io/assets/buttons/github-button.png)](https://github.com/maddes8cht)
[![HuggingFace](https://maddes8cht.github.io/assets/buttons/huggingface-button.png)](https://huggingface.co/maddes8cht)
[![Twitter](https://maddes8cht.github.io/assets/buttons/twitter-button.png)](https://twitter.com/maddes1966)

</center>