RonanMcGovern
committed on
Commit e01e96e • 1 Parent(s): 379b321
add quantization benchmarking
README.md
CHANGED
@@ -20,6 +20,25 @@ vLLM compatible model that will run in:
 [One click Runpod template](https://runpod.io/console/deploy?template=rzgcdh9rqe&ref=jmfkcdio) (affiliate link).
 Other templates available from [Trelis' one-click-llms repo](https://github.com/TrelisResearch/one-click-llms).
 
+## Quantization Benchmarking
+Run using llm_eval on 100 rows of gsm8k.
+
+Base model - 16 bit:
+```
+|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
+|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+|gsm8k|      3|flexible-extract|     5|exact_match|↑  | 0.93|±  |0.0256|
+|     |       |strict-match    |     5|exact_match|↑  | 0.93|±  |0.0256|
+```
+
+fp8 model:
+```
+|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
+|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+|gsm8k|      3|flexible-extract|     5|exact_match|↑  | 0.93|±  |0.0256|
+|     |       |strict-match    |     5|exact_match|↑  | 0.93|±  |0.0256|
+```
+
 ---
 # Phi-4
 
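
The `Stderr` of 0.0256 reported for both models can be reproduced from the other two figures in the README: an exact-match score of 0.93 on 100 rows, i.e. 93 correct answers. The sketch below assumes the evaluation tool computes the standard error of the mean of 0/1 scores using the sample (n - 1) variance; the README does not state this, but the numbers are consistent with it. The helper name `exact_match_stderr` is hypothetical and exists only for illustration.

```python
import math


def exact_match_stderr(num_correct: int, num_rows: int) -> float:
    """Standard error of the mean of 0/1 scores, using the sample (n - 1) variance.

    Assumption: this mirrors how the reported Stderr column was computed.
    """
    mean = num_correct / num_rows
    # Correct rows score 1 and deviate by (1 - mean); wrong rows score 0 and deviate by mean.
    sum_squared_dev = num_correct * (1 - mean) ** 2 + (num_rows - num_correct) * mean ** 2
    sample_variance = sum_squared_dev / (num_rows - 1)
    return math.sqrt(sample_variance / num_rows)


# 0.93 exact match on 100 rows of gsm8k means 93 correct answers.
stderr = exact_match_stderr(93, 100)
print(f"{stderr:.4f}")  # 0.0256
```

At this sample size, a rough 95% interval (1.96 × stderr) is about ±0.05 around 0.93, so identical scores for the 16-bit and fp8 models show that no large degradation was detected, but a difference of a few points would not be resolvable with 100 rows.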