ChrisGoringe
committed on
Update README.md
README.md
CHANGED
@@ -42,12 +42,7 @@ On an A40 (plenty of VRAM), everything except the model identical, the time take
 - 9_2 => 42.8s
 - 9_6 => 48.2s
 
-for comparison
-- bfloat16 (default) =>
-- fp8_e4m3fn =>
-- fp8_e5m2 =>
-
-
+for comparison, the unquantised models take about 27s.
 
 ## How is this optimised?
 