skatzR committed (verified) · Commit 4c89cf5 · Parent: ccc647f

Update README.md

Files changed (1): README.md (+10 -1)
README.md CHANGED

@@ -32,6 +32,7 @@ It is designed for **fast CPU inference** with [ONNX Runtime](https://onnxruntime.ai)
 | **Supported HW** | CPU (optimized for Intel AVX512-VNNI, fallback to AVX2) |
 | **License** | Apache-2.0 |
 
+---
 
 ## 🚀 Features
 
@@ -40,6 +41,8 @@ It is designed for **fast CPU inference** with [ONNX Runtime](https://onnxruntime.ai)
 - 🔄 **Drop-in replacement** — embeddings compatible with the FP32 version.
 - 🌍 **Multilingual** — supports Russian 🇷🇺 and English 🇬🇧.
 
+---
+
 ## 🧠 Intended Use
 
 **✅ Recommended for:**
@@ -50,7 +53,9 @@ It is designed for **fast CPU inference** with [ONNX Runtime](https://onnxruntime.ai)
 
 **❌ Not ideal for:**
 - Absolute maximum accuracy scenarios (INT8 introduces minor loss)
-- GPU-optimized pipelines (prefer FP16/FP32 models instead)
+- GPU-optimized pipelines (prefer FP16/FP32 models instead)
+
+---
 
 ## ⚖️ Pros & Cons of Quantized ONNX
 
@@ -64,6 +69,8 @@ It is designed for **fast CPU inference** with [ONNX Runtime](https://onnxruntime.ai)
 - AVX512 optimizations only on modern Intel CPUs.
 - No GPU acceleration in this export.
 
+---
+
 ## 📊 Benchmark
 
 | Metric | Value |
@@ -75,6 +82,7 @@ It is designed for **fast CPU inference** with [ONNX Runtime](https://onnxruntime.ai)
 | Inference speed | ~2× faster |
 | Model size (MB) | 347.5 |
 
+---
 
 ## 📂 Files
 
@@ -84,6 +92,7 @@ tokenizer.json, vocab.txt, special_tokens_map.json — tokenizer
 
 config.json — model config
 
+---
 
 ## 🧩 Examples
 
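The diff cuts off at the README's 🧩 Examples heading. For orientation, here is a minimal sketch of what CPU inference with this export could look like. It assumes a BERT-style encoder, mean pooling, and a `model_quantized.onnx` file name, none of which this commit confirms; only `tokenizer.json` and `config.json` appear in the file list above.

```python
# Hypothetical sketch -- verify the actual .onnx file name and pooling
# strategy against the repository before running.
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")  # shipped with the repo
tokenizer.enable_padding()                          # pad batch to longest
tokenizer.enable_truncation(max_length=512)         # assumed max length

# CPUExecutionProvider selects AVX512-VNNI / AVX2 kernels automatically.
session = ort.InferenceSession(
    "model_quantized.onnx",                         # assumed file name
    providers=["CPUExecutionProvider"],
)

def embed(texts):
    encs = tokenizer.encode_batch(texts)
    inputs = {
        "input_ids": np.array([e.ids for e in encs], dtype=np.int64),
        "attention_mask": np.array([e.attention_mask for e in encs], dtype=np.int64),
    }
    # Some BERT-style exports also expect token_type_ids.
    if "token_type_ids" in {i.name for i in session.get_inputs()}:
        inputs["token_type_ids"] = np.zeros_like(inputs["input_ids"])
    # First output assumed to be last_hidden_state: (batch, seq, hidden).
    last_hidden = session.run(None, inputs)[0]
    mask = inputs["attention_mask"][..., None]
    emb = (last_hidden * mask).sum(axis=1) / mask.sum(axis=1)  # mean pooling (assumed)
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)    # L2-normalize

print(embed(["Привет, мир!", "Hello, world!"]).shape)
```

As a quick sanity check on the "drop-in replacement" claim, the same texts can be embedded with the FP32 checkpoint and compared by cosine similarity.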