zmadscientist committed on
Commit
99d616d
1 Parent(s): d0301e3

Update README.md

Files changed (1)
  1. README.md +17 -0
README.md CHANGED
@@ -1,3 +1,20 @@
  ---
  license: apache-2.0
  ---
+
+ QuaLA-MiniLM: a Quantized Length Adaptive MiniLM
+
+ The article discusses the challenge of making transformer-based models efficient enough for practical
+ use, given their size and computational requirements. The authors propose a new approach called
+ QuaLA-MiniLM, which combines knowledge distillation, the length-adaptive transformer (LAT) technique,
+ and low-bit quantization. This approach trains a single model that can adapt to any inference scenario
+ with a given computational budget, achieving a superior accuracy-efficiency trade-off on the SQuAD1.1
+ dataset. The authors compare this approach with other efficient methods and find that it achieves up to
+ an 8.8x speedup with less than 1% accuracy loss. The code is publicly available on GitHub. The article
+ also surveys related work, including dynamic transformers and other knowledge distillation approaches.
+
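
For illustration only, a minimal sketch of the low-bit quantization idea mentioned above: it applies post-training dynamic int8 quantization to a MiniLM-style extractive question-answering model using PyTorch and the transformers library. The checkpoint id is a hypothetical placeholder, not necessarily the QuaLA-MiniLM model released by the authors, and this is not their training or LAT code.

```python
# Minimal sketch (assumption, not the authors' released pipeline):
# post-training dynamic int8 quantization of a MiniLM-style QA model.
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_id = "your-org/minilm-squad1.1"  # hypothetical checkpoint id for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForQuestionAnswering.from_pretrained(model_id)

# Post-training dynamic quantization: nn.Linear weights are stored as int8,
# activations are quantized dynamically at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

qa = pipeline("question-answering", model=quantized, tokenizer=tokenizer)
answer = qa(
    question="What techniques does QuaLA-MiniLM combine?",
    context=(
        "QuaLA-MiniLM combines knowledge distillation, a length-adaptive "
        "transformer, and low-bit quantization."
    ),
)
print(answer)
```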