nielsr (HF staff) committed on
Commit 4f98e91
1 Parent(s): 0857df5
Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -1,6 +1,6 @@
 # I-BERT base model
 
-This model, `ibert-roberta-base`, is an integer-only quantized version of [RoBERTa](https://arxiv.org/abs/1907.11692), and was introduced in [this papaer](https://arxiv.org/abs/2101.01321).
+This model, `ibert-roberta-base`, is an integer-only quantized version of [RoBERTa](https://arxiv.org/abs/1907.11692), and was introduced in [this paper](https://arxiv.org/abs/2101.01321).
 I-BERT stores all parameters with INT8 representation, and carries out the entire inference using integer-only arithmetic.
 In particular, I-BERT replaces all floating point operations in the Transformer architectures (e.g., MatMul, GELU, Softmax, and LayerNorm) with closely approximating integer operations.
 This can result in upto 4x inference speed up as compared to floating point counterpart when tested on an Nvidia T4 GPU.
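
For context, the README being edited describes a checkpoint usable through the `transformers` library. Below is a minimal usage sketch, not part of this commit's diff; the hub ID `kssteven/ibert-roberta-base` is an assumption and should be replaced with the actual repository path if it differs.

```python
# Minimal sketch: load the I-BERT base checkpoint and run a forward pass.
# The hub ID below is assumed, not taken from the commit itself.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("kssteven/ibert-roberta-base")
model = AutoModel.from_pretrained("kssteven/ibert-roberta-base")

inputs = tokenizer(
    "I-BERT runs inference with integer-only arithmetic.",
    return_tensors="pt",
)
with torch.no_grad():
    outputs = model(**inputs)

# (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```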