---
license: apache-2.0
language:
- en
library_name: transformers
---

# TinyBERT_L-4_H-312_v2 ONNX Model

This repository provides an ONNX version of the `TinyBERT_L-4_H-312_v2` model, originally developed by the team at [Huawei Noah's Ark Lab](https://arxiv.org/abs/1909.10351) and ported to Transformers by [Nils Reimers](https://huggingface.co/nreimers). The model is a compact version of BERT, designed for efficient inference and a reduced memory footprint. The ONNX version includes mean pooling of the last hidden layer for convenient feature extraction.

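For readers who want to reproduce or adapt the export, the sketch below shows one way such a graph could be produced with PyTorch's ONNX exporter: a thin wrapper module bakes the mean pooling into the graph. This is an illustration, not the script used for this repository; the source checkpoint id, the output name `sentence_embedding`, and the exporter settings are all assumptions.

```python
# Illustrative export sketch (assumed workflow, not this repository's actual script).
import torch
from transformers import AutoModel, AutoTokenizer

class MeanPooledEncoder(torch.nn.Module):
    """Wraps the encoder so mean pooling is part of the exported graph."""

    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        mask = attention_mask.unsqueeze(-1).to(hidden.dtype)
        # Masked mean over the token axis, guarding against all-zero masks
        return (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

base_id = "nreimers/TinyBERT_L-4_H-312_v2"  # assumed source checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_id)
wrapper = MeanPooledEncoder(AutoModel.from_pretrained(base_id)).eval()

dummy = tokenizer("An example sentence.", return_tensors="pt")
torch.onnx.export(
    wrapper,
    (dummy["input_ids"], dummy["attention_mask"]),
    "tinybert_mean_embeddings.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["sentence_embedding"],  # output name is an assumption
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "sentence_embedding": {0: "batch"},
    },
    opset_version=14,
)
```
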
## Model Overview

TinyBERT is a smaller version of BERT that maintains competitive performance while significantly reducing the number of parameters and the computational cost. This makes it ideal for deployment in resource-constrained environments. The model is based on the work presented in the paper ["TinyBERT: Distilling BERT for Natural Language Understanding"](https://arxiv.org/abs/1909.10351).

## License

This model is distributed under the Apache 2.0 License. For more details, please refer to the [license file](https://github.com/huawei-noah/Pretrained-Language-Model/blob/master/TinyBERT/LICENSE) in the original repository.

## Model Details

- **Model:** TinyBERT_L-4_H-312_v2
- **Layers:** 4
- **Hidden Size:** 312
- **Pooling:** Mean pooling of the last hidden layer (see the formula below)
- **Format:** ONNX

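For reference, and assuming the usual attention-mask weighting, the mean pooling step computes, for last-hidden-layer token vectors $h_t$ and attention mask $m_t \in \{0, 1\}$ over $T$ tokens:

$$
e = \frac{\sum_{t=1}^{T} m_t \, h_t}{\sum_{t=1}^{T} m_t}
$$

so padding tokens do not dilute the sentence embedding.
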
## Usage

To use this model, you will need `onnxruntime` and `transformers` installed. You can install both via pip:

```bash
pip install onnxruntime transformers
```

Below is a Python code snippet demonstrating how to run inference using this ONNX model:

```python
import onnxruntime as ort
from transformers import AutoTokenizer

# Directory containing the tokenizer files and the ONNX model
model_path = "TinyBERT_L-4_H-312_v2-onnx/"
tokenizer = AutoTokenizer.from_pretrained(model_path)
ort_sess = ort.InferenceSession(model_path + "tinybert_mean_embeddings.onnx")

sentences = [
    "How many people live in Berlin?",
    "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "New York City is famous for the Metropolitan Museum of Art.",
]
features = tokenizer(sentences, padding=True, truncation=True, return_tensors="np")
# The exported graph takes only input_ids and attention_mask
onnx_inputs = {k: v for k, v in features.items() if k != "token_type_ids"}

# Mean pooling is built into the graph, so the first output
# is one embedding per input sentence
mean_pooled_output = ort_sess.run(None, onnx_inputs)[0]
print("Mean pooled output:", mean_pooled_output)
```

Make sure to replace `model_path` with the actual path to your copy of the model directory.

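As a quick sanity check (not part of the original snippet), you can compare the embeddings with cosine similarity; the question should land closer to the Berlin passage than to the New York one:

```python
import numpy as np

# Normalize each embedding to unit length
emb = mean_pooled_output / np.linalg.norm(mean_pooled_output, axis=1, keepdims=True)

# Cosine similarity = dot product of unit vectors
print("Question vs. Berlin passage:  ", float(emb[0] @ emb[1]))
print("Question vs. New York passage:", float(emb[0] @ emb[2]))
```
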
## Training Details

For detailed information on the training process of TinyBERT, please refer to the [original paper](https://arxiv.org/abs/1909.10351) by Huawei Noah's Ark Lab.

## Acknowledgements

This model is based on the work by the team at Huawei Noah's Ark Lab and by Nils Reimers. Special thanks to the developers for providing the pre-trained model and making it accessible to the community.