# TinyBERT_L-4_H-312_v2 ONNX Model

This repository provides an ONNX version of the TinyBERT_L-4_H-312_v2 model, originally developed by the team at Huawei Noah's Ark Lab and ported to Transformers by Nils Reimers.
The model is a compact version of BERT designed for efficient inference and a reduced memory footprint. The ONNX export includes mean pooling of the last hidden layer for convenient feature extraction.
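Mean pooling here means averaging the token embeddings of the last hidden layer to produce one fixed-size vector per sentence. As a rough sketch of the operation (whether the exported graph masks padding tokens exactly this way is an assumption; `mean_pool` is an illustrative name, not part of this repository):

```python
import numpy as np

def mean_pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding positions.

    last_hidden_state: (batch, seq_len, hidden) float array
    attention_mask:    (batch, seq_len) 0/1 int array
    """
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)  # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(axis=1)                   # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid division by zero
    return summed / counts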
## Model Overview
TinyBERT is a smaller version of BERT that maintains competitive performance while significantly reducing the number of parameters and computational cost. This makes it ideal for deployment in resource-constrained environments. The model is based on the work presented in the paper "TinyBERT: Distilling BERT for Natural Language Understanding".
## License
This model is distributed under the Apache 2.0 License. For more details, please refer to the license file in the original repository.
## Model Details
- Model: TinyBERT_L-4_H-312_v2
- Layers: 4
- Hidden Size: 312
- Pooling: Mean pooling of the last hidden layer
- Format: ONNX
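If you want to confirm these details against the exported graph itself, you can list its declared inputs and outputs with onnxruntime (the exact input/output names and shapes depend on how the model was exported, so treat the printed values as authoritative rather than this README):

```python
import onnxruntime as ort

sess = ort.InferenceSession("TinyBERT_L-4_H-312_v2-onnx/tinybert_mean_embeddings.onnx")

# Print each declared input and output with its symbolic shape and element type.
for inp in sess.get_inputs():
    print("input: ", inp.name, inp.shape, inp.type)
for out in sess.get_outputs():
    print("output:", out.name, out.shape, out.type)
```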
## Usage

To use this model, you will need onnxruntime and transformers installed. You can install both via pip:

```bash
pip install onnxruntime transformers
```
Below is a Python code snippet demonstrating how to run inference using this ONNX model:
```python
import onnxruntime as ort
from transformers import AutoTokenizer

# Directory containing the tokenizer files and the ONNX model.
model_path = "TinyBERT_L-4_H-312_v2-onnx/"
tokenizer = AutoTokenizer.from_pretrained(model_path)
ort_sess = ort.InferenceSession(model_path + "tinybert_mean_embeddings.onnx")

# Tokenize a batch of sentences; return_tensors="np" yields NumPy arrays for onnxruntime.
features = tokenizer(
    ["How many people live in Berlin?",
     "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
     "New York City is famous for the Metropolitan Museum of Art."],
    padding=True, truncation=True, return_tensors="np")

# The exported graph does not take token_type_ids, so drop them from the feed.
onnx_inputs = {k: v for k, v in features.items() if k != "token_type_ids"}

# The first output is the mean-pooled sentence embedding, one vector per input sentence.
mean_pooled_output = ort_sess.run(None, onnx_inputs)[0]
print("Mean pooled output:", mean_pooled_output)
```
Make sure to replace model_path with the actual path to the directory containing your ONNX model and tokenizer files.
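The resulting embeddings can be compared directly, for example with cosine similarity. A minimal follow-up sketch, assuming mean_pooled_output from the snippet above with shape (batch, 312):

```python
import numpy as np

# Normalize each embedding to unit length, then compare all pairs via dot products.
emb = mean_pooled_output / np.linalg.norm(mean_pooled_output, axis=1, keepdims=True)
similarities = emb @ emb.T  # (batch, batch) matrix of cosine similarities

print("question vs. Berlin answer:    ", similarities[0, 1])
print("question vs. New York sentence:", similarities[0, 2])
```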
## Training Details
For detailed information on the training process of TinyBERT, please refer to the original paper by Huawei Noah's Ark Lab.
## Acknowledgements
This model is based on the work by the team at Huawei Noah's Ark Lab and by Nils Reimers. Special thanks to the developers for providing the pre-trained model and making it accessible to the community.