---
license: mit
---

# ONNX GPU Runtime with O4 for BAAI/bge-reranker-large

Benchmark: https://colab.research.google.com/drive/1HP9GQKdzYa6H9SJnAZoxJWq920gxwd2k

## Convert

```bash
# In a notebook such as Colab, prefix the command with `!`
optimum-cli export onnx -m BAAI/bge-reranker-large --optimize O4 bge-reranker-large-onnx-o4 --device cuda
```

## Usage

```python
# pip install "optimum[onnxruntime-gpu]" transformers

import torch
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('swulling/bge-reranker-large-onnx-o4')
model = ORTModelForSequenceClassification.from_pretrained('swulling/bge-reranker-large-onnx-o4')
model.to("cuda")

# Example input (illustrative): a list of [query, passage] pairs to score
pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda is a bear species endemic to China.']]

with torch.no_grad():
    # Tokenize each pair together, truncating to the 512-token limit
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
    # One logit per pair; a higher value means a more relevant passage
    scores = model(**inputs, return_dict=True).logits.view(-1, ).float()

print(scores)
```

## Source model

https://huggingface.co/BAAI/bge-reranker-large