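---
license: mit
---

# ONNX Runtime GPU export of BAAI/bge-reranker-large (O4)

This repository contains an ONNX export of [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large), optimized with Optimum's O4 level for ONNX Runtime on CUDA GPUs.

Benchmark notebook: https://colab.research.google.com/drive/1HP9GQKdzYa6H9SJnAZoxJWq920gxwd2k

## Convert

O4 applies the O3 graph optimizations plus fp16 mixed precision. It is GPU-only, which is why the export requires `--device cuda`. In a Colab or Jupyter cell, prefix the command with `!`.

```bash
optimum-cli export onnx -m BAAI/bge-reranker-large --optimize O4 --device cuda bge-reranker-large-onnx-o4
```

## Usage

```python
# pip install "optimum[onnxruntime-gpu]" transformers
import torch
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('swulling/bge-reranker-large-onnx-o4')
model = ORTModelForSequenceClassification.from_pretrained('swulling/bge-reranker-large-onnx-o4')
# Run inference through ONNX Runtime's CUDA execution provider
model.to("cuda")

pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
with torch.no_grad():
    # Tokenize the (query, passage) pairs and move the tensors to the GPU
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512).to("cuda")
    # One relevance logit per pair; a higher score means a more relevant passage
    scores = model(**inputs, return_dict=True).logits.view(-1).float()
    print(scores)
```

## Source model

https://huggingface.co/BAAI/bge-reranker-large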
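## Ranking passages (sketch)

The reranker returns one raw logit per (query, passage) pair. A minimal sketch for turning those logits into a ranked list, assuming you pass `scores.tolist()` from the usage example together with the passages in the same order, is to map each logit through a sigmoid into (0, 1) and sort in descending order. The sigmoid step is an assumption (a common normalization for cross-encoder rerankers, not something this ONNX export performs), and `sigmoid` and `rank_passages` are hypothetical helpers, not part of Optimum or Transformers.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Map a raw relevance logit into (0, 1) without overflowing for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x < 0, so exp(x) is at most 1
    return z / (1.0 + z)


def rank_passages(
    passages: Sequence[str], logits: Sequence[float]
) -> List[Tuple[str, float]]:
    """Pair each passage with its sigmoid-normalized score, best first.

    Hypothetical helper: `logits` is assumed to be the per-pair reranker
    output (e.g. `scores.tolist()`), in the same order as `passages`.
    """
    if len(passages) != len(logits):
        raise ValueError("passages and logits must have the same length")
    scored = [(p, sigmoid(float(s))) for p, s in zip(passages, logits)]
    # Sort by normalized score; the sigmoid is monotonic, so order matches the raw logits
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Illustrative logits only; real values come from the ONNX model.
    passages = ["hi", "The giant panda ... is a bear species endemic to China."]
    example_logits = [-8.0, 5.0]
    for passage, prob in rank_passages(passages, example_logits):
        print(f"{prob:.4f}  {passage}")
```

Because the sigmoid is monotonic, it never changes the ranking; it only makes scores easier to threshold or compare across queries.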