Off-Topic Classification Model

This repository contains a fine-tuned Jina Embeddings model designed to perform binary classification. The model predicts whether a user prompt is off-topic based on the intended purpose defined in the system prompt.

Model Highlights

Performance

We evaluated our fine-tuned models on synthetic data that models system and user prompt pairs reflecting real-world enterprise use cases of LLMs. The dataset is available here.

| Approach | Model | ROC-AUC | F1 | Precision | Recall |
|---|---|---|---|---|---|
| Fine-tuned bi-encoder classifier | jina-embeddings-v2-small-en | 0.99 | 0.97 | 0.99 | 0.95 |
| 👉 Fine-tuned cross-encoder classifier | stsb-roberta-base | 0.99 | 0.99 | 0.99 | 0.99 |
| Pre-trained cross-encoder | stsb-roberta-base | 0.73 | 0.68 | 0.53 | 0.93 |
| Prompt Engineering | GPT 4o (2024-08-06) | - | 0.95 | 0.94 | 0.97 |
| Prompt Engineering | GPT 4o Mini (2024-07-18) | - | 0.91 | 0.85 | 0.91 |
| Zero-shot Classification | GPT 4o Mini (2024-07-18) | 0.99 | 0.97 | 0.95 | 0.99 |
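
For readers comparing the columns, F1 is conventionally the harmonic mean of precision and recall. The sketch below assumes the table uses that standard definition; `f1_score` is a helper written for this illustration, not part of the repository. Because the reported precision and recall are rounded to two decimals, recomputed values may not match every row exactly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid dividing by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the fine-tuned bi-encoder row.
print(round(f1_score(0.99, 0.95), 2))  # 0.97
```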

Further evaluation results on additional synthetic and external datasets (e.g., JailbreakBench, HarmBench, TrustLLM) are available in our technical report.

Usage

  1. Clone this repository and install the required dependencies:

    pip install -r requirements.txt
    
  2. You can run the model using two options:

    Option 1: Using inference_onnx.py with the ONNX Model.

     ```
     python inference_onnx.py '[
         ["System prompt example 1", "User prompt example 1"],
         ["System prompt example 2", "User prompt example 2"]
     ]'
     ```
    

    Option 2: Using inference_safetensors.py with PyTorch and SafeTensors.

     ```
     python inference_safetensors.py '[
         ["System prompt example 1", "User prompt example 1"],
         ["System prompt example 2", "User prompt example 2"]
     ]'
     ```
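Hand-writing the JSON argument is error-prone once prompts contain quotes or newlines. The following is a minimal sketch of building it programmatically and calling either script from Python. It assumes the scripts accept the JSON list of pairs as a single positional argument, as in the commands above; `build_pairs_argument` and `run_inference` are hypothetical helper names, not part of the repository.

```python
import json
import subprocess
import sys
from typing import Sequence, Tuple


def build_pairs_argument(pairs: Sequence[Tuple[str, str]]) -> str:
    """Serialize (system_prompt, user_prompt) pairs into a JSON list of lists."""
    rows = []
    for pair in pairs:
        if len(pair) != 2 or not all(isinstance(text, str) for text in pair):
            raise ValueError(
                f"expected a (system_prompt, user_prompt) pair of strings, got {pair!r}"
            )
        rows.append(list(pair))
    # json.dumps escapes quotes and newlines, so prompt text cannot break the argument.
    return json.dumps(rows)


def run_inference(pairs: Sequence[Tuple[str, str]], script: str = "inference_onnx.py") -> str:
    """Run one of the inference scripts and return whatever it prints."""
    # Assumption: the script reads the JSON string from its single positional argument.
    # Passing the command as a list avoids shell quoting issues.
    result = subprocess.run(
        [sys.executable, script, build_pairs_argument(pairs)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Run from the repository root, `run_inference([("System prompt example 1", "User prompt example 1")])` would invoke the ONNX script; pass `script="inference_safetensors.py"` for the PyTorch variant. The output is returned unparsed because its format is not described here.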
    

Read more about this model in our technical report.
