Merged Model Performance
This repository contains our merged RAG relevance model (a PEFT adapter merged into its base model) along with its evaluation results.
RAG Relevance Classification Metrics
Our merged model achieves the following performance on a binary classification task:
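| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 | 0.74 | 0.77 | 0.75 | 100 |
| 1 | 0.76 | 0.73 | 0.74 | 100 |
| Accuracy | | | 0.75 | 200 |
| Macro avg | 0.75 | 0.75 | 0.75 | 200 |
| Weighted avg | 0.75 | 0.75 | 0.75 | 200 |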
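
To make the report easier to read, the sketch below recomputes the per-class metrics from a confusion matrix. The counts are not published in this card; they are an assumption inferred from the reported recall and support (77 of 100 class-0 examples and 73 of 100 class-1 examples classified correctly). The helper name `class_metrics` is illustrative only.

```python
# Confusion counts keyed by (true_label, predicted_label).
# ASSUMPTION: inferred from the reported recall (0.77, 0.73) and support (100 each);
# the original evaluation counts are not given in this card.
confusion = {
    (0, 0): 77, (0, 1): 23,
    (1, 0): 27, (1, 1): 73,
}


def class_metrics(confusion, label):
    """Return (precision, recall, f1) for one class of a confusion matrix."""
    labels = {true for true, _ in confusion}
    tp = confusion[(label, label)]
    # Predicted as `label` but actually another class.
    fp = sum(confusion[(t, label)] for t in labels if t != label)
    # Actually `label` but predicted as another class.
    fn = sum(confusion[(label, p)] for p in labels if p != label)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


report = {
    label: tuple(round(m, 2) for m in class_metrics(confusion, label))
    for label in (0, 1)
}
accuracy = (confusion[(0, 0)] + confusion[(1, 1)]) / sum(confusion.values())

for label, (p, r, f1) in report.items():
    print(f"class {label}: precision={p} recall={r} f1={f1}")
print(f"accuracy={accuracy}")
```

Under these assumed counts, the rounded values agree with the table above, including the overall accuracy of 0.75.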
Model Usage
For best results, we recommend starting with the following prompting strategy (and encourage tweaks as you see fit):
import torch
from transformers import AutoTokenizer, pipeline


def format_input_classification(query, text):
    # Build the relevance-judging prompt for one (question, reference text) pair.
    prompt = f"""
You are comparing a reference text to a question and trying to determine if the reference text
contains information relevant to answering the question. Here is the data:
[BEGIN DATA]
************
[Question]: {query}
************
[Reference text]: {text}
************
[END DATA]
Compare the Question above to the Reference text. You must determine whether the Reference text
contains information that can answer the Question. Please focus on whether the very specific
question can be answered by the information in the Reference text.
Your response must be single word, either "relevant" or "unrelated",
and should not contain any text or characters aside from that word.
"unrelated" means that the reference text does not contain an answer to the Question.
"relevant" means the reference text contains an answer to the Question."""
    return prompt


text = format_input_classification(
    "What is quantization?",
    "Quantization is a method to reduce the memory footprint",
)
messages = [
    {"role": "user", "content": text}
]

# The original snippet left the three names below undefined; these values are assumptions.
base_model = "grounded-ai/phi3-rag-relevance-judge-merge"
attn_implementation = "eager"  # assumed default; switch if your setup supports another backend
tokenizer = AutoTokenizer.from_pretrained(base_model)

pipe = pipeline(
    "text-generation",
    model=base_model,
    model_kwargs={"attn_implementation": attn_implementation, "torch_dtype": torch.float16},
    tokenizer=tokenizer,
)
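
The prompt asks for a single word, but generated text can still carry whitespace, punctuation, or leftover special-token fragments. The sketch below shows one way to normalize a reply into one of the two labels. `parse_relevance_label` is a hypothetical helper, not part of this model card or of any library, and it assumes you have already extracted the assistant's reply as a plain string from the pipeline output.

```python
import re

# The two labels the prompt above asks the model to choose between.
VALID_LABELS = ("relevant", "unrelated")


def parse_relevance_label(generated_text):
    """Return "relevant", "unrelated", or None if neither word appears.

    Hypothetical helper: it lowercases the reply, splits it into alphabetic
    words, and returns the first word that exactly matches a valid label.
    """
    words = re.findall(r"[a-z]+", generated_text.lower())
    for word in words:
        if word in VALID_LABELS:
            return word
    return None


print(parse_relevance_label(" Relevant."))
print(parse_relevance_label("I am not sure"))
```

Matching whole words rather than substrings means a reply such as "irrelevant" is not mistaken for "relevant". Returning `None` for anything unexpected lets the caller decide whether to retry or discard the example.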
Comparison with Other Models
We compared our merged model's performance on the RAG Eval benchmark against several other state-of-the-art language models:
| Model | Precision | Recall | F1 |
|---|---|---|---|
| Our Merged Model | 0.74 | 0.77 | 0.75 |
| GPT-4 | 0.70 | 0.88 | 0.78 |
| GPT-4 Turbo | 0.68 | 0.91 | 0.78 |
| Gemini Pro | 0.61 | 1.00 | 0.76 |
| GPT-3.5 | 0.42 | 1.00 | 0.59 |
| Palm (Text Bison) | 0.53 | 1.00 | 0.69 |

[1] Scores from arize/phoenix
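As shown in the table, our merged model achieves an F1 of 0.75, close to GPT-4 and GPT-4 Turbo (0.78) and Gemini Pro (0.76), and ahead of Palm (Text Bison) (0.69) and GPT-3.5 (0.59). It also has the highest precision in the table (0.74), at the cost of lower recall than the black-box models.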
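
As a quick consistency check, the F1 column can be recomputed from the precision and recall columns, since F1 is their harmonic mean. The sketch below copies the values from the comparison table; the `rows` dictionary and `f1_score` function are illustrative names, not part of any evaluation code used for this card.

```python
# (precision, recall, reported F1) copied from the comparison table.
rows = {
    "Our Merged Model": (0.74, 0.77, 0.75),
    "GPT-4": (0.70, 0.88, 0.78),
    "GPT-4 Turbo": (0.68, 0.91, 0.78),
    "Gemini Pro": (0.61, 1.00, 0.76),
    "GPT-3.5": (0.42, 1.00, 0.59),
    "Palm (Text Bison)": (0.53, 1.00, 0.69),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


recomputed = {
    model: round(f1_score(p, r), 2) for model, (p, r, _) in rows.items()
}

for model, (_, _, reported) in rows.items():
    print(f"{model}: reported={reported} recomputed={recomputed[model]}")
```

A recall of 1.00 paired with low precision, as for GPT-3.5, suggests a model that labels almost everything as relevant, which is why F1 is a more informative comparison here than recall alone.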
We will continue to improve and fine-tune our merged model to achieve even better performance across various benchmarks and tasks.
Citations: [1] https://docs.arize.com/phoenix/evaluation/how-to-evals/running-pre-tested-evals/retrieval-rag-relevance