---
library_name: transformers
tags:
- text-generation
- pytorch
- Lynx
- Patronus AI
- evaluation
- hallucination-detection
license: cc-by-nc-4.0
language:
- en
---

# Model Card for Patronus-Lynx-8B-Instruct

Lynx is an open-source hallucination evaluation model. Patronus-Lynx-8B-Instruct was trained on a mix of datasets, including CovidQA, PubmedQA, DROP, and RAGTruth.
The datasets contain both hand-annotated and synthetic data. The maximum sequence length is 8000 tokens.


## Model Details

- **Model Type:** Patronus-Lynx-8B-Instruct is a fine-tuned version of the meta-llama/Meta-Llama-3-8B-Instruct model.
- **Language:** Primarily English
- **Developed by:** Patronus AI
- **Paper:** [https://arxiv.org/abs/2407.08488](https://arxiv.org/abs/2407.08488)
- **License:** [https://creativecommons.org/licenses/by-nc/4.0/](https://creativecommons.org/licenses/by-nc/4.0/)

### Model Sources

- **Repository:** [https://github.com/patronus-ai/Lynx-hallucination-detection](https://github.com/patronus-ai/Lynx-hallucination-detection)


## How to Get Started with the Model
The model is fine-tuned to detect hallucinations in a RAG setting. Given a document, a question, and an answer, it evaluates whether the answer is faithful to the document.

We recommend using the same prompt that was used for fine-tuning:

```
PROMPT = """
Given the following QUESTION, DOCUMENT and ANSWER you must analyze the provided answer and determine whether it is faithful to the contents of the DOCUMENT. The ANSWER must not offer new information beyond the context provided in the DOCUMENT. The ANSWER also must not contradict information provided in the DOCUMENT. Output your final verdict by strictly following this format: "PASS" if the answer is faithful to the DOCUMENT and "FAIL" if the answer is not faithful to the DOCUMENT. Show your reasoning.

--
QUESTION (THIS DOES NOT COUNT AS BACKGROUND INFORMATION):
{question}

--
DOCUMENT:
{context}

--
ANSWER:
{answer}

--

Your output should be in JSON FORMAT with the keys "REASONING" and "SCORE":
{{"REASONING": <your reasoning as bullet points>, "SCORE": <your final score>}}
"""
```
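
The doubled braces on the last line of the template suggest it is meant to be filled with Python's `str.format`, where `{{` and `}}` render as literal braces. The following is a minimal sketch of that step, not the authors' code: `TEMPLATE` is a shortened stand-in for the full prompt, `build_prompt` is a hypothetical helper, and the question, document, and answer are made-up values.

```python
# Shortened stand-in for PROMPT (assumption: same placeholders and
# escaped braces as the full template above).
TEMPLATE = """QUESTION:
{question}

DOCUMENT:
{context}

ANSWER:
{answer}

{{"REASONING": <your reasoning as bullet points>, "SCORE": <your final score>}}
"""


def build_prompt(template: str, question: str, context: str, answer: str) -> str:
    """Fill the three placeholders; '{{' and '}}' come out as literal braces."""
    return template.format(question=question, context=context, answer=answer)


# Made-up example values, for illustration only.
prompt = build_prompt(
    TEMPLATE,
    question="What is the capital of France?",
    context="Paris is the capital and most populous city of France.",
    answer="Paris",
)
print(prompt)
```

Substituted values are not re-parsed by `str.format`, so a document that itself contains braces should pass through unchanged.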

The model outputs the score as "PASS" if the answer is faithful to the document, or "FAIL" if it is not.
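
Because the prompt asks for JSON with the keys "REASONING" and "SCORE", the verdict can be pulled out of the generated text programmatically. The sketch below is one possible approach, not part of the released code: `parse_verdict` is a hypothetical helper, the sample strings are made up, and the regular-expression fallback assumes the model may sometimes emit output that is not strictly valid JSON.

```python
import json
import re


def parse_verdict(generated_text: str):
    """Return "PASS" or "FAIL" from the model output, or None if neither is found."""
    # First try strict JSON, in case the output is a well-formed object.
    try:
        data = json.loads(generated_text)
        if isinstance(data, dict):
            score = str(data.get("SCORE", "")).strip().upper()
            if score in ("PASS", "FAIL"):
                return score
    except json.JSONDecodeError:
        pass
    # Fallback: look for the SCORE key followed by PASS or FAIL,
    # with or without quotes around the value.
    match = re.search(r'"SCORE"\s*:\s*"?(PASS|FAIL)"?', generated_text, re.IGNORECASE)
    return match.group(1).upper() if match else None


# Made-up outputs, for illustration only.
well_formed = '{"REASONING": ["The document names Paris as the capital."], "SCORE": "PASS"}'
loose = '{"REASONING": ["The answer adds a date not in the document."], "SCORE": FAIL}'
print(parse_verdict(well_formed))
print(parse_verdict(loose))
```

Returning `None` instead of guessing keeps unparseable outputs visible, so they can be counted or retried separately.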

## Inference

To run inference, you can use the Hugging Face `pipeline`:

```
import transformers

model_id = "PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    max_new_tokens=600,
    device="cuda",
    return_full_text=False,
)

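# Fill the PROMPT template defined above; these values are illustrative placeholders.
prompt = PROMPT.format(
    question="What is the capital of France?",
    context="Paris is the capital of France.",
    answer="Paris",
)

messages = [
    {"role": "user", "content": prompt},
]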

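# Greedy decoding (the intent of temperature=0); sampling with a
# temperature of 0 can raise an error in transformers.
outputs = pipeline(
    messages,
    do_sample=False,
)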

print(outputs[0]["generated_text"])
```

Since the model is trained in chat format, ensure that you pass the prompt as a user message.

For more information on training details, refer to our [arXiv paper](https://arxiv.org/abs/2407.08488).

## Evaluation

The model was evaluated on [PatronusAI/HaluBench](https://huggingface.co/datasets/PatronusAI/HaluBench).

On this benchmark, it outperforms GPT-3.5-Turbo, GPT-4-Turbo, GPT-4o, and Claude-3-Sonnet.

## Citation
If you use the model, please cite:

```
@article{ravi2024lynx,
  title={Lynx: An Open Source Hallucination Evaluation Model},
  author={Ravi, Selvan Sunitha and Mielczarek, Bartosz and Kannappan, Anand and Kiela, Douwe and Qian, Rebecca},
  journal={arXiv preprint arXiv:2407.08488},
  year={2024}
}
```

## Model Card Contact
[@sunitha-ravi](https://huggingface.co/sunitha-ravi)