oumi-ai/HallOumi-8B-classifier

Introducing HallOumi-8B-classifier, a fast SOTA hallucination detection model, outperforming DeepSeek R1, OpenAI o1, Google Gemini 1.5 Pro, and Claude Sonnet 3.5 at only 8 billion parameters!

Give HallOumi a try now!

Model	Macro F1 Score	Open?	Model Size
HallOumi-8B	77.2% ± 2.2%	Truly Open Source	8B
Claude Sonnet 3.5	69.6% ± 2.8%	Closed	??
OpenAI o1-preview	65.9% ± 2.3%	Closed	??
DeepSeek R1	61.6% ± 2.5%	Open Weights	671B
Llama 3.1 405B	58.8% ± 2.4%	Open Weights	405B
Google Gemini 1.5 Pro	48.2% ± 1.8%	Closed	??

HallOumi-8B-classifier, the hallucination classification model built with Oumi, is an end-to-end binary classification system that quickly and accurately estimates the hallucination probability of any written content (AI- or human-generated).

✔️ Fast with high accuracy
✔️ Per-claim support (must call once per claim)
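
Because the classifier is called once per claim, a response with several claims has to be split and scored claim by claim. The sketch below only illustrates that calling pattern; it is not the model's actual API. The names `naive_split_claims`, `score_claims`, and `toy_classifier` are hypothetical, the sentence splitter is deliberately crude, and the toy classifier is a word-overlap stand-in that you would replace with a real call to HallOumi-8B-classifier.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (context, claim) -> probability that the claim is hallucinated.
Classifier = Callable[[str, str], float]


def naive_split_claims(response: str) -> List[str]:
    """Crude splitter (assumption: one claim per sentence; breaks on decimals like '3.5')."""
    normalized = response.replace("?", ".").replace("!", ".")
    return [part.strip() for part in normalized.split(".") if part.strip()]


def score_claims(
    context: str, response: str, classify: Classifier, threshold: float = 0.5
) -> List[Dict[str, object]]:
    """Call the classifier once per claim and flag claims at or above the threshold."""
    results = []
    for claim in naive_split_claims(response):
        p = classify(context, claim)  # one model call per claim
        results.append({"claim": claim, "p_hallucination": p, "flagged": p >= threshold})
    return results


def toy_classifier(context: str, claim: str) -> float:
    """Stand-in for the real model: fraction of claim words absent from the context."""
    context_words = set(context.lower().split())
    words = claim.lower().split()
    missing = sum(1 for word in words if word not in context_words)
    return missing / len(words)


context = "the sky is blue and grass is green"
response = "The sky is blue. Grass is purple."
results = score_claims(context, response, toy_classifier, threshold=0.25)
for row in results:
    print(row)
```

The threshold of 0.25 is arbitrary and chosen only for this toy example; a real deployment would tune it on labeled data.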

Hallucinations

Hallucinations are often cited as the biggest obstacle to deploying generative models in commercial and personal applications, and for good reason:

It ultimately comes down to trust. Generative models are trained to produce outputs that are probabilistically likely, but not necessarily true. While such tools are useful in the right hands, the inability to trust them keeps AI from being adopted more broadly in settings where it could be used safely and responsibly.

Building Trust with Verifiability

To begin trusting AI systems, we have to be able to verify their outputs. By "verify", we specifically mean that we need to:

Understand the truthfulness of a particular statement produced by any model (the key focus of the HallOumi-8B-classifier model).
Understand what information supports that statement's truth and have full traceability connecting the statement to that information (provided by our generative HallOumi model).

Developed by: Oumi AI
Model type: Small Language Model
Language(s) (NLP): English
License: CC-BY-NC-4.0
Finetuned from model: Llama-3.1-8B-Instruct
Demo: HallOumi Demo (https://oumi.ai/halloumi-demo)

Uses

Use this model to verify claims and detect hallucinations in scenarios where a known source of truth is available.

Demo: https://oumi.ai/halloumi-demo

Out-of-Scope Use

Smaller LLMs have limited capabilities and should be used with caution. Avoid using this model for purposes outside of claim verification.

Bias, Risks, and Limitations

This model was finetuned with Llama-3.1-405B-Instruct data on top of a Llama-3.1-8B-Instruct model, so any biases or risks associated with those models may be present.

Training Details

Training Data

Training data:

Training Procedure

For information on training, see https://oumi.ai/halloumi

Evaluation

Follow along with our notebook on how to evaluate hallucination with HallOumi and other popular models: https://github.com/oumi-ai/oumi/blob/main/configs/projects/halloumi/halloumi_eval_notebook.ipynb
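
For readers unfamiliar with the Macro F1 metric reported in the table above, here is a minimal pure-Python sketch of how it is computed: F1 is calculated separately for each class and the per-class scores are averaged with equal weight. This is an illustration, not the evaluation code from the notebook; the function names and the label convention (1 = hallucinated, 0 = supported) are assumptions.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1 for one class, treating `cls` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # Convention: F1 is 0.0 when the class never appears in labels or predictions.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over every class seen in labels or predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Assumed label convention: 1 = hallucinated claim, 0 = supported claim.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
score = macro_f1(y_true, y_pred)
print(f"Macro F1: {score:.4f}")
```

Because each class counts equally, Macro F1 penalizes a model that does well on the majority class but poorly on the minority class, which matters when hallucinated and supported claims are imbalanced.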

Environmental Impact

Hardware Type: A100-80GB
Hours used: 1.5 (4 * 8 GPUs)
Cloud Provider: Google Cloud Platform
Compute Region: us-east5
Carbon Emitted: 0.15 kg

Citation

@misc{oumiHalloumi8BClassifier,
  author = {Panos Achlioptas and Jeremy Greer and Konstantinos Aisopos and Michael Schuler and Oussama Elachqar and Emmanouil Koukoumidis},
  title = {HallOumi-8B-classifier},
  month = {March},
  year = {2025},
  url = {https://huggingface.co/oumi-ai/HallOumi-8B-classifier}
}

@software{oumi2025,
  author = {Oumi Community},
  title = {Oumi: an Open, End-to-end Platform for Building Large Foundation Models},
  month = {January},
  year = {2025},
  url = {https://github.com/oumi-ai/oumi}
}
