Model Card

Overview

This model is a fine-tuned version of Meta's Llama-2-7B model, trained with LoRA for token-level hallucination detection.

Task

This model performs token classification, where each token in a sentence is classified as either:

  • correct (0): The token is part of factual information.
  • hallucinated (1): The token is part of hallucinated or incorrect information.
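
For each token, the model produces one logit per label, and the predicted label is the one with the highest score. The plain-Python sketch below shows that decision for a single token. The `ID2LABEL` mapping mirrors the labels listed above. The function name `classify_token` and the softmax probability it reports are illustrative additions, not part of the model's documented output.

```python
import math

# Label ids and names as listed in the Task section
ID2LABEL = {0: "correct", 1: "hallucinated"}


def classify_token(logits):
    """Map one token's logits to (label name, softmax probability)."""
    # Softmax over the classes; subtracting the max avoids overflow in exp()
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # argmax (first maximum on ties), matching torch.argmax in the usage example
    label_id = max(range(len(probs)), key=lambda i: probs[i])
    return ID2LABEL[label_id], probs[label_id]


label, prob = classify_token([0.5, 2.0])
print(label, round(prob, 3))  # hallucinated 0.818
```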

Example Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

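# Load the model and tokenizer
# (the repository is gated: accept its conditions on the Hugging Face Hub and
# authenticate, e.g. with `huggingface-cli login`, before downloading)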
model_name = "nicksnlp/llama-7B-hallucination"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

def infer_with_model(input_text):
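    # Tokenize a single sentence; no padding is needed (Llama tokenizers define no
    # pad token by default), and inputs are truncated to the 128-token training length
    inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=128)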

    # Move input tensors to the same device as the model
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    # Predict the token labels (hallucination vs. correct)
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits  # Raw logits output from the model

    # Get the predicted labels (0 for correct, 1 for hallucinated)
    predicted_labels = torch.argmax(logits, dim=-1)

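    # Recover the tokens from the actual input IDs so they line up with the predictions
    # (tokenizer.tokenize() omits the leading <s> token, which shifts every label by one)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())

    # Pair each token with its predicted label, skipping special tokens such as <s>
    labeled_tokens = [
        (token, label)
        for token, label in zip(tokens, predicted_labels[0].tolist())
        if token not in tokenizer.all_special_tokens
    ]

    # Keep the tokens predicted as hallucinated (sub-word pieces, not whole words)
    hallucinated_words = [token for token, label in labeled_tokens if label == 1]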

    return hallucinated_words


# Example input
input_text = "Alexanderplatz is located in London City, it has been there since 1966."
hallucinated_words = infer_with_model(input_text)

print("Hallucinated tokens:", hallucinated_words)
# Decode each SentencePiece token (e.g. "▁London") back to plain text
print([tokenizer.decode(tokenizer.convert_tokens_to_ids(word)) for word in hallucinated_words])
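
The example above reports individual SentencePiece tokens, so a single word may be split across several pieces. A minimal sketch for regrouping pieces into whole words follows. `group_into_words` is a hypothetical helper, and the rule that a word counts as hallucinated when any of its pieces is labeled 1 is an assumption, not something this card specifies. It expects (token, label) pairs like the `labeled_tokens` list built inside `infer_with_model`.

```python
def group_into_words(labeled_tokens):
    """Merge SentencePiece sub-tokens into words; a word is flagged if any piece is labeled 1."""
    words = []  # each entry: [word_text, is_hallucinated]
    for token, label in labeled_tokens:
        if token.startswith("▁") or not words:
            # "▁" marks the beginning of a new word in Llama's SentencePiece vocabulary
            words.append([token.lstrip("▁"), label == 1])
        else:
            # Continuation piece: extend the current word and merge its flag
            words[-1][0] += token
            words[-1][1] = words[-1][1] or label == 1
    return [(text, flagged) for text, flagged in words]


# Hypothetical (token, label) pairs; real Llama tokenization and predictions may differ
example = [("▁Alexander", 0), ("platz", 0), ("▁is", 0), ("▁located", 0),
           ("▁in", 0), ("▁Lon", 1), ("don", 0)]
print([text for text, flagged in group_into_words(example) if flagged])  # ['London']
```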

Training Data

The model was trained on a small dataset of sentences labeled as correct or hallucinated. In the examples below, each sentence carries one label per whitespace-separated word (punctuation stays attached to its word):

[
  {"text": "The Eiffel Tower is located in Berlin, Germany.", "labels": [0, 0, 0, 0, 0, 0, 1, 1]},  # Hallucinated words: "Berlin", "Germany"
  {"text": "The capital of France is Paris.", "labels": [0, 0, 0, 0, 0, 0]},  # Correct sentence
  {"text": "The Amazon River flows through Asia.", "labels": [0, 0, 0, 0, 0, 1]}  # Hallucinated word: "Asia"
]
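
Because Llama's tokenizer splits words into sub-word pieces, word-level labels like these have to be spread over tokens before training. The card does not say how this was done. The sketch below shows one common approach, assuming `word_ids` comes from a Hugging Face fast tokenizer (`BatchEncoding.word_ids()` on input pre-split with `text.split()` and passed with `is_split_into_words=True`). `align_word_labels` is a hypothetical name, and the `word_ids` values in the example are made up for illustration.

```python
# Label for positions that should not contribute to the loss;
# -100 is the default ignore_index of torch.nn.CrossEntropyLoss
IGNORE_INDEX = -100


def align_word_labels(word_ids, word_labels, label_all_subtokens=True):
    """Map one label per word onto sub-word tokens.

    word_ids: per-token word indices (None marks special tokens such as <s>).
    """
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens receive no label
            aligned.append(IGNORE_INDEX)
        elif word_id != previous_word or label_all_subtokens:
            # First piece of a word (or every piece, if requested) copies the word's label
            aligned.append(word_labels[word_id])
        else:
            # Later pieces are ignored when only the first sub-token is labeled
            aligned.append(IGNORE_INDEX)
        previous_word = word_id
    return aligned


# "The Amazon River flows through Asia." -> labels [0, 0, 0, 0, 0, 1]
# Hypothetical word_ids: <s> first, and "Asia." assumed to split into two pieces
word_ids = [None, 0, 1, 2, 3, 4, 5, 5]
labels = [0, 0, 0, 0, 0, 1]
print(align_word_labels(word_ids, labels))         # [-100, 0, 0, 0, 0, 0, 1, 1]
print(align_word_labels(word_ids, labels, False))  # [-100, 0, 0, 0, 0, 0, 1, -100]
```

Labeling every sub-token matches how the usage example reads predictions (one label per token); labeling only the first piece is the other common convention.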

Model Details

  • Base Model: Llama-2-7B
  • Task: Token Classification
  • Labels:
    • correct (0)
    • hallucinated (1)

Training Parameters

  • Model Name: nicksnlp/llama-7B-hallucination
  • Base Model: meta-llama/Llama-2-7b-hf
  • Task: Token Classification
  • Batch Size: 8
  • Epochs: 1
  • Learning Rate: 2e-4
  • Optimizer: paged_adamw_8bit
  • Gradient Accumulation Steps: 2
  • Max Sequence Length: 128
  • Weight Decay: 0.001
  • Warmup Ratio: 0.3
  • Save Steps: 300
  • Logging Steps: 10
  • Max Gradient Norm: 0.3
  • FP16: false
  • BF16: false
  • Device: GPU
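
For a rough sense of what these settings imply, the sketch computes the effective batch size and the approximate number of optimizer and warmup steps. The card does not state the dataset size or GPU count, so `num_examples=1000` and a single GPU are hypothetical, and `estimate_schedule` is an illustrative helper. The Trainer's own rounding of step counts may differ by one.

```python
import math

# Values taken from the Training Parameters list above
PER_DEVICE_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 2
EPOCHS = 1
WARMUP_RATIO = 0.3


def estimate_schedule(num_examples, num_gpus=1):
    """Approximate step counts; num_examples and num_gpus are assumptions."""
    # One optimizer update consumes batch_size * accumulation_steps examples per GPU
    effective_batch = PER_DEVICE_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_gpus
    total_steps = math.ceil(num_examples / effective_batch) * EPOCHS
    # Warmup length derived from the warmup ratio (rounded up)
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    return effective_batch, total_steps, warmup_steps


# Hypothetical dataset of 1,000 examples on a single GPU
print(estimate_schedule(1000))  # (16, 63, 19)
```

With a single epoch, a warmup ratio of 0.3 means the learning rate reaches its 2e-4 peak only after roughly the first 30% of updates.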

PEFT Configuration

  • LoRA Alpha: 8
  • LoRA Dropout: 0.1
  • Rank (r): 16
  • Bias: "none"
  • Target Modules: ["q_proj", "v_proj"]
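
To gauge how small the adapter is, the sketch counts the LoRA parameters implied by this configuration. It assumes the standard Llama-2-7B shapes (32 decoder layers, hidden size 4096, with `q_proj` and `v_proj` both mapping 4096 to 4096) and counts only the LoRA matrices, not the token-classification head. `lora_param_count` is an illustrative helper.

```python
# Llama-2-7B dimensions (from the public meta-llama/Llama-2-7b-hf configuration)
HIDDEN_SIZE = 4096
NUM_LAYERS = 32

# PEFT Configuration values from this card
LORA_RANK = 16
TARGET_MODULES = ["q_proj", "v_proj"]


def lora_param_count(d_in, d_out, r):
    # A LoRA adapter on a d_in -> d_out linear layer adds A (r x d_in) and B (d_out x r)
    return r * d_in + d_out * r


# q_proj and v_proj are both 4096 -> 4096 in Llama-2-7B (no grouped-query attention)
per_layer = sum(lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK) for _ in TARGET_MODULES)
total = per_layer * NUM_LAYERS
print(f"{total:,} LoRA parameters")  # 8,388,608 LoRA parameters
```

That is about 8.4M trainable LoRA parameters, roughly 0.12% of the ~6.7B weights in the base model.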

Training Framework

  • LoRA (Low-Rank Adaptation)
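
In LoRA, each targeted weight matrix W_0 stays frozen and the model learns a low-rank update B A. Assuming PEFT's default scaling of `lora_alpha / r` (the card does not mention variants such as rsLoRA), the values in the PEFT Configuration give a scaling factor of 0.5:

```latex
W' = W_0 + \frac{\alpha}{r}\, B A, \qquad
B \in \mathbb{R}^{d_{\text{out}} \times r},\quad
A \in \mathbb{R}^{r \times d_{\text{in}}},\quad
\frac{\alpha}{r} = \frac{8}{16} = 0.5
```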

Citation

For the original Llama model:

@article{touvron2023llama,
  title={LLaMA: Open and Efficient Foundation Language Models},
  author={Touvron, Hugo and others},
  journal={arXiv preprint arXiv:2302.13971},
  year={2023}
}

For the fine-tuned version:

@misc{nicksnlp2024llama_hallucination,
  author = {Nikolay Vorontsov},
  title = {Fine-tuning Llama-2-7B for Hallucination Detection},
  year = {2024},
  url = {https://huggingface.co/nicksnlp/llama-7B-hallucination}
}
Model size (Safetensors): 3.47B params; tensor types F32, FP16, U8