RADAR Model Card

Model Details

RADAR-Vicuna-7B is an AI-text detector trained via adversarial learning between the detector and a paraphraser, using a human-text corpus (OpenWebText) and an AI-text corpus generated from OpenWebText.

  • Developed by: TrustSafeAI
  • Model type: An encoder-only language model based on the transformer architecture (RoBERTa).
  • License: Non-commercial license (inherited from Vicuna-7B-v1.1)
  • Trained from model: RoBERTa

Model Sources

Uses

Users can use this detector to help identify text generated by large language models. Please note that the detector was trained on AI-text generated by Vicuna-7B-v1.1. Because the model supports only non-commercial use, it must not be used in commercial activities.

Get Started with the Model

Please refer to the following guidelines to see how to run the downloaded model locally or use our API service hosted on a Hugging Face Space.
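
As a rough illustration of local use, the sketch below loads the detector with the `transformers` library and converts its logits into a probability that each input is AI-generated. It assumes the checkpoint is published under the repository ID `TrustSafeAI/RADAR-Vicuna-7B` and loads as a standard two-class sequence classifier. The helper names (`softmax`, `ai_probability`, `score_texts`) are hypothetical, and treating class index 0 as "AI-generated" is an assumption that should be checked against samples of known origin before relying on the scores.

```python
import math

# ASSUMPTION: repository ID of the published checkpoint.
MODEL_ID = "TrustSafeAI/RADAR-Vicuna-7B"
# ASSUMPTION: logit index 0 corresponds to "AI-generated"; verify on known samples.
AI_CLASS_INDEX = 0


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ai_probability(logits, ai_index=AI_CLASS_INDEX):
    """Probability mass assigned to the (assumed) AI-generated class."""
    return softmax(logits)[ai_index]


def score_texts(texts, device="cpu", max_length=512):
    """Download the detector and return one AI-probability per input text."""
    # Imported lazily so the pure-Python helpers work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    detector = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    detector.to(device)
    detector.eval()

    inputs = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    inputs = {name: tensor.to(device) for name, tensor in inputs.items()}

    with torch.no_grad():
        logits = detector(**inputs).logits

    return [ai_probability(row) for row in logits.tolist()]
```

Calling `score_texts(["some passage to check"])` would return a list with one value between 0 and 1. Any decision threshold applied to that value is left to the user and is not specified by this card.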

Training Pipeline

We propose adversarial learning between a paraphraser and our detector. The paraphraser's goal is to make AI-generated text read more like human-written text, and the detector's goal is to improve its ability to identify AI-text.

  • (Step 1) Training data preparation: Before training, we use Vicuna-7B to generate AI-text by completing prefix spans of human-text from OpenWebText.

  • (Step 2) Update the paraphraser: During training, the paraphraser paraphrases the AI-text generated in Step 1. The reward returned by the detector is then used to update the paraphraser with a Proximal Policy Optimization (PPO) loss.

  • (Step 3) Update the detector: The detector is optimized using the logistic loss on the human-text, AI-text and paraphrased AI-text.

See more details in Sections 3 and 4 of this paper.
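
The three steps above can be sketched with toy scalar functions. This is a minimal illustration, not the training code: it assumes the detector outputs a probability that a text is human-written, that the paraphraser's reward is that probability on its paraphrase, and that the PPO update uses the standard clipped surrogate. All function names and the `prefix_words`, `ai_weight` and `clip_eps` parameters are hypothetical.

```python
import math


def split_prefix(human_text, prefix_words=30):
    """Step 1: keep the first words of a human text as the completion prompt."""
    words = human_text.split()
    return " ".join(words[:prefix_words])


def paraphraser_reward(p_human_on_paraphrase):
    """Step 2 (assumed): reward is the detector's P(human) for the paraphrase."""
    return p_human_on_paraphrase


def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    """Step 2: standard clipped PPO surrogate for one sample (to be maximized)."""
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


def logistic_loss(p_human, is_human):
    """Step 3: binary cross-entropy for one sample, given the detector's P(human)."""
    eps = 1e-12
    p = min(max(p_human, eps), 1.0 - eps)
    return -math.log(p) if is_human else -math.log(1.0 - p)


def detector_loss(p_human_text, p_ai_text, p_paraphrased, ai_weight=1.0):
    """Step 3: mean logistic loss over human, AI and paraphrased AI batches."""
    def mean(values):
        return sum(values) / len(values)

    human_term = mean([logistic_loss(p, True) for p in p_human_text])
    ai_term = mean([logistic_loss(p, False) for p in p_ai_text])
    para_term = mean([logistic_loss(p, False) for p in p_paraphrased])
    # ai_weight is a hypothetical knob balancing the AI-side terms.
    return human_term + ai_weight * (ai_term + para_term)
```

In this sketch the two players pull in opposite directions: a paraphrase that the detector scores as more human-like raises `paraphraser_reward`, while the same score raises the paraphrased-text term of `detector_loss`.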

Ethical Considerations

We suggest users use our tool to assist with identifying AI-written content at scale and with discretion. If the detection result is to be used as evidence, further validation steps are necessary as RADAR cannot always make correct predictions.
