My LoRA Fine-Tuned AI-generated Detector

This is an e5-small model fine-tuned with LoRA for sequence classification. It classifies text as AI-generated or human-written with high accuracy.

  • Label_0: Represents human-written content.
  • Label_1: Represents AI-generated content.
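When the model is served behind a standard text-classification head, predictions come back with these raw label names. A small mapping helper keeps downstream code readable; this is a minimal sketch assuming pipeline-style output dicts like `{"label": "LABEL_1", "score": 0.98}` (the helper itself is illustrative, not part of the released model):

```python
# Map the raw classifier labels to human-readable verdicts.
# Assumes pipeline-style output dicts like {"label": "LABEL_1", "score": 0.98}.
LABEL_NAMES = {
    "LABEL_0": "human-written",
    "LABEL_1": "AI-generated",
}

def readable_verdict(prediction: dict) -> str:
    """Turn a raw classification dict into a readable string."""
    name = LABEL_NAMES.get(prediction["label"], "unknown")
    return f"{name} ({prediction['score']:.1%} confidence)"

# Example with a mocked prediction:
print(readable_verdict({"label": "LABEL_1", "score": 0.976}))
# → AI-generated (97.6% confidence)
```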

Model Details

  • Base Model: intfloat/e5-small
  • Fine-Tuning Technique: LoRA (Low-Rank Adaptation)
  • Task: Sequence Classification
  • Use Cases: Detecting AI-generated text.
  • Hyperparameters:
    • Learning rate: 5e-5
    • Epochs: 3
    • LoRA rank: 8
    • LoRA alpha: 16
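The rank and alpha above determine how many trainable parameters LoRA adds per adapted weight matrix and how strongly the low-rank update is scaled. A minimal sketch of that arithmetic (the hidden size of 384 for e5-small is an assumption here, not stated in this card):

```python
def lora_extra_params(d_out: int, d_in: int, rank: int) -> int:
    # LoRA learns a low-rank update B @ A for a frozen (d_out, d_in) weight,
    # with B of shape (d_out, rank) and A of shape (rank, d_in).
    return rank * (d_out + d_in)

def lora_scaling(alpha: int, rank: int) -> float:
    # The update B @ A is scaled by alpha / rank before being added to the weight.
    return alpha / rank

hidden = 384  # assumed hidden size of intfloat/e5-small
print(lora_extra_params(hidden, hidden, rank=8))  # trainable params per square attention matrix
print(lora_scaling(alpha=16, rank=8))             # effective scaling factor: 2.0
```

With rank 8 and alpha 16, each adapted square matrix adds only a few thousand trainable parameters, which is why LoRA fine-tuning is far cheaper than full fine-tuning.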

Training Details

  • Dataset:
    • 10,000 tweets plus 10,000 counterparts rewritten with GPT-4o-mini.
    • 80,000 human-written texts from RAID-train.
    • 128,000 AI-generated texts from RAID-train.
  • Hardware: Fine-tuned on a single NVIDIA A100 GPU.
  • Training Time: Approximately 2 hours.
  • Evaluation Metrics:
    | Metric   | E5-small (raw) | Fine-tuned |
    |----------|----------------|------------|
    | Accuracy | 65.2%          | 89.0%      |
    | F1 Score | 0.653          | 0.887      |
    | AUC      | 0.697          | 0.976      |
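The accuracy and F1 figures above follow the standard binary-classification definitions. A self-contained sketch of how they are computed from label lists (treating 1 as AI-generated, matching Label_1 above):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy and F1 for binary labels (1 = AI-generated, 0 = human-written)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1

# Toy example: 5 texts, one false negative and one false positive.
acc, f1 = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(acc, f1)  # → 0.6 0.666...
```

(AUC additionally requires the model's scores rather than hard labels, so it is not shown here.)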

Collaborators

  • Menglin Zhou
  • Jiaping Liu
  • Xiaotian Zhan

Citation

If you use this model, please cite the RAID dataset as follows:

@inproceedings{dugan-etal-2024-raid,
    title = "{RAID}: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors",
    author = "Dugan, Liam  and
      Hwang, Alyssa  and
      Trhl{\'\i}k, Filip  and
      Zhu, Andrew  and
      Ludan, Josh Magnus  and
      Xu, Hainiu  and
      Ippolito, Daphne  and
      Callison-Burch, Chris",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.674",
    pages = "12463--12492",
}