Evasion Detection — AI Text Detection Evasion Research

Security audit of AI text detectors.
A rewriting system that aims to make AI-generated text undetectable, based on the MASH, HIP, and CoPA papers.

🎯 Objective

Understand and exploit the mathematical weaknesses of AI text detectors (Fast-DetectGPT, Binoculars, GPTZero, Pangram) in order to build a rewriting system capable of bypassing them.

Key intuition: detectors exploit the low token dispersion of AI text. Human text shows more variance in word choice, sentence length, and frequency distribution. Increasing that dispersion should therefore lower detection scores.

📊 Architecture

┌──────────────────────────────────────────────────────────────┐
│                       EVASION PIPELINE                       │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  TRAINING (Modal A100 80GB, ~41€)                            │
│  ┌──────────────────────────────────────────────────────┐    │
│  │ Phase 1: Style-SFT (~20€)                            │    │
│  │ BART-large (406M) + Style Embeddings                 │    │
│  │ Dataset: HC3, 5K AI→Human pairs (finance, medicine,  │    │
│  │          open_qa, wiki_csai)                         │    │
│  └──────────────────────────────────────────────────────┘    │
│                        ↓                                     │
│  ┌──────────────────────────────────────────────────────┐    │
│  │ Phase 2: DPO Adversarial (~21€)                      │    │
│  │ Reward = -detector_score                             │    │
│  │ β=0.1, hard negative mining                          │    │
│  └──────────────────────────────────────────────────────┘    │
│                                                              │
│  INFERENCE (Modal T4, ~0.60€/h)                              │
│  ┌──────────────────────────────────────────────────────┐    │
│  │ Stage 1: BART-SFT-DPO → Rewrite AI→Human             │    │
│  │ Stage 2: CoPA λ=1.5 → Token dispersion boost         │    │
│  │ logits = (1+λ)·log P_human - λ·log P_machine         │    │
│  │ + top-p=0.92 + rep_penalty=1.15 + diversity bonus    │    │
│  └──────────────────────────────────────────────────────┘    │
│                                                              │
│  EVALUATION                                                  │
│  ┌──────────────────────────────────────────────────────┐    │
│  │ Fast-DetectGPT | Binoculars | GPTZero | Pangram      │    │
│  │ Metrics: ASR, BERTScore, PPL, Token Dispersion       │    │
│  └──────────────────────────────────────────────────────┘    │
└──────────────────────────────────────────────────────────────┘
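
The ASR metric listed in the evaluation stage can be sketched as follows. This is a minimal illustration, not the code in `evaluate_detectors.py`: the definition used here (share of initially flagged texts whose detector score drops below the decision threshold after rewriting), the function name `attack_success_rate`, and the 0.5 threshold are all assumptions.

```python
from typing import Sequence


def attack_success_rate(
    scores_before: Sequence[float],
    scores_after: Sequence[float],
    threshold: float = 0.5,  # hypothetical decision threshold: score >= threshold means "AI"
) -> float:
    """Fraction of texts flagged as AI before rewriting that are no longer flagged after."""
    if len(scores_before) != len(scores_after):
        raise ValueError("score lists must have the same length")
    # Only texts the detector caught in the first place count toward ASR.
    flagged = [(b, a) for b, a in zip(scores_before, scores_after) if b >= threshold]
    if not flagged:
        return 0.0
    evaded = sum(1 for _, after in flagged if after < threshold)
    return evaded / len(flagged)


# Toy detector scores (made up for the example, not project results).
before = [0.9, 0.8, 0.3, 0.95]
after = [0.2, 0.7, 0.1, 0.4]
asr = attack_success_rate(before, after)
print(f"ASR = {asr:.2f}")
```

Restricting the denominator to texts that were flagged before rewriting keeps the metric from being inflated by samples the detector already missed.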

📁 Repo structure

evasion-detection-artifacts/
├── README.md                          ← This file
├── datasets/
│   ├── style_transfer_pairs_train.jsonl   ← 4500 AI→Human pairs (HC3)
│   ├── style_transfer_pairs_val.jsonl     ← 500 validation pairs
│   └── bitcoin_text.txt                   ← Bitcoin test text
├── results/
│   ├── copa_qwen2.5_1.5b_gpu_*.json      ← CoPA GPU results (Qwen2.5)
│   └── copa_real_distilgpt2.json          ← CoPA CPU results (baseline)
├── reports/
│   ├── eval_statistical_qwen25_1.5b_*.json ← Statistical evaluation
│   └── EVASION_DETECTION_REPORT.md         ← Full report
└── src/
    ├── inference_copa.py                   ← CoPA contrastive decoding
    ├── inference_combined.py               ← BART + CoPA two-stage
    ├── modal_app_copa.py                   ← Modal GPU wrapper CoPA
    ├── modal_app_sft.py                    ← Modal GPU Style-SFT training
    ├── modal_app_dpo.py                    ← Modal GPU DPO training
    ├── evaluate_detectors.py               ← Multi-detector evaluation
    ├── eval_statistical.py                 ← Statistical dispersion analysis
    ├── train_sft_modal.py                  ← Style-SFT (local/Modal)
    ├── hf_upload.py                        ← HF artifact upload adapter
    └── cost_guard.py                       ← Modal budget guard

🔬 Results

CoPA prototype (Qwen2.5-1.5B, T4, 0.05€)

Metric	Original (AI)	Rewritten (CoPA)	Δ
Word freq dispersion	0.36	1.68	+366%
Sentence length CV	0.154	0.327	+113%
Readability (Flesch)	25	32	+28%
Human-likeness	0.500	0.548	+0.048
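
As a rough illustration of how two of the quantities in this table can be obtained, the sketch below computes a sentence-length coefficient of variation and a relative change in percent. The naive regex sentence splitter and the function names are assumptions; `eval_statistical.py` may tokenize and define these metrics differently.

```python
import re
import statistics


def sentence_length_cv(text: str) -> float:
    """Coefficient of variation (population std / mean) of sentence lengths in words."""
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


uniform = "The cat sat down. The dog ran off. The bird flew away."
varied = "Stop. The dog ran off across the wide green field. Why?"

print(f"uniform CV = {sentence_length_cv(uniform):.3f}")
print(f"varied  CV = {sentence_length_cv(varied):.3f}")
print(f"Flesch change = {relative_change(25, 32):+.0f}%")
```

Recomputing Δ from the rounded values shown in the table can differ by about one point from the reported figure (for example, 0.154 → 0.327 gives roughly +112%).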

Bitcoin example (CoPA v2, few-shot, λ=1.5)

Original (AI-style):

Bitcoin, often called BTC, is the first and most well-known cryptocurrency in the world. It was created in 2009 by an unknown person or group using the name Satoshi Nakamoto...

Rewritten (CoPA Human-style):

Hey there! Bitcoin, or BTC, is the big bang of all the cool cyber currencies. It was born in 2009. An unknown dude or gals go by the name Satoshi Nakamoto. Unlike the money that grown-ups have, Bitcoin isn't controlled by anyone, or a central bank. Coincidence? Not a bit!...

🚀 Usage

Quick start — Rewrite a text (Modal GPU)

# Single text
modal run -q src/modal_app_copa.py --text "Your AI-generated text here" --gpu T4

# From file
modal run -q src/modal_app_copa.py --text-file datasets/bitcoin_text.txt --gpu T4

# Batch (10 samples, synthetic templates)
modal run -q src/modal_app_copa.py --num-samples 10 --gpu T4

Style-SFT Training (Modal A100 80GB, ~20€)

# Dry-run first (validates pipeline, ~0€)
modal run src/modal_app_sft.py --dry-run

# Real training (6-8h, ~20€)
modal run src/modal_app_sft.py --data datasets/style_transfer_pairs_train.jsonl

DPO Adversarial Training (Modal A100 80GB, ~21€)

modal run src/modal_app_dpo.py --sft-model simonlesaumon/evasion-detection-models/bart-sft-style-humanization

Statistical Evaluation (local, no GPU)

# Analyze any CoPA output
python src/eval_statistical.py output/copa_modal_results.json output/eval_report.json

Run Tests (13 unit tests, 0 GPU)

pytest tests/test_inference.py -v

📚 Theoretical background

Foundational papers

Paper	Venue / source	Key contribution
MASH (2025)	arXiv:2601.08564	BART-base 139M + Style-SFT + DPO = 92% ASR
HIP (2026)	CMU	Base models rated 96.7% "human" by GPTZero
CoPA (2025)	EMNLP	Training-free contrastive decoding
Fast-DetectGPT (2024)	ICLR	Probability curvature, 340x faster than DetectGPT
Binoculars (2024)	ICML	Cross-perplexity, >90% TPR @ 0.01% FPR
Pangram (2025)	COLING	Mistral NeMo 12B + active learning

How the detectors work

Family	Principle	Example
Statistical	Perplexity + burstiness	GPTZero
Curvature	Score = (log p - E[log p]) / σ	Fast-DetectGPT
Cross-PPL	Ratio of log-perplexity to cross-perplexity (two models)	Binoculars
Watermark	Signature embedded in the tokens	SynthID-Text

What they have in common: AI text has tokens clustered in high-probability regions, hence low dispersion. Our approach aims to increase that dispersion.
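
To make the curvature family concrete, here is a toy sketch of a Fast-DetectGPT-style score in its analytic form, where the same model is assumed to serve as both sampling and scoring model. The logits are made-up numbers over a 4-token vocabulary, and `curvature_score` is a hypothetical helper, not part of this repo or of the official implementation.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def curvature_score(logits_per_position: Sequence[Sequence[float]],
                    token_ids: Sequence[int]) -> float:
    """(sum log p(x_t) - sum E[log p]) / sqrt(sum Var[log p]) over all positions."""
    total_logp = total_mean = total_var = 0.0
    for logits, tok in zip(logits_per_position, token_ids):
        logp = log_softmax(logits)
        probs = [math.exp(v) for v in logp]
        # Expected log-probability and its variance under the model's own distribution.
        mean = sum(p * lp for p, lp in zip(probs, logp))
        var = sum(p * lp * lp for p, lp in zip(probs, logp)) - mean * mean
        total_logp += logp[tok]
        total_mean += mean
        total_var += var
    return (total_logp - total_mean) / math.sqrt(total_var)


# Toy next-token logits for three positions (illustrative values only).
toy_logits = [[2.0, 1.0, 0.0, -1.0], [1.5, 0.5, 0.0, -2.0], [3.0, 0.0, -1.0, -1.0]]
likely_tokens = [0, 0, 0]    # always the most probable token
unlikely_tokens = [3, 3, 2]  # low-probability choices

print("likely  :", round(curvature_score(toy_logits, likely_tokens), 3))
print("unlikely:", round(curvature_score(toy_logits, unlikely_tokens), 3))
```

A sequence made of the model's most probable tokens scores above zero, while one made of improbable tokens scores below zero, which is the signal a threshold on this score separates.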

🛡️ Ethics

This project is defensive security research.

✅ Detector audit: understanding their weaknesses in order to improve them
✅ All outputs are labeled as research artifacts
❌ No public evasion API
❌ No "undetectable AI" product
⚠️ Academic and educational use only

📊 Budget

Phase	GPU	Cost
CoPA prototype	T4	~0.30€
HC3 dataset	CPU	0€
Style-SFT	A100 80GB	~20€
DPO adversarial	A100 80GB	~21€
Combined inference	T4	~1.20€
Evaluation	A100 80GB	~7.50€
Ablations	A100 80GB	~50€
Total		~100€
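
The total can be checked with a few lines of arithmetic. This is only a sanity-check sketch: the dictionary keys are labels for the rows above, and the 200€ figure is the Modal credit budget mentioned at the end of this README.

```python
# Approximate per-phase costs in euros, copied from the budget table.
costs_eur = {
    "CoPA prototype": 0.30,
    "HC3 dataset": 0.00,
    "Style-SFT": 20.00,
    "DPO adversarial": 21.00,
    "Combined inference": 1.20,
    "Evaluation": 7.50,
    "Ablations": 50.00,
}

total = round(sum(costs_eur.values()), 2)
credits_eur = 200.0  # Modal credit budget stated in this README
remaining = round(credits_eur - total, 2)

print(f"Total planned spend: ~{total:.2f}€")
print(f"Remaining credits:   ~{remaining:.2f}€")
```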

🔗 Links

Code source: simonlesaumon/evasion-detection (GitHub)
Artifacts: simonlesaumon/evasion-detection-artifacts
Models: simonlesaumon/evasion-detection-models

Built with Modal, PyTorch, HuggingFace Transformers. Budget: 200€ Modal credits.

Paper for simonlesaumon/evasion-detection-artifacts

MASH: Evading Black-Box AI-Generated Text Detectors via Style Humanization

Paper • 2601.08564 • Published Apr 19