Evasion Detection: AI Text Detection Evasion Research
Security audit of AI text detectors.
A rewriting system that makes AI-generated text undetectable, based on the MASH, HIP, and CoPA papers.
Objective
Understand and exploit the mathematical weaknesses of AI text detectors (Fast-DetectGPT, Binoculars, GPTZero, Pangram) in order to build a rewriting system capable of bypassing them.
Key intuition: Detectors exploit the low token dispersion of AI text. Human text shows more variance in word choice, sentence length, and frequency distribution. Increasing this dispersion makes the text invisible to them.
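One of the dispersion signals mentioned above, sentence-length variance, can be measured with a few lines of Python. This is a minimal sketch, not the code in `src/eval_statistical.py`: the function name `sentence_length_cv` and the naive punctuation-based sentence splitter are assumptions made for illustration.

```python
import re
import statistics


def sentence_length_cv(text: str) -> float:
    """Coefficient of variation (std / mean) of sentence lengths, in words."""
    # Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0  # not enough sentences to measure any spread
    mean = statistics.mean(lengths)
    return statistics.pstdev(lengths) / mean if mean else 0.0


uniform = "The cat sat down. The dog ran off. The bird flew away."
varied = "Stop. The dog ran off across the wet field toward the old barn. Why?"

print(f"uniform CV = {sentence_length_cv(uniform):.3f}")
print(f"varied  CV = {sentence_length_cv(varied):.3f}")
```

Text whose sentences all have the same length scores 0, while text that mixes very short and long sentences scores higher, which is the direction of change the intuition above describes.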
Architecture
┌──────────────────────────────────────────────────────────────┐
│                       EVASION PIPELINE                       │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  TRAINING (Modal A100 80GB, ~40€)                            │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ Phase 1: Style-SFT (~20€)                              │  │
│  │ BART-large (406M) + Style Embeddings                   │  │
│  │ Dataset: HC3 → 5K AI→Human pairs (finance, medicine,   │  │
│  │ open_qa, wiki_csai)                                    │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ Phase 2: DPO Adversarial (~21€)                        │  │
│  │ Reward = -detector_score                               │  │
│  │ β=0.1, hard negative mining                            │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  INFERENCE (Modal T4, ~0.60€/h)                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ Stage 1: BART-SFT-DPO → Rewrite AI→Human               │  │
│  │ Stage 2: CoPA λ=1.5 → Token dispersion boost           │  │
│  │   P_final = (1+λ)·log P_human - λ·log P_machine        │  │
│  │   + top-p=0.92 + rep_penalty=1.15 + diversity bonus    │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  EVALUATION                                                  │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ Fast-DetectGPT | Binoculars | GPTZero | Pangram        │  │
│  │ Metrics: ASR, BERTScore, PPL, Token Dispersion         │  │
│  └────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘
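The Stage 2 formula in the diagram can be illustrated on a toy three-token vocabulary. The sketch below is not taken from `src/inference_copa.py`: it assumes the combined value is an unnormalized log-score that is passed through a softmax, and the function name `contrastive_distribution` and the two example distributions are made up for illustration. Top-p sampling, the repetition penalty, and the diversity bonus are left out.

```python
import numpy as np


def contrastive_distribution(p_human, p_machine, lam=1.5, eps=1e-12):
    """Combine two next-token distributions as (1+lam)*log P_h - lam*log P_m."""
    log_h = np.log(np.asarray(p_human, dtype=float) + eps)
    log_m = np.log(np.asarray(p_machine, dtype=float) + eps)
    scores = (1.0 + lam) * log_h - lam * log_m
    scores -= scores.max()          # shift for numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()      # softmax over the vocabulary


# Hypothetical next-token distributions over the same 3-token vocabulary.
p_human = [0.50, 0.30, 0.20]
p_machine = [0.80, 0.15, 0.05]

p_final = contrastive_distribution(p_human, p_machine, lam=1.5)
print(np.round(p_final, 3))
```

Tokens that the machine-style distribution favors more strongly than the human-style one are pushed down, and tokens the human-style distribution favors relatively more are pushed up. With `lam=0` the function returns the human-style distribution unchanged.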
Repo structure
evasion-detection-artifacts/
├── README.md                                ← This file
├── datasets/
│   ├── style_transfer_pairs_train.jsonl     ← 4500 AI→Human pairs (HC3)
│   ├── style_transfer_pairs_val.jsonl       ← 500 validation pairs
│   └── bitcoin_text.txt                     ← Bitcoin test text
├── results/
│   ├── copa_qwen2.5_1.5b_gpu_*.json         ← CoPA GPU results (Qwen2.5)
│   └── copa_real_distilgpt2.json            ← CoPA CPU results (baseline)
├── reports/
│   ├── eval_statistical_qwen25_1.5b_*.json  ← Statistical evaluation
│   └── EVASION_DETECTION_REPORT.md          ← Full report
└── src/
    ├── inference_copa.py                    ← CoPA contrastive decoding
    ├── inference_combined.py                ← BART + CoPA two-stage
    ├── modal_app_copa.py                    ← Modal GPU wrapper CoPA
    ├── modal_app_sft.py                     ← Modal GPU Style-SFT training
    ├── modal_app_dpo.py                     ← Modal GPU DPO training
    ├── evaluate_detectors.py                ← Multi-detector evaluation
    ├── eval_statistical.py                  ← Statistical dispersion analysis
    ├── train_sft_modal.py                   ← Style-SFT (local/Modal)
    ├── hf_upload.py                         ← HF artifact upload adapter
    └── cost_guard.py                        ← Modal budget guard
Results
CoPA prototype (Qwen2.5-1.5B, T4, 0.05€)
| Metric | Original (AI) | Rewritten (CoPA) | Δ |
|---|---|---|---|
| Word freq dispersion | 0.36 | 1.68 | +366% |
| Sentence length CV | 0.154 | 0.327 | +113% |
| Readability (Flesch) | 25 | 32 | +28% |
| Human-likeness | 0.500 | 0.548 | +0.048 |
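The Δ column mixes relative changes (first three rows) and an absolute difference (last row). The sketch below recomputes both from the rounded values shown in the table. The helper name `relative_change_pct` is hypothetical, and since the inputs are rounded, a recomputed figure can differ from the reported one by about a percentage point.

```python
# Values copied from the table above: (original, rewritten).
relative_rows = {
    "Word freq dispersion": (0.36, 1.68),
    "Sentence length CV": (0.154, 0.327),
    "Readability (Flesch)": (25, 32),
}
human_likeness = (0.500, 0.548)


def relative_change_pct(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


for name, (before, after) in relative_rows.items():
    print(f"{name}: {relative_change_pct(before, after):+.1f}%")

# Human-likeness is reported as an absolute difference, not a percentage.
print(f"Human-likeness: {human_likeness[1] - human_likeness[0]:+.3f}")
```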
Bitcoin example (CoPA v2, few-shot, λ=1.5)
Original (AI-style):
Bitcoin, often called BTC, is the first and most well-known cryptocurrency in the world. It was created in 2009 by an unknown person or group using the name Satoshi Nakamoto...
Rewritten (CoPA Human-style):
Hey there! Bitcoin, or BTC, is the big bang of all the cool cyber currencies. It was born in 2009. An unknown dude or gals go by the name Satoshi Nakamoto. Unlike the money that grown-ups have, Bitcoin isn't controlled by anyone, or a central bank. Coincidence? Not a bit!...
Usage
Quick start: Rewrite a text (Modal GPU)
# Single text
modal run -q src/modal_app_copa.py --text "Your AI-generated text here" --gpu T4
# From file
modal run -q src/modal_app_copa.py --text-file data/bitcoin_text.txt --gpu T4
# Batch (10 samples, synthetic templates)
modal run -q src/modal_app_copa.py --num-samples 10 --gpu T4
Style-SFT Training (Modal A100 80GB, ~20€)
# Dry-run first (validates pipeline, ~0€)
modal run src/modal_app_sft.py --dry-run
# Real training (6-8h, ~20€)
modal run src/modal_app_sft.py --data datasets/style_transfer_pairs_train.jsonl
DPO Adversarial Training (Modal A100 80GB, ~21€)
modal run src/modal_app_dpo.py --sft-model simonlesaumon/evasion-detection-models/bart-sft-style-humanization
Statistical Evaluation (local, no GPU)
# Analyze any CoPA output
python src/eval_statistical.py output/copa_modal_results.json output/eval_report.json
Run Tests (13 unit tests, 0 GPU)
pytest tests/test_inference.py -v
Theoretical basis
Foundational papers
| Paper | Venue | Key contribution |
|---|---|---|
| MASH (2025) | arXiv:2601.08564 | BART-base 139M + Style-SFT + DPO = 92% ASR |
| HIP (2026) | CMU | Base models = 96.7% "human" (GPTZero) |
| CoPA (2025) | EMNLP | Training-free contrastive decoding |
| Fast-DetectGPT (2024) | ICLR | Probability curvature, 340x faster |
| Binoculars (2024) | ICML | Cross-perplexity, >90% TPR @ 0.01% FPR |
| Pangram (2025) | COLING | Mistral NeMo 12B + active learning |
How the detectors work
| Family | Principle | Example |
|---|---|---|
| Statistical | Perplexity + burstiness | GPTZero |
| Curvature | Score = LogP - E[LogP] | Fast-DetectGPT |
| Cross-PPL | Perplexity ratio of 2 models | Binoculars |
| Watermark | Signature in the tokens | SynthID-Text |
What they have in common: AI text has tokens clustered in high-probability regions → low dispersion. Our approach maximizes this dispersion.
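The curvature row (`Score = LogP - E[LogP]`) can be made concrete with a toy example. The sketch below follows the analytic form of conditional probability curvature, with two assumptions: the same model is used for sampling and scoring, and the difference is divided by the standard deviation of the log-probability, a normalization that the table does not show. The function name `curvature_score` and the toy distributions are illustrative, and no real language model is involved.

```python
import numpy as np


def curvature_score(probs, token_ids, eps=1e-12):
    """(log p(x) - E[log p]) / std[log p], summed over token positions.

    probs: array of shape (T, V), next-token distribution at each position.
    token_ids: the T tokens that actually appear in the text.
    Assumes at least one position has a non-degenerate distribution.
    """
    probs = np.asarray(probs, dtype=float)
    log_p = np.log(probs + eps)
    observed = log_p[np.arange(len(token_ids)), token_ids].sum()
    mean = (probs * log_p).sum(axis=1)                 # E[log p] per position
    var = (probs * log_p ** 2).sum(axis=1) - mean ** 2  # Var[log p] per position
    return (observed - mean.sum()) / np.sqrt(var.sum())


# Toy model: the same peaked distribution at each of 4 positions.
probs = [[0.7, 0.2, 0.1]] * 4

likely = curvature_score(probs, [0, 0, 0, 0])      # always the top token
surprising = curvature_score(probs, [0, 2, 1, 2])  # mostly lower-probability tokens

print(f"likely sequence:     {likely:+.2f}")
print(f"surprising sequence: {surprising:+.2f}")
```

A sequence made only of top-probability tokens scores above zero, which a curvature detector reads as machine-like, while a sequence with more low-probability tokens scores below zero. This is the "low dispersion" signature described above, seen from the detector's side.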
Ethics
This project is defensive security research.
- Detector audit: understand their weaknesses in order to improve them
- All outputs are labeled as research artifacts
- No public evasion API
- No "undetectable AI" product
- Academic and educational use only
Budget
| Phase | GPU | Cost |
|---|---|---|
| CoPA prototype | T4 | ~0.30€ |
| HC3 dataset | CPU | 0€ |
| Style-SFT | A100 80GB | ~20€ |
| DPO adversarial | A100 80GB | ~21€ |
| Combined inference | T4 | ~1.20€ |
| Evaluation | A100 80GB | ~7.50€ |
| Ablations | A100 80GB | ~50€ |
| Total | | ~100€ |
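The total can be checked by summing the per-phase estimates and comparing them with the 200€ of Modal credits mentioned at the end of this README. This is only an arithmetic sketch. It does not reproduce `src/cost_guard.py`, and the helper `can_afford` is a hypothetical name.

```python
BUDGET_EUR = 200.0  # Modal credits stated in this README

# Per-phase estimates copied from the table above (approximate values).
phase_costs_eur = {
    "CoPA prototype": 0.30,
    "HC3 dataset": 0.00,
    "Style-SFT": 20.00,
    "DPO adversarial": 21.00,
    "Combined inference": 1.20,
    "Evaluation": 7.50,
    "Ablations": 50.00,
}

total_eur = sum(phase_costs_eur.values())
remaining_eur = BUDGET_EUR - total_eur


def can_afford(extra_eur: float, spent_eur: float = total_eur) -> bool:
    """Return True if an extra run still fits within the overall budget."""
    return spent_eur + extra_eur <= BUDGET_EUR


print(f"planned total: {total_eur:.2f} EUR, remaining: {remaining_eur:.2f} EUR")
```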
Links
- Source code: simonlesaumon/evasion-detection (GitHub)
- Artifacts: simonlesaumon/evasion-detection-artifacts
- Models: simonlesaumon/evasion-detection-models
Built with Modal, PyTorch, HuggingFace Transformers. Budget: 200€ Modal credits.