
We trained DistilBERT on this dataset: https://huggingface.co/datasets/nothingiisreal/Human_Stories

It works reasonably well for sampling, but it still needs improvement and exposure to more synthetic data and to more of the kinds of mistakes LLMs make.

Overall, I'm extremely impressed with how well this roughly 67-million-parameter model works, and extremely disappointed that text from every AI we tried gets flagged, even though we trained DistilBERT only on the GPT-3.5 rows of the data.
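As a sanity check on the model size, the parameter count can be reproduced by hand. This sketch assumes the standard `distilbert-base-uncased` configuration (30,522-token vocabulary, 512 learned positions, 6 layers, hidden size 768, feed-forward size 3,072) plus a two-label classification head; the card does not say which DistilBERT checkpoint was fine-tuned, so treat these numbers as an assumption.

```python
# Assumed distilbert-base-uncased configuration (not stated on this card).
VOCAB_SIZE = 30522
MAX_POSITIONS = 512
DIM = 768
HIDDEN_DIM = 3072
N_LAYERS = 6
NUM_LABELS = 2  # 0 = human, 1 = AI


def linear(n_in, n_out):
    """Parameters of a dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


def layer_norm(d):
    """LayerNorm has one scale vector and one shift vector."""
    return 2 * d


def distilbert_classifier_params():
    # Word + learned position embeddings (DistilBERT has no token-type embeddings).
    embeddings = VOCAB_SIZE * DIM + MAX_POSITIONS * DIM + layer_norm(DIM)
    attention = 4 * linear(DIM, DIM)  # q, k, v and output projections
    ffn = linear(DIM, HIDDEN_DIM) + linear(HIDDEN_DIM, DIM)
    block = attention + layer_norm(DIM) + ffn + layer_norm(DIM)
    encoder = embeddings + N_LAYERS * block
    # Sequence-classification head: pre-classifier dense layer, then the output layer.
    head = linear(DIM, DIM) + linear(DIM, NUM_LABELS)
    return encoder, encoder + head


encoder, total = distilbert_classifier_params()
print(f"encoder: {encoder:,}  with head: {total:,}")
```

It prints `encoder: 66,362,880  with head: 66,955,010`, i.e. about 67M parameters, which matches the model size reported for this model.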

Class label 0 means human, 1 means AI.
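A minimal sketch of reading the classifier's output under this mapping, assuming the model returns two raw logits ordered by class index (as a standard two-label sequence-classification head does). `interpret_logits` is a hypothetical helper for illustration, not part of this repository.

```python
import math

# Label mapping stated on this card: 0 = human, 1 = AI.
ID2LABEL = {0: "human", 1: "AI"}


def interpret_logits(logits):
    """Turn the two raw classifier logits into (predicted label, probability of AI).

    Hypothetical helper: assumes logits[0] scores class 0 (human) and
    logits[1] scores class 1 (AI).
    """
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class; ties resolve to the lower index (human).
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[predicted], probs[1]


label, p_ai = interpret_logits([-1.2, 2.3])
print(label, round(p_ai, 3))
```

Converting to a probability rather than comparing raw logits makes it possible to flag text as AI only above a stricter threshold than 0.5, if false positives on human writing are a concern.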

We tested text from the following models, and the detector worked on all of them:

GPT-3.5, GPT-4, GPT-4o

Claude Sonnet, Claude Opus

WizardLM-2

Gemini 1.5 Pro

It's blatant that every AI company is using the same watermark, whether knowingly or unknowingly (through LLM "incest", i.e. models being trained on other models' outputs).

Model size: 67M params (Safetensors, F32)