GIST-small-markov-slop-detector

This BERT-based classifier is trained to distinguish coherent human-written text from text generated by a Markov chain.

As expected, the classifier achieves near-perfect performance (98.2% accuracy on the evaluation set). A likely reason is that BERT’s self-attention captures long-range context, whereas a Markov chain conditions each token only on its current state, so its output loses coherence over longer spans.
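To make the "previous state only" property concrete, here is a minimal sketch of a first-order, word-level Markov chain text generator. This card does not describe the generator used to build the dataset, so the chain order, the word-level tokenization, the toy corpus, and the names `build_chain` and `generate` are all assumptions for illustration.

```python
import random
from collections import defaultdict

def build_chain(words):
    """Map each word to the list of words observed right after it."""
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length, seed=0):
    """Walk the chain: each next word depends only on the current word."""
    rng = random.Random(seed)
    word, output = start, [start]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: no successor was ever observed
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)

# Toy corpus (hypothetical, not taken from the training data).
corpus = "the cat sat on the mat and the dog sat on the rug".split()
chain = build_chain(corpus)
print(generate(chain, start="the", length=12))
```

Every adjacent word pair in the output appears somewhere in the corpus, yet nothing ties the sentence together beyond one word of memory. That local-only structure is the kind of signal a contextual classifier can pick up.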

Dataset

Class distribution across the train and test splits:

Label     Train    Test    Total
markov     7998    2000     9998
real       8000    2000    10000
Total     15998    4000    19998

Model Specification

  • Model type: bert
  • Problem Type: single_label_classification
  • Number of Labels: 2
  • Vocabulary Size: 30522
  • License: MIT

Use

To get started with this model in Python using the Hugging Face Transformers library, run the following code:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub.
model_id = "agentlans/GIST-small-markov-slop-detector"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "Replace this with your input text."
# Truncate inputs that exceed the tokenizer's maximum length.
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Inference only, so gradient tracking is disabled.
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, number of labels)

# Pick the highest-scoring class along the label dimension.
predicted_class_id = logits.argmax(dim=-1).item()
predicted_class_name = model.config.id2label[predicted_class_id]

print(f"Predicted Class ID: {predicted_class_id}")
print(f"Predicted Class Name: {predicted_class_name}")
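The snippet above reports only the winning class. If a confidence score is also useful, the logits can be passed through a softmax. The following is a minimal pure-Python sketch; the helper name `logits_to_probs` and the example logits are hypothetical, and the label mapping is the one listed in this card (0 = markov, 1 = real).

```python
import math

# Label mapping as listed in this model card.
ID2LABEL = {0: "markov", 1: "real"}

def logits_to_probs(logits_row, id2label=ID2LABEL):
    """Apply a numerically stable softmax to one row of logits."""
    peak = max(logits_row)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(x - peak) for x in logits_row]
    total = sum(exps)
    return {id2label[i]: e / total for i, e in enumerate(exps)}

# Hypothetical logits for illustration only, not real model output.
example_logits = [-2.0, 3.0]
probs = logits_to_probs(example_logits)
print(probs)
```

With the variables from the snippet above, the call would be `logits_to_probs(logits[0].tolist(), model.config.id2label)`.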

Intended Uses & Limitations

Intended Use

This model is designed for binary sequence classification: deciding whether a text was generated by a Markov chain or written by a human. The class labels map to IDs as follows:

Label ID   Label Name
0          markov
1          real

Training Details

Hyperparameters

The following hyperparameters were used during fine-tuning:

  • Learning Rate: 5e-05
  • Train Batch Size: 8
  • Eval Batch Size: 8
  • Optimizer: AdamW, fused PyTorch implementation (adamw_torch_fused)
  • Number of Epochs: 3.0
  • Mixed Precision: BF16

Optimization & Regularization

  • Gradient Accumulation Steps: 1
  • Learning Rate Scheduler: linear
  • Warmup Steps: 0
  • Warmup Ratio: None
  • Weight Decay: 0.0
  • Max Gradient Norm: 1.0

Hardware & Reproducibility

  • Number of GPUs: 1
  • Seed: 42
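For anyone trying to reproduce the run, the settings listed above can be collected into keyword arguments whose names follow `transformers.TrainingArguments`. This is a sketch rather than the author's actual training script: the `output_dir` value is hypothetical, and treating the batch sizes as per-device values is an assumption (it matches the card for a single GPU).

```python
# Keyword names follow transformers.TrainingArguments; values come from this card.
training_kwargs = {
    "output_dir": "markov-slop-detector-run",  # hypothetical path, not from the card
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,  # assumption: card value is per device
    "per_device_eval_batch_size": 8,   # assumption: card value is per device
    "optim": "adamw_torch_fused",
    "num_train_epochs": 3.0,
    "bf16": True,
    "gradient_accumulation_steps": 1,
    "lr_scheduler_type": "linear",
    "warmup_steps": 0,
    "weight_decay": 0.0,
    "max_grad_norm": 1.0,
    "seed": 42,
}

# Effective batch size per optimizer step on a single GPU.
effective_batch = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
print(effective_batch)
```

Passing these on as `TrainingArguments(**training_kwargs)` requires the Transformers library and hardware with BF16 support.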

Training Results & Evaluation

Fine-tuning produced the following summary metrics. The validation loss reported here is the lowest value in the training log (epoch 1.0); it was 0.0898 at the end of epoch 3.0.

Metric                Value
Train Loss            0.0593
Validation Loss       0.0693
Validation F1 Score   N/A
Total FLOPs           7.9037e+14

Speed Performance

  • Training Runtime: 106.2373 seconds
  • Train Samples per Second: 451.762
  • Evaluation Runtime: 3.2116 seconds
  • Eval Samples per Second: 1245.467
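The throughput figures above can be sanity-checked against the dataset sizes in this card. The sketch below assumes that training throughput counts every example once per epoch and that evaluation ran on the 4000-example test split; both are inferences from the numbers, not statements from the training script.

```python
# Dataset sizes and runtimes as reported in this card.
train_examples, epochs = 15998, 3.0
train_runtime_s = 106.2373
eval_examples, eval_runtime_s = 4000, 3.2116

# Assumption: training throughput = examples seen over all epochs / runtime.
train_rate = train_examples * epochs / train_runtime_s
# Assumption: evaluation ran once over the full test split.
eval_rate = eval_examples / eval_runtime_s

print(f"train: {train_rate:.1f} samples/s")
print(f"eval:  {eval_rate:.1f} samples/s")
```

Both values land within rounding distance of the reported 451.762 and 1245.467 samples per second.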

Training Logs History

Step   Epoch   Learning Rate   Training Loss   Validation Loss   Validation F1
500    0.25    4.5842e-05      0.1758          N/A               N/A
1000   0.5     4.1675e-05      0.1194          N/A               N/A
1500   0.75    3.7508e-05      0.1157          N/A               N/A
2000   1.0     3.3342e-05      0.0829          0.0693            N/A
2500   1.25    2.9175e-05      0.0405          N/A               N/A
3000   1.5     2.5008e-05      0.0334          N/A               N/A
3500   1.75    2.0842e-05      0.0464          N/A               N/A
4000   2.0     1.6675e-05      0.0412          0.0949            N/A
4500   2.25    1.2508e-05      0.0113          N/A               N/A
5000   2.5     8.3417e-06      0.0099          N/A               N/A
5500   2.75    4.1750e-06      0.0188          N/A               N/A
6000   3.0     8.3333e-09      0.0159          0.0898            N/A
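The learning-rate column is consistent with a linear decay from 5e-05 to zero over 6000 steps with no warmup. The sketch below reproduces the logged values under one assumption inferred from the numbers: each log entry appears to record the rate of the scheduler one step earlier than the step shown. The function names `linear_lr` and `logged_lr` are hypothetical.

```python
import math

BASE_LR = 5e-5
TRAIN_EXAMPLES, BATCH_SIZE, EPOCHS = 15998, 8, 3

# 15998 examples in batches of 8 gives 2000 steps per epoch (last batch is partial).
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

def linear_lr(step):
    """Linear decay from BASE_LR to 0 over total_steps, with no warmup."""
    return BASE_LR * max(0.0, (total_steps - step) / total_steps)

def logged_lr(step):
    # Assumption inferred from the log: the recorded rate lags by one step.
    return linear_lr(step - 1)

for step in (500, 2000, 6000):
    print(step, f"{logged_lr(step):.4e}")
```

The one-step lag would also explain why the final entry is 8.3333e-09 rather than exactly zero.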

Framework Versions

  • Transformers: 5.0.0.dev0
  • PyTorch: 2.9.1+cu128

Model Size

  • Parameters: 33.4M
  • Tensor Type: F32
  • Format: Safetensors