I published a small, source-backed dataset for reviewing AI-assisted code and AI-written English without turning the review into an accusation game.
Dataset: yava-code/ai-authorship-signals-2026
The dataset has 10 review signals across two domains:
code: comment-to-code ratio, dependency hallucination, security misses, edge cases
writing: overused AI vocabulary, low section variation, detector bias against non-native English
Each row includes:
signal
why it matters
risk level
review action
source ids
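To make the row shape concrete, here is a minimal sketch of how one might model a row and order rows for review. The field names, the `domain` field, the `high`/`medium`/`low` labels, and the example values are all assumptions for illustration; check the dataset itself for the real column names and risk labels.

```python
from dataclasses import dataclass, field

# Assumed risk labels and ordering; the dataset's actual values may differ.
RISK_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class ReviewSignal:
    """One row, using hypothetical field names based on the list above."""
    signal: str
    domain: str  # "code" or "writing" (assumed field)
    why_it_matters: str
    risk_level: str
    review_action: str
    source_ids: list[str] = field(default_factory=list)


def triage(rows):
    """Sort rows so higher-risk signals come first; unknown labels go last."""
    return sorted(rows, key=lambda r: RISK_ORDER.get(r.risk_level, len(RISK_ORDER)))


# Placeholder rows: signal names come from the post, everything else is made up.
rows = [
    ReviewSignal("comment-to-code ratio", "code", "placeholder", "low",
                 "placeholder action", ["example-1"]),
    ReviewSignal("dependency hallucination", "code", "placeholder", "high",
                 "placeholder action", ["example-2"]),
]

for row in triage(rows):
    print(f"[{row.risk_level}] {row.signal}: {row.review_action}")
```

Sorting by risk rather than by "how AI-like it looks" keeps the focus on what needs review first.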
The main idea: do not ask "was this made by AI?" first. Ask what needs review, what evidence exists, and what failure mode would hurt production.
I also grouped the related work here:
yava-code/applied-small-ai-portfolio-6a304c83f9f1d089a28c101b