The underlying data set is flawed
#1, opened by OliP
Just wanted to point out this analysis: https://www.kaggle.com/code/josutk/only-one-word-99-2
The data is biased in several directions. In particular, a classifier based solely on whether "reuters" appears in the text gives 99% accuracy.
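For anyone who wants to check this on their own copy of the data, here is a minimal sketch of such a one-word baseline. It is not the code from the linked notebook. The function names, the label convention (1 for the class where the keyword tends to appear, 0 otherwise), and the four toy rows are all assumptions made up for illustration; swap in the real texts and labels to reproduce the actual number.

```python
from typing import Sequence


def contains_keyword(text: str, keyword: str = "reuters") -> bool:
    """Return True if the keyword occurs anywhere in the text (case-insensitive)."""
    return keyword.lower() in text.lower()


def keyword_baseline_accuracy(
    texts: Sequence[str],
    labels: Sequence[int],
    keyword: str = "reuters",
) -> float:
    """Accuracy of a 'classifier' that predicts 1 iff the keyword is present.

    Assumption: label 1 marks the class in which the keyword is expected
    to appear, label 0 the other class.
    """
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    if not texts:
        raise ValueError("need at least one example")
    correct = sum(
        int(contains_keyword(text, keyword)) == int(label)
        for text, label in zip(texts, labels)
    )
    return correct / len(texts)


# Toy rows invented for this example; they are NOT taken from the data set.
toy_texts = [
    "WASHINGTON (Reuters) - Lawmakers voted on the bill on Tuesday.",
    "You won't believe what this politician just did!",
    "LONDON (Reuters) - Markets closed higher on Friday.",
    "Shocking video shows what they do not want you to see",
]
toy_labels = [1, 0, 1, 0]

print(keyword_baseline_accuracy(toy_texts, toy_labels))
```

If a single substring check scores close to a trained model on the real data, the model is probably learning the source tag rather than anything about the content, so it may be worth stripping that token before training.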