The dataset used in this example is [the 20 newsgroups text dataset](https://scikit-learn.org/stable/datasets/real_world.html#newsgroups-dataset). It is downloaded automatically on first use, then cached and reused by the document classification example.
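Below is a minimal sketch of that download-then-cache behavior. It assumes scikit-learn is installed and that network access is available the first time it runs. It uses `sklearn.datasets.fetch_20newsgroups`, but the classification example itself may load the data with different arguments. The variable names `train`, `test` and `cached` are illustrative only. On the first call the archive is fetched and stored under the scikit-learn data home, which is `get_data_home()`, by default `~/scikit_learn_data`. Later calls read from that cache instead of downloading again.

```python
from sklearn.datasets import fetch_20newsgroups, get_data_home

# First call: downloads the archive if it is not already cached,
# then stores it under the scikit-learn data home for reuse.
train = fetch_20newsgroups(subset="train", shuffle=True, random_state=42)
test = fetch_20newsgroups(subset="test", shuffle=True, random_state=42)

# Later call: download_if_missing=False raises an error if the cache is
# absent, so succeeding here shows the data is being read from the cache.
cached = fetch_20newsgroups(subset="train", download_if_missing=False)

print("Cache directory:", get_data_home())
print("Categories:", len(train.target_names))
print("Train documents:", len(train.data))
print("Test documents:", len(test.data))
```

Passing `download_if_missing=False` is a simple way to confirm that later runs work offline from the cached copy.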