Let’s test what you learned in this chapter!
1. When should you train a new tokenizer?
2. What is the advantage of using a generator of lists of texts compared to a list of lists of texts when using
3. What are the advantages of using a “fast” tokenizer?
4. How does the `token-classification` pipeline handle entities that span over several tokens?
5. How does the `question-answering` pipeline handle long contexts?
6. What is normalization?
7. What is pre-tokenization for a subword tokenizer?
8. Select the sentences that apply to the BPE model of tokenization.
9. Select the sentences that apply to the WordPiece model of tokenization.
10. Select the sentences that apply to the Unigram model of tokenization.
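Question 2 contrasts a generator of lists of texts with a list of lists of texts. As a study aid, here is a minimal pure-Python sketch of such a generator. The name `batch_texts` and the batch size are hypothetical choices and are not part of any library. Because it is a generator, each batch is built only when the consumer asks for it. At most one batch of texts therefore has to sit in memory at a time, provided the `texts` source is itself lazy (for example, a file read line by line).

```python
from typing import Iterable, Iterator, List


def batch_texts(texts: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Hypothetical helper: lazily yield lists of up to batch_size texts.

    Only the batch currently being built is held in memory.
    """
    batch: List[str] = []
    for text in texts:
        batch.append(text)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


# Usage: the corpus itself is a generator expression, so nothing is materialized up front
corpus = (line.strip() for line in ["first text", "second text", "third text"])
for batch in batch_texts(corpus, batch_size=2):
    print(batch)
```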
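For Question 5, the sketch below shows the general idea of splitting an over-long input into overlapping windows. It is a simplified illustration that rests on several assumptions. It works on a plain list of token IDs, `sliding_windows` is a hypothetical helper, and `stride` here means the number of tokens shared by consecutive windows. A real question-answering setup also keeps the question tokens in every window, and this sketch leaves that step out.

```python
from typing import List


def sliding_windows(token_ids: List[int], max_length: int, stride: int) -> List[List[int]]:
    """Hypothetical helper: split token_ids into windows of at most max_length
    tokens, where consecutive windows overlap by `stride` tokens."""
    if not 0 <= stride < max_length:
        raise ValueError("stride must be non-negative and smaller than max_length")
    step = max_length - stride  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break  # this window already reaches the end of the sequence
        start += step
    return windows


print(sliding_windows(list(range(10)), max_length=4, stride=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

The overlap ensures that an answer near a window boundary still appears whole in at least one window, provided the answer is no longer than the overlap.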
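Question 6 asks about normalization. One common kind of normalization lowercases the text and strips accents, which is similar in spirit to what BERT-style uncased tokenizers do. It can be sketched with Python's standard `unicodedata` module. The function name `normalize_text` is hypothetical, and real normalizers may apply more or different rules than this.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Hypothetical normalizer: lowercase, then strip accents.

    NFD decomposition splits an accented letter into a base letter plus a
    combining mark (Unicode category "Mn"), which is then dropped.
    """
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(normalize_text("Héllò hôw are ü?"))  # hello how are u?
```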
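Questions 8 to 10 compare three subword models. For BPE, here is a minimal training sketch on a hypothetical toy corpus of word frequencies. Every word starts as a sequence of characters, and each step merges the adjacent pair with the highest frequency into a new token. This is a teaching sketch, not the Hugging Face Tokenizers implementation. Byte-level handling, special tokens, and efficient incremental pair counting are all omitted.

```python
from collections import Counter


def count_pairs(splits, word_freqs):
    """Count adjacent symbol pairs, weighting each by its word's frequency."""
    pair_freqs = Counter()
    for word, freq in word_freqs.items():
        symbols = splits[word]
        for a, b in zip(symbols, symbols[1:]):
            pair_freqs[(a, b)] += freq
    return pair_freqs


def merge_pair(a, b, splits):
    """Replace every adjacent occurrence of (a, b) with the merged symbol a + b."""
    new_splits = {}
    for word, symbols in splits.items():
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and symbols[i] == a and symbols[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        new_splits[word] = merged
    return new_splits


def train_bpe(word_freqs, num_merges):
    """Learn up to num_merges merge rules, always merging the most frequent pair.

    Ties are broken by first occurrence (max() keeps the earliest maximum).
    """
    splits = {word: list(word) for word in word_freqs}
    merges = []
    for _ in range(num_merges):
        pair_freqs = count_pairs(splits, word_freqs)
        if not pair_freqs:
            break  # every word is already a single symbol
        best = max(pair_freqs, key=pair_freqs.get)
        merges.append(best)
        splits = merge_pair(*best, splits)
    return merges, splits


word_freqs = {"hug": 10, "pug": 5, "pun": 12, "bun": 4, "hugs": 5}
merges, splits = train_bpe(word_freqs, num_merges=3)
print(merges)          # [('u', 'g'), ('u', 'n'), ('h', 'ug')]
print(splits["hugs"])  # ['hug', 's']
```

To tokenize a new word, the learned merges are replayed in order on its characters.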
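For WordPiece, the sketch below covers two parts and rests on stated assumptions. The first is the pair score commonly described for WordPiece training, freq(pair) / (freq(first) × freq(second)). The second is greedy longest-match-first encoding, in which non-initial pieces carry a `##` prefix and the whole word becomes an unknown token when some part cannot be matched. The vocabulary and the `[UNK]` token name are hypothetical examples.

```python
def pair_score(pair_freq, first_freq, second_freq):
    """Merge score as commonly described for WordPiece training (assumed formula)."""
    return pair_freq / (first_freq * second_freq)


def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first encoding of one word.

    Pieces after the first are looked up with a '##' prefix. If no piece
    matches at some position, the whole word maps to unk_token.
    """
    tokens = []
    start = 0
    while start < len(word):
        piece = None
        end = len(word)
        while end > start:  # try the longest remaining substring first
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]
        tokens.append(piece)
        start = end
    return tokens


vocab = {"b", "h", "hu", "hug", "##g", "##s", "##u", "##un"}  # hypothetical vocabulary
print(wordpiece_tokenize("hugs", vocab))  # ['hug', '##s']
print(wordpiece_tokenize("bun", vocab))   # ['b', '##un']
print(wordpiece_tokenize("bum", vocab))   # ['[UNK]']
```

Dividing by the frequencies of the two parts favors merging pairs whose parts are individually rare, which is a key difference from BPE's raw pair counts.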
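For Unigram, encoding a word means choosing the segmentation whose token probabilities have the largest product. The sketch below finds that segmentation with a Viterbi-style dynamic program over log-probabilities. The probability table is a made-up example rather than a trained model, and the training loop that prunes the vocabulary is not shown.

```python
import math


def unigram_tokenize(word, probs):
    """Return the segmentation of `word` maximizing the product of token
    probabilities (computed as a sum of logs), or None if none exists."""
    n = len(word)
    # best[i] = (best log-probability of word[:i], start index of its last token)
    best = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(end):
            token = word[start:end]
            if token in probs and best[start][0] > -math.inf:
                score = best[start][0] + math.log(probs[token])
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        return None  # the vocabulary cannot cover this word
    # Walk back from the end to recover the chosen tokens
    tokens, end = [], n
    while end > 0:
        start = best[end][1]
        tokens.append(word[start:end])
        end = start
    return tokens[::-1]


probs = {"h": 0.1, "u": 0.1, "g": 0.1, "s": 0.1, "hu": 0.05, "ug": 0.2, "hug": 0.15}
print(unigram_tokenize("hugs", probs))  # ['hug', 's']
print(unigram_tokenize("ugh", probs))   # ['ug', 'h']
```

Working in log space turns the product into a sum, which avoids numerical underflow on long words.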