Let’s test what you learned in this chapter!

1. Which of the following tasks can be framed as a token classification problem?

2. What part of the preprocessing for token classification differs from the other preprocessing pipelines?

3. What problem arises when we tokenize the words in a token classification problem and want to label the tokens?

4. What does “domain adaptation” mean?

5. What are the labels in a masked language modeling problem?

6. Which of these tasks can be seen as a sequence-to-sequence problem?

7. What is the proper way to preprocess the data for a sequence-to-sequence problem?

8. Why is there a specific subclass of Trainer for sequence-to-sequence problems?

10. When should you pretrain a new model?

11. Why is it easy to pretrain a language model on large amounts of text?

12. What are the main challenges when preprocessing data for a question answering task?

13. How is post-processing usually done in question answering?