---
language: nl
tags:
- token-classification
- sequence-tagger-model
---

# Goal

This model adds emoji to an input text. To accomplish this, we framed the task as a token-classification problem: the emoji that should follow a given word or token is predicted as an entity label. The accompanying demo, which includes all the required pre- and postprocessing, can be found [here](https://huggingface.co/spaces/ml6team/emoji_predictor).

For the moment, this only works for Dutch texts.

# Dataset

For this model, we scraped about 1000 unique tweets for each emoji we support:

['😨', '😥', '😍', '😠', '🤯', '😄', '🍾', '🚗', '☕', '💰']

These tweets could look like this:

```
Wow 😍😍, what a cool car 🚗🚗!
Omg, I hate mondays 😠... I need a drink 🍾
```

After some processing, we can recast this in the more familiar NER format:

| Word | Label |
|------|-------|
| Wow  | B-😍 |
| ,    | O     |
| what | O     |
| a    | O     |
| cool | O     |
| car  | O     |
| !    | B-🚗 |

This data can then be used to train a token classification model. Unfortunately, the Terms of Service prohibit us from sharing the original dataset.

# Training

The model was trained for 4 epochs.
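
To make the labeling scheme from the Dataset section more concrete, the sketch below shows one way that (word, label) pairs could be turned back into a text with emoji. It is only an illustration and not the postprocessing used in the demo: the function name `insert_emoji` is hypothetical, tokens are joined with plain spaces, and the input is assumed to use only the `O` and `B-<emoji>` labels shown in the table.

```python
from typing import List, Tuple

# The emoji supported by the model, as listed in the Dataset section.
SUPPORTED_EMOJI = ['😨', '😥', '😍', '😠', '🤯', '😄', '🍾', '🚗', '☕', '💰']


def insert_emoji(tagged: List[Tuple[str, str]]) -> str:
    """Rebuild a text from (word, label) pairs.

    Hypothetical helper: appends the emoji right after every word
    whose label has the form "B-<emoji>"; words labeled "O" are kept as-is.
    """
    pieces = []
    for word, label in tagged:
        pieces.append(word)
        if label.startswith("B-"):
            emoji = label[2:]
            # Ignore labels that fall outside the supported set.
            if emoji in SUPPORTED_EMOJI:
                pieces.append(emoji)
    # Assumption: simple space-joining, no detokenization of punctuation.
    return " ".join(pieces)


# The rows of the example table from the Dataset section.
example = [
    ("Wow", "B-😍"), (",", "O"), ("what", "O"), ("a", "O"),
    ("cool", "O"), ("car", "O"), ("!", "B-🚗"),
]
result = insert_emoji(example)
print(result)
```

A real pipeline would also need to tokenize the input, run the model to obtain the labels, and reattach punctuation to the preceding word; those steps are left out here.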