ml6team/xlm-roberta-base-nl-emoji-ner


thomasdehaene committed on Apr 5, 2022

Commit 08b02a6 (1 parent: a2c0203): Create README.md

Files changed (1)

README.md +50 -0

README.md ADDED

	@@ -0,0 +1,50 @@

+---
+language: nl
+tags:
+- token-classification
+- sequence-tagger-model
+---
+# Goal
+This model can be used to add emoji to an input text.
+To accomplish this, we framed the task as token classification: for each word/token, the model predicts, as an entity, the emoji that should follow it.
+The accompanying demo, which includes all the necessary pre- and postprocessing, can be found [here](https://huggingface.co/spaces/ml6team/emoji_predictor).
+For the moment, this only works for Dutch texts.
+# Dataset
+For this model, we scraped about 1,000 unique tweets for each supported emoji:
+['😨', '😥', '😍', '😠', '🤯', '😄', '🍾', '🚗', '☕', '💰']
+Such tweets could look like this:
+```
+Wow 😍😍, what a cool car 🚗🚗!
+Omg, I hate mondays 😠... I need a drink 🍾
+```
+After some processing, we can recast this into a more familiar NER format:
+| Word | Label |
+|-------|-----|
+| Wow   | B-😍|
+| ,     | O   |
+| what  | O   |
+| a     | O   |
+| cool  | O   |
+| car   | B-🚗|
+| !     | O   |
+This format can then be used to train a token classification model.
+Unfortunately, Twitter's Terms of Service prohibit us from sharing the original dataset.
+# Training
+The model was trained for 4 epochs.
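The README does not spell out the "processing" that turns tweets into the word/label table. Below is a minimal Python sketch of one way to do it, assuming a simple regex tokenizer (words, single punctuation marks, single emoji) and assuming each emoji is attached as a `B-<emoji>` label to the token directly before it, as the Goal section describes. The `tweet_to_ner` function and its tokenization rules are hypothetical, not the authors' code.

```python
from __future__ import annotations

import re

# Emoji list copied from the Dataset section of the README.
SUPPORTED_EMOJI = ["😨", "😥", "😍", "😠", "🤯", "😄", "🍾", "🚗", "☕", "💰"]

# Hypothetical tokenizer: a supported emoji, a run of word characters,
# or any other single non-space character (punctuation).
_TOKEN_RE = re.compile(
    "|".join(re.escape(e) for e in SUPPORTED_EMOJI) + r"|\w+|[^\w\s]"
)


def tweet_to_ner(text: str) -> list[tuple[str, str]]:
    """Convert a tweet into (word, label) pairs with B-<emoji> labels."""
    pairs: list[list[str]] = []
    for token in _TOKEN_RE.findall(text):
        if token in SUPPORTED_EMOJI:
            # Only the first emoji after a token becomes its label, so a run
            # such as "😍😍" yields a single B-😍; emoji with no preceding
            # token are dropped.
            if pairs and pairs[-1][1] == "O":
                pairs[-1][1] = f"B-{token}"
        else:
            # Words and punctuation start out unlabeled.
            pairs.append([token, "O"])
    return [(word, label) for word, label in pairs]


for word, label in tweet_to_ner("Wow 😍😍, what a cool car 🚗🚗!"):
    print(f"{word}\t{label}")
```

Emoji outside the supported list fall through to the punctuation pattern and are kept as ordinary `O` tokens; a real pipeline would likely strip them first.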
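The README says only that the model was trained for 4 epochs. If training followed the common Hugging Face token-classification recipe (an assumption; the README does not say), word-level labels would have to be mapped onto XLM-RoBERTa's subword tokens. The sketch below assumes a label set of `O` plus one `B-<emoji>` label per supported emoji; `LABELS`, `ID2LABEL`, `LABEL2ID` and `align_labels` are hypothetical names.

```python
from __future__ import annotations

from typing import Optional

# Emoji list copied from the Dataset section; the label scheme ("O" plus one
# B-<emoji> label per emoji) is an assumption, not the model's published config.
SUPPORTED_EMOJI = ["😨", "😥", "😍", "😠", "🤯", "😄", "🍾", "🚗", "☕", "💰"]

LABELS = ["O"] + [f"B-{e}" for e in SUPPORTED_EMOJI]
ID2LABEL = dict(enumerate(LABELS))
LABEL2ID = {label: i for i, label in ID2LABEL.items()}

IGNORE_INDEX = -100  # PyTorch's CrossEntropyLoss ignores this target by default


def align_labels(word_ids: list[Optional[int]], word_labels: list[str]) -> list[int]:
    """Map word-level labels onto subword tokens.

    `word_ids` gives, for each subword token, the index of the word it came
    from (None for special tokens such as <s> and </s>). Only the first
    subword of each word keeps the word's label; all other positions are
    set to IGNORE_INDEX so they do not contribute to the loss.
    """
    aligned: list[int] = []
    previous: Optional[int] = None
    for word_id in word_ids:
        if word_id is None or word_id == previous:
            aligned.append(IGNORE_INDEX)
        else:
            aligned.append(LABEL2ID[word_labels[word_id]])
        previous = word_id
    return aligned


# Two words; the first is split into two subwords, plus <s> and </s>.
print(align_labels([None, 0, 0, 1, None], ["B-😍", "O"]))
# → [-100, 3, -100, 0, -100]
```

With a fast tokenizer, `word_ids` could come from `tokenizer(words, is_split_into_words=True).word_ids()`, the two label maps could be passed to `AutoModelForTokenClassification.from_pretrained` through its `id2label` and `label2id` keyword arguments, and 4 epochs would correspond to `num_train_epochs=4` in `TrainingArguments`. Labeling only the first subword is one common choice; labeling every subword is another.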
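The Goal section mentions postprocessing in the demo. One plausible final step is sketched below, under the assumption that the model's predictions have already been reduced to (emoji, end offset, score) triples. The `EmojiPrediction` type, the `insert_emoji` function and the 0.5 confidence threshold are hypothetical and not taken from the demo.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EmojiPrediction:
    """Hypothetical prediction: an emoji to insert after the token ending at `end`."""

    emoji: str
    end: int      # character offset just after the token in the input text
    score: float  # model confidence for this emoji


def insert_emoji(text: str, predictions: list[EmojiPrediction], threshold: float = 0.5) -> str:
    """Insert each sufficiently confident emoji right after its token.

    Predictions are applied from the end of the text backwards so that
    inserting characters does not shift the offsets still to be processed.
    """
    for pred in sorted(predictions, key=lambda p: p.end, reverse=True):
        if pred.score >= threshold:
            text = text[: pred.end] + " " + pred.emoji + text[pred.end :]
    return text


print(insert_emoji(
    "Wat een mooie auto!",
    [EmojiPrediction("🚗", 18, 0.9), EmojiPrediction("😍", 3, 0.2)],
))
# → Wat een mooie auto 🚗!
```

Applying insertions right to left keeps every remaining character offset valid without recomputing it.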