thomasdehaene committed 08b02a6 (parent: a2c0203): Create README.md

---
language: nl
tags:
- token-classification
- sequence-tagger-model
---

# Goal
This model adds emoji to an input text.

To accomplish this, we framed the task as a token classification problem: for each word or token, the model predicts, as an entity label, the emoji that should follow it.
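
To make this framing concrete, the sketch below goes in the reverse direction: it takes word-level tokens with predicted labels and rebuilds the text with the emoji inserted. It is a minimal illustration, not the demo's actual postprocessing. The helper name `add_emoji` is hypothetical, and the sketch assumes the model's subword predictions have already been mapped back to whole words, with labels of the form `B-<emoji>` or `O`.

```python
import re


def add_emoji(tokens: list[str], labels: list[str]) -> str:
    """Rebuild text from word-level tokens, inserting predicted emoji.

    Hypothetical helper: a token labelled "B-<emoji>" is followed by that
    emoji; tokens labelled "O" are left unchanged.
    """
    pieces: list[str] = []
    for token, label in zip(tokens, labels):
        # Separate words with a space, but glue punctuation to what precedes it.
        if pieces and not re.fullmatch(r"[^\w\s]+", token):
            pieces.append(" ")
        pieces.append(token)
        if label.startswith("B-"):
            pieces.append(" " + label[2:])  # "B-😍" -> " 😍"
    return "".join(pieces)


tokens = ["Wow", ",", "what", "a", "cool", "car", "!"]
labels = ["B-😍", "O", "O", "O", "O", "B-🚗", "O"]
print(add_emoji(tokens, labels))  # Wow 😍, what a cool car 🚗!
```

Rebuilding spacing from tokens is a simplification; a real implementation would more likely keep the character offsets of each token and insert the emoji at those positions in the original string.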

The accompanying demo, which includes all the necessary pre- and postprocessing, can be found [here](https://huggingface.co/spaces/ml6team/emoji_predictor).

For now, the model only supports Dutch text.

# Dataset
For this model, we scraped about 1,000 unique tweets for each supported emoji:
['😨', '😥', '😍', '😠', '🤯', '😄', '🍾', '🚗', '☕', '💰']
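
A token classification head needs a fixed label inventory. Assuming the model uses one `O` label plus one `B-<emoji>` label per supported emoji (the convention in the NER table below; whether the released model also defines `I-` labels is not stated), the label set and the `id2label`/`label2id` mappings that a Hugging Face `transformers` token classification config stores could be built as follows. The ordering is an arbitrary choice for this sketch.

```python
# The ten supported emoji, copied from the list above.
EMOJI = ["😨", "😥", "😍", "😠", "🤯", "😄", "🍾", "🚗", "☕", "💰"]

# Assumption: "O" plus one "B-<emoji>" label per emoji, no "I-" labels.
# The ordering is arbitrary and need not match the released model.
LABELS = ["O"] + [f"B-{e}" for e in EMOJI]
label2id = {label: i for i, label in enumerate(LABELS)}
id2label = {i: label for label, i in label2id.items()}

print(len(LABELS))  # 11
print(id2label[8])  # B-🚗
```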

Tweets in this dataset could look like this:
```
Wow 😍😍, what a cool car 🚗🚗!
Omg, I hate mondays 😠... I need a drink 🍾
```

After some processing, we can recast this into the more familiar NER format (shown here for the first tweet):

| Word | Label |
|------|-------|
| Wow  | B-😍  |
| ,    | O     |
| what | O     |
| a    | O     |
| cool | O     |
| car  | B-🚗  |
| !    | O     |

This format can then be used to train a token classification model.
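
The processing step is not described in detail. Below is a minimal sketch of one possible conversion, assuming a simple regex tokenizer (words and single punctuation characters) rather than whatever tokenizer was actually used; `to_ner` is a hypothetical name. Each emoji is attached as a `B-<emoji>` label to the token right before it, and repeated emoji collapse into a single label, as with `Wow 😍😍` in the example.

```python
import re

# The ten supported emoji; each one becomes an entity type "B-<emoji>".
EMOJI = ["😨", "😥", "😍", "😠", "🤯", "😄", "🍾", "🚗", "☕", "💰"]

# Assumed tokenizer: a supported emoji, a run of word characters,
# or a single punctuation character.
TOKEN_RE = re.compile("|".join(map(re.escape, EMOJI)) + r"|\w+|[^\w\s]")


def to_ner(tweet: str) -> list[tuple[str, str]]:
    """Turn an emoji-annotated tweet into (token, label) pairs."""
    pairs: list[tuple[str, str]] = []
    for tok in TOKEN_RE.findall(tweet):
        if tok in EMOJI:
            # Label the preceding token, unless it already has an emoji label.
            if pairs and pairs[-1][1] == "O":
                pairs[-1] = (pairs[-1][0], f"B-{tok}")
        else:
            pairs.append((tok, "O"))
    return pairs


for token, label in to_ner("Wow 😍😍, what a cool car 🚗🚗!"):
    print(f"{token}\t{label}")
```

An emoji at the very start of a tweet has no preceding token and is simply dropped here, and if a token is followed by two different emoji only the first is kept. Both are choices made for this sketch, not documented behavior of the original pipeline.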

Unfortunately, Twitter's Terms of Service prohibit us from sharing the original dataset.

# Training

The model was trained for 4 epochs.