dumitrescustefan committed on
Commit 9718c77
1 Parent(s): b6d9c2d

Update README.md

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -21,6 +21,12 @@ outputs = model(input_ids)
  last_hidden_states = outputs[0] # The last hidden-state is the first element of the output tuple
  ```
 
+ Remember to always sanitize your text! Replace the cedilla letters ``ş``/``Ş`` and ``ţ``/``Ţ`` with the comma-below letters ``ș``/``Ș`` and ``ț``/``Ț``:
+ ```python
+ text = text.replace("ţ", "ț").replace("ş", "ș").replace("Ţ", "Ț").replace("Ş", "Ș")
+ ```
+ The model was **NOT** trained on the cedilla forms of ``s`` and ``t``. If you skip this step, performance will drop due to ``<UNK>`` tokens and a higher number of tokens per word.
+
  ### Evaluation
 
  Evaluation is performed on the Universal Dependencies [Romanian RRT](https://universaldependencies.org/treebanks/ro_rrt/index.html) treebank (UPOS, XPOS and LAS) and on a NER task based on [RONEC](https://github.com/dumitrescustefan/ronec). Details, as well as more in-depth tests not shown here, are given in the dedicated [evaluation page](https://github.com/dumitrescustefan/Romanian-Transformers/tree/master/evaluation/README.md).
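The sanitization step added in this commit can also be written as a small reusable helper. The following is a minimal sketch in plain Python 3 with no extra dependencies. The name `sanitize_romanian` is hypothetical and not part of the Romanian-Transformers repository. It uses `str.translate` to map the cedilla letters (U+015E/U+015F, U+0162/U+0163) to their comma-below counterparts (U+0218/U+0219, U+021A/U+021B) in a single pass. As an added assumption beyond the README, it first applies NFC normalization so that decomposed input (a base `s`/`t` followed by U+0327 COMBINING CEDILLA) is composed before the mapping is applied.

```python
import unicodedata

# Hypothetical helper (not part of the Romanian-Transformers repo):
# maps cedilla s/t to the comma-below letters the model was trained on.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015f": "\u0219",  # s with cedilla -> s with comma below
    "\u0163": "\u021b",  # t with cedilla -> t with comma below
    "\u015e": "\u0218",  # S with cedilla -> S with comma below
    "\u0162": "\u021a",  # T with cedilla -> T with comma below
})


def sanitize_romanian(text: str) -> str:
    """Replace cedilla s/t (both cases) with their comma-below forms."""
    # NFC composes e.g. "s" + U+0327 into U+015F so the table can match it.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(CEDILLA_TO_COMMA)


if __name__ == "__main__":
    print(sanitize_romanian("Ştiinţă şi tehnică"))
```

Compared with the chained `.replace()` calls, a translation table keeps all four mappings in one place and scans the string only once. Text that already uses comma-below letters passes through unchanged.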