nielsr (HF staff) committed on
Commit 6a21abb
1 Parent(s): 7886a04

Update model card

Files changed (1)
  1. README.md +7 -1
README.md CHANGED
@@ -14,7 +14,13 @@ Pretrained CANINE model on English language using a masked language modeling (ML

  What's special about CANINE is that it doesn't require an explicit tokenizer (such as WordPiece or SentencePiece) as other models like BERT and RoBERTa. Instead, it directly operates at a character level: each character is turned into its [Unicode code point](https://en.wikipedia.org/wiki/Code_point#:~:text=For%20Unicode%2C%20the%20particular%20sequence,forming%20a%20self%2Dsynchronizing%20code.).

- This means that input processing is trivial and can typically be accomplished as: `input_ids = [ord(char) for char in text]`, using the built-in ord() function in Python.
+ This means that input processing is trivial and can typically be accomplished as:
+
+ ```
+ input_ids = [ord(char) for char in text]
+ ```
+
+ The ord() function is part of Python, and turns each character into its Unicode code point.

  Disclaimer: The team releasing CANINE did not write a model card for this model so this model card has been written by the Hugging Face team.

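The README change above describes CANINE's input processing as a single `ord()` call per character. The following is a minimal sketch of that step in plain Python, assuming only the standard library. The helper names `encode` and `decode` are hypothetical (not part of any CANINE or Hugging Face API), and the sketch deliberately omits any special tokens, padding, or truncation the model may additionally expect. `chr()` is Python's built-in inverse of `ord()`.

```python
from typing import List


def encode(text: str) -> List[int]:
    """Hypothetical helper: map each character to its Unicode code point, as in the README."""
    return [ord(char) for char in text]


def decode(input_ids: List[int]) -> str:
    """Hypothetical helper: invert encode() with chr(), which maps a code point back to a character."""
    return "".join(chr(code_point) for code_point in input_ids)


if __name__ == "__main__":
    # Escapes make the exact code points explicit: U+00E9 (precomposed e-acute) and U+1F436 (dog face).
    text = "h\u00e9llo \U0001F436"
    input_ids = encode(text)
    print(input_ids)  # [104, 233, 108, 108, 111, 32, 128054]
    # Round trip: decoding the IDs recovers the original string exactly.
    assert decode(input_ids) == text
```

Because a Python `str` iterates over code points, the emoji yields a single ID (128054) rather than two UTF-16 code units. A decomposed "é" (`"e\u0301"`, a plain `e` followed by a combining acute accent) would, however, yield two IDs, so text normalization affects the resulting sequence.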