Update model card
README.md CHANGED
@@ -14,7 +14,13 @@ Pretrained CANINE model on English language using a masked language modeling (ML
 What's special about CANINE is that it doesn't require an explicit tokenizer (such as WordPiece or SentencePiece) as other models like BERT and RoBERTa. Instead, it directly operates at a character level: each character is turned into its [Unicode code point](https://en.wikipedia.org/wiki/Code_point#:~:text=For%20Unicode%2C%20the%20particular%20sequence,forming%20a%20self%2Dsynchronizing%20code.).

-This means that input processing is trivial and can typically be accomplished as:
+This means that input processing is trivial and can typically be accomplished as:
+
+```
+input_ids = [ord(char) for char in text]
+```
+
+The ord() function is part of Python, and turns each character into its Unicode code point.

 Disclaimer: The team releasing CANINE did not write a model card for this model so this model card has been written by the Hugging Face team.
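
The one-liner added in this change can be tried on its own. The following is a minimal sketch, assuming a sample string `"hello"` and two hypothetical helper names (`text_to_code_points` and `code_points_to_text`) that are introduced here for illustration only and are not part of the model card or of any CANINE API. It covers only the character-to-code-point step, not anything else a model might need around it.

```python
def text_to_code_points(text):
    """Map each character of `text` to its Unicode code point with ord()."""
    return [ord(char) for char in text]


def code_points_to_text(input_ids):
    """Inverse mapping with chr(), handy for checking the round trip."""
    return "".join(chr(code_point) for code_point in input_ids)


text = "hello"  # hypothetical sample input
input_ids = text_to_code_points(text)

print(input_ids)                               # [104, 101, 108, 108, 111]
print(code_points_to_text(input_ids) == text)  # True
```

Because `ord()` and `chr()` are inverses on single characters, the mapping is lossless: decoding the IDs recovers the original string exactly.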