Commit 16fdda0
Parent(s): 90dc4d7
Update README.md
README.md CHANGED
@@ -4,4 +4,15 @@ MariTalk Large is a proprietary LLM that can be used through an API endpoint, wh
The purpose of including this tokenizer is to allow you to estimate the number of tokens in your prompts and, therefore, the cost of using the model.
+```python
+import transformers
+tokenizer = transformers.AutoTokenizer.from_pretrained("maritaca-ai/maritalk-tokenizer-large")
+
+prompt = "Com quantos paus se faz uma canoa?"
+
+tokens = tokenizer.encode(prompt)
+
+print(f'The prompt "{prompt}" contains {len(tokens)} tokens.')  # Should print 12 tokens.
+```
+
For more information on how to use the model, please refer to our documentation at [this link](https://maritaca-ai.github.io/maritalk-api/maritalk.html).
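
The README says the token count is meant to help estimate the cost of using the model. A minimal sketch of that last step follows, assuming the cost is proportional to the number of prompt tokens. The helper `estimate_prompt_cost` and the price `EXAMPLE_PRICE_PER_MILLION` are hypothetical and appear nowhere in the commit; substitute the actual rate from the provider's pricing information.

```python
def estimate_prompt_cost(num_tokens: int, price_per_million_tokens: float) -> float:
    """Return the estimated cost of sending `num_tokens` prompt tokens.

    Assumes a flat price per million prompt tokens (an assumption, not
    something stated in the README).
    """
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens * price_per_million_tokens / 1_000_000


# HYPOTHETICAL price, for illustration only; not taken from the source.
EXAMPLE_PRICE_PER_MILLION = 5.0

# 12 is the token count the README snippet reports for its sample prompt.
# With the tokenizer loaded, use: num_tokens = len(tokenizer.encode(prompt))
num_tokens = 12

cost = estimate_prompt_cost(num_tokens, EXAMPLE_PRICE_PER_MILLION)
print(f"Estimated prompt cost: {cost:.6f}")
```

This covers only the prompt side. Whether generated tokens are billed, and at what rate, is not stated in the README.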