This is the tokenizer used by the Sabiá-2 Medium model.

Sabiá-2 Medium is a proprietary LLM available in two forms: through an API endpoint, which we refer to as the "MariTalk API", or as an encrypted, downloadable version that runs locally, known as "MariTalk Local".

This tokenizer is provided so that you can estimate the number of tokens in your prompts and, therefore, the cost of using the model.

```python
import transformers

# Load the Sabiá-2 Medium tokenizer from the Hugging Face Hub.
tokenizer = transformers.AutoTokenizer.from_pretrained("maritaca-ai/sabia-2-tokenizer-medium")

# Sample prompt in Portuguese ("How many sticks does it take to make a canoe?").
prompt = "Com quantos paus se faz uma canoa?"

# Encode the prompt into a list of token IDs.
tokens = tokenizer.encode(prompt)

print(f'The prompt "{prompt}" contains {len(tokens)} tokens.')  # It should print 12 tokens.
```
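
Once you have a token count, turning it into a cost estimate is simple arithmetic. The sketch below is illustrative only: the function `estimate_cost` and the price constant `HYPOTHETICAL_PRICE_PER_MILLION` are assumptions made for this example, not part of the tokenizer or of the MariTalk API, and the price is a placeholder rather than actual pricing. It also covers prompt tokens only; generated tokens, if billed, would need to be counted separately.

```python
def estimate_cost(num_tokens: int, price_per_million_tokens: float) -> float:
    """Return the estimated cost of processing `num_tokens` tokens.

    `price_per_million_tokens` is whatever rate applies to your plan;
    this helper is a hypothetical illustration, not an official API.
    """
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens / 1_000_000 * price_per_million_tokens


# Placeholder price for illustration only; replace it with the real rate.
HYPOTHETICAL_PRICE_PER_MILLION = 5.0

# 12 is the token count reported for the sample prompt in the snippet above.
cost = estimate_cost(12, HYPOTHETICAL_PRICE_PER_MILLION)
print(f"Estimated prompt cost: {cost:.6f}")
```

In practice you would pass `len(tokens)` from the tokenizer snippet instead of the hard-coded `12`.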

For more information on how to use the model, see our [documentation](https://maritaca-ai.github.io/maritalk-api/maritalk.html).