This is the tokenizer used by the MariTalk Large model.

MariTalk Large is a proprietary LLM available in two forms: through an API endpoint, which we refer to as the "MariTalk API", or as an encrypted downloadable version that runs locally, known as "MariTalk Local".

This tokenizer is provided so that you can estimate the number of tokens in your prompts and, therefore, the cost of using the model.

```python
import transformers

# Load the MariTalk Large tokenizer from the Hugging Face Hub.
tokenizer = transformers.AutoTokenizer.from_pretrained("maritaca-ai/maritalk-tokenizer-large")

# Example prompt in Portuguese ("How many sticks does it take to make a canoe?").
prompt = "Com quantos paus se faz uma canoa?"

# Encode the prompt into a list of token IDs.
tokens = tokenizer.encode(prompt)

print(f'The prompt "{prompt}" contains {len(tokens)} tokens.')  # Should print 12 tokens.
```
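
Once you have a token count, turning it into a cost estimate is a single multiplication. The sketch below is only an illustration: `estimate_cost` and `HYPOTHETICAL_PRICE_PER_MILLION` are hypothetical names, and the price is a made-up placeholder, not the actual MariTalk pricing. Real pricing may also differ between input and output tokens, so treat the result as a rough estimate.

```python
def estimate_cost(num_tokens: int, price_per_million_tokens: float) -> float:
    """Return the estimated cost of processing `num_tokens` tokens."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens / 1_000_000 * price_per_million_tokens


# Hypothetical price, for illustration only; replace it with the real price.
HYPOTHETICAL_PRICE_PER_MILLION = 5.0

# 12 is the token count reported for the example prompt above.
cost = estimate_cost(12, HYPOTHETICAL_PRICE_PER_MILLION)
print(f"Estimated cost: {cost:.6f}")
```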

For more information on how to use the model, see [our documentation](https://maritaca-ai.github.io/maritalk-api/maritalk.html).