maritaca-ai/sabia-2-tokenizer-medium

rodrigo-nogueira committed on Feb 3

Commit 16fdda0 • 1 Parent(s): 90dc4d7

Update README.md

Files changed (1)

README.md +11 -0

@@ -4,4 +4,15 @@ MariTalk Large is a proprietary LLM that can be used through an API endpoint, wh
 The purpose of including this tokenizer is to allow you to estimate the number of tokens in your prompts and, therefore, the cost of using the model.
+```python
+import transformers
+tokenizer = transformers.AutoTokenizer.from_pretrained("maritaca-ai/maritalk-tokenizer-large")
+prompt = "Com quantos paus se faz uma canoa?"
+tokens = tokenizer.encode(prompt)
+print(f'The prompt "{prompt}" contains {len(tokens)} tokens.')  # Should print 12 tokens.
+```
 For more information on how to use the model, please refer to our documentation at [this link](https://maritaca-ai.github.io/maritalk-api/maritalk.html).
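
The README says the token count is meant for estimating cost. A minimal sketch of that last step follows, assuming a flat price per million tokens. The price value and the `estimate_cost` helper are hypothetical and not part of the MariTalk API; check the linked documentation for actual pricing.

```python
from decimal import Decimal


def estimate_cost(num_tokens: int, price_per_million: Decimal) -> Decimal:
    """Return the estimated cost of a prompt with `num_tokens` tokens.

    Assumes a flat per-token rate; real pricing may differ
    (for example, separate input and output rates).
    """
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return Decimal(num_tokens) * price_per_million / Decimal(1_000_000)


# Hypothetical price per million tokens, for illustration only.
HYPOTHETICAL_PRICE_PER_MILLION = Decimal("5.00")

# 12 is the token count the README example reports for its sample prompt.
cost = estimate_cost(12, HYPOTHETICAL_PRICE_PER_MILLION)
print(f"Estimated prompt cost: {cost}")
```

In practice, `num_tokens` would be `len(tokenizer.encode(prompt))` from the snippet in the README. `Decimal` is used here so that very small per-prompt amounts are not distorted by floating-point rounding.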