Tokenizer Config File

by rushdishams - opened

I am using finbert-tone from SageMaker notebooks to get text sentiment. The texts are long. I am running a batch transform, so each line of the input contains "parameters":{"truncate":true} inside its JSON object. However, I get the warning "Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation." I found that the project has no tokenizer config file. How can I use the model with long texts without adding a preprocessing step to limit the number of tokens in each text? Thank you.
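
For illustration, here is a minimal sketch of one possible workaround: since the warning says no maximum length is provided, each batch line could carry an explicit `max_length` next to the truncation flag. This rests on several assumptions that are not confirmed in this thread: that the serving container forwards `parameters` to the tokenizer as keyword arguments, that the tokenizer argument is named `truncation`, and that 512 tokens is the right limit for this BERT-style model. The helper `build_batch_lines` and the constant `MAX_TOKENS` are hypothetical names used only for this example.

```python
import json

# Assumption: 512 is the usual position limit for BERT-style models.
# Verify it against the model's own config before relying on it.
MAX_TOKENS = 512


def build_batch_lines(texts, max_length=MAX_TOKENS):
    """Return one JSON string per text, each with explicit truncation settings."""
    lines = []
    for text in texts:
        payload = {
            "inputs": text,
            # Assumption: these keys are forwarded to the tokenizer as kwargs.
            "parameters": {"truncation": True, "max_length": max_length},
        }
        lines.append(json.dumps(payload))
    return lines


if __name__ == "__main__":
    sample_texts = [
        "Revenue grew strongly in the third quarter.",
        "The company expects weaker margins next year.",
    ]
    # One JSON object per line, as expected for a JSON Lines batch input file.
    print("\n".join(build_batch_lines(sample_texts)))
```

If the container does forward these parameters, the tokenizer would cut each text at `max_length` tokens, so no separate preprocessing step would be needed. Text beyond the limit is simply discarded, which may matter for very long documents.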

