ahassoun's picture
Upload 3018 files
ee6e328
|
raw
history blame
1.54 kB

Utilities for Tokenizers

This page lists all the utility functions used by the tokenizers, mainly the class [~tokenization_utils_base.PreTrainedTokenizerBase] that implements the common methods between [PreTrainedTokenizer] and [PreTrainedTokenizerFast] and the mixin [~tokenization_utils_base.SpecialTokensMixin].

Most of those are only useful if you are studying the code of the tokenizers in the library.

PreTrainedTokenizerBase

[[autodoc]] tokenization_utils_base.PreTrainedTokenizerBase - call - all

SpecialTokensMixin

[[autodoc]] tokenization_utils_base.SpecialTokensMixin

Enums and namedtuples

[[autodoc]] tokenization_utils_base.TruncationStrategy

[[autodoc]] tokenization_utils_base.CharSpan

[[autodoc]] tokenization_utils_base.TokenSpan