Tokenizers documentation

Encode Inputs

This page documents version v0.13.2. A newer version, v0.13.4.rc2, is available.

These types represent all the different kinds of input that a Tokenizer accepts when using encode_batch().

TextEncodeInput

tokenizers.TextEncodeInput

Represents a textual input for encoding. Can be either:

A single sequence: str
A pair of sequences: Tuple[str, str], or a List[str] of size 2

Alias of Union[str, Tuple[str, str], List[str]].
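
As a rough illustration of the shapes this alias covers, the sketch below rebuilds the alias locally with the standard typing module and classifies a few sample values. The helper describe_text_input and the sample strings are hypothetical and not part of the tokenizers library; only the alias definition is taken from the text above.

```python
from typing import List, Tuple, Union

# Local mirror of the documented alias (the library exposes it as
# tokenizers.TextEncodeInput); rebuilt here so the sketch needs no install.
TextEncodeInput = Union[str, Tuple[str, str], List[str]]


def describe_text_input(value: TextEncodeInput) -> str:
    """Hypothetical helper: say whether a raw-text input is a single sequence or a pair."""
    if isinstance(value, str):
        return "single"
    is_two_strings = (
        isinstance(value, (tuple, list))
        and len(value) == 2
        and all(isinstance(item, str) for item in value)
    )
    if is_two_strings:
        return "pair"
    raise TypeError("not a valid TextEncodeInput: %r" % (value,))


# Illustrative sample values, one per accepted shape.
single = "How are you?"
pair_as_tuple = ("What is a tokenizer?", "It splits text into tokens.")
pair_as_list = ["What is a tokenizer?", "It splits text into tokens."]

for sample in (single, pair_as_tuple, pair_as_list):
    print(describe_text_input(sample))
```

A list of any other length, such as three strings, is rejected by this helper because only size-2 lists are described as pairs.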

PreTokenizedEncodeInput

tokenizers.PreTokenizedEncodeInput

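Represents a pre-tokenized input for encoding. Can be either:

A single sequence: Union[List[str], Tuple[str]]
A pair of sequences: a Tuple of 2 sequences, each of type Union[List[str], Tuple[str]], or a List of size 2 of such sequences

Alias of Union[List[str], Tuple[str], Tuple[Union[List[str], Tuple[str]], Union[List[str], Tuple[str]]], List[Union[List[str], Tuple[str]]]].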
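
The nested alias is easier to read once the inner sequence type is named. The sketch below does that with the standard typing module and classifies sample values. The names PreTokenizedSequence and describe_pretokenized_input are hypothetical, introduced only for this illustration.

```python
from typing import List, Tuple, Union

# Hypothetical name for one pre-tokenized sequence (a list or tuple of words).
PreTokenizedSequence = Union[List[str], Tuple[str]]

# Local mirror of the documented alias, written in terms of the name above.
PreTokenizedEncodeInput = Union[
    List[str],
    Tuple[str],
    Tuple[PreTokenizedSequence, PreTokenizedSequence],
    List[PreTokenizedSequence],
]


def _is_word_sequence(value) -> bool:
    """True for a list or tuple whose items are all strings."""
    return isinstance(value, (list, tuple)) and all(isinstance(w, str) for w in value)


def describe_pretokenized_input(value) -> str:
    """Hypothetical helper: classify a pre-tokenized input as single or pair."""
    if _is_word_sequence(value):
        return "single"
    is_two_sequences = (
        isinstance(value, (list, tuple))
        and len(value) == 2
        and all(_is_word_sequence(seq) for seq in value)
    )
    if is_two_sequences:
        return "pair"
    raise TypeError("not a valid PreTokenizedEncodeInput: %r" % (value,))


# Illustrative sample values.
single_words = ["How", "are", "you", "?"]
pair_of_words = (["What", "is", "this", "?"], ["A", "tokenizer"])

print(describe_pretokenized_input(single_words))
print(describe_pretokenized_input(pair_of_words))
```

Note that in this sketch a flat list of two words, such as ["Hello", "world"], counts as a single sequence, because a pair must contain sequences rather than bare strings.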

EncodeInput

tokenizers.EncodeInput

Represents all the possible types of input for encoding. Can be:

When is_pretokenized=False: TextEncodeInput
When is_pretokenized=True: PreTokenizedEncodeInput

Alias of Union[str, Tuple[str, str], List[str], Tuple[str], Tuple[Union[List[str], Tuple[str]], Union[List[str], Tuple[str]]], List[Union[List[str], Tuple[str]]]].
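
Because the union mixes raw-text and pre-tokenized shapes, the same Python value can be read in two ways. The sketch below illustrates this with a hypothetical helper, interpret, whose is_pretokenized argument is modeled on the encode_batch() parameter of the same name. It is a stand-alone approximation of the dispatch under that assumption, not the library's implementation.

```python
def interpret(value, is_pretokenized=False):
    """Hypothetical helper: describe how an EncodeInput value would be read."""

    def is_words(candidate):
        # A list or tuple containing only strings.
        return isinstance(candidate, (list, tuple)) and all(
            isinstance(w, str) for w in candidate
        )

    if is_pretokenized:
        # Pre-tokenized mode: strings inside a sequence are individual words.
        if is_words(value):
            return "single pre-tokenized sequence"
        if (
            isinstance(value, (list, tuple))
            and len(value) == 2
            and all(is_words(seq) for seq in value)
        ):
            return "pair of pre-tokenized sequences"
    else:
        # Raw-text mode: each string is a whole sequence.
        if isinstance(value, str):
            return "single text sequence"
        if is_words(value) and len(value) == 2:
            return "pair of text sequences"
    raise TypeError("unsupported input for this mode: %r" % (value,))


ambiguous = ["Hello", "world"]
print(interpret(ambiguous))                        # read as raw text
print(interpret(ambiguous, is_pretokenized=True))  # read as words
```

With a real Tokenizer, values of these shapes would be passed in a list to encode_batch(), together with the is_pretokenized flag that matches how they should be read.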