Tokenizers documentation

Encode Inputs

You are viewing the main version, which requires installation from source. For a regular pip install, check out the latest stable version (v0.13.4.rc2).

These types represent all the different kinds of input that a Tokenizer accepts when using encode_batch().

TextEncodeInput

tokenizers.TextEncodeInput

Represents a textual input for encoding. Can be either a single sequence (str) or a pair of sequences (Tuple[str, str], or a List[str] of size 2).

Alias of Union[str, Tuple[str, str], List[str]].
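As a minimal sketch of what this alias admits, the snippet below mirrors it locally with the standard typing module and classifies a few values. The local TextEncodeInput definition and the describe_text_input helper are assumptions made for illustration; they are not part of the tokenizers API.

```python
from typing import List, Tuple, Union

# Local mirror of the documented alias (the real one is tokenizers.TextEncodeInput).
TextEncodeInput = Union[str, Tuple[str, str], List[str]]


def describe_text_input(value: TextEncodeInput) -> str:
    """Hypothetical helper: report whether a value is one sequence or a pair."""
    if isinstance(value, str):
        return "single"
    if (
        isinstance(value, (tuple, list))
        and len(value) == 2
        and all(isinstance(part, str) for part in value)
    ):
        return "pair"
    raise TypeError(f"not a valid text input: {value!r}")


# A batch mixing the three shapes named in the alias.
batch = [
    "Hello world",                     # str
    ("A question?", "Its context."),   # Tuple[str, str]
    ["First text", "Second text"],     # List[str] with two items
]
kinds = [describe_text_input(item) for item in batch]
print(kinds)
```

A list of this kind is the sort of value one would expect to pass to encode_batch() for raw text; the helper only checks shapes and does no tokenization.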

PreTokenizedEncodeInput

tokenizers.PreTokenizedEncodeInput

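Represents a pre-tokenized input for encoding. Can be either a single sequence (Union[List[str], Tuple[str]]) or a pair of sequences (a Tuple or a List holding two such sequences).

Alias of Union[List[str], Tuple[str], Tuple[Union[List[str], Tuple[str]], Union[List[str], Tuple[str]]], List[Union[List[str], Tuple[str]]]].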
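The nested Union is easier to read with concrete values. The sketch below uses only the standard library; is_word_sequence and describe_pretokenized_input are hypothetical helpers written for this example, and they check shapes only.

```python
from typing import Any


def is_word_sequence(value: Any) -> bool:
    """True for a list or tuple whose items are all strings (one pre-split sequence)."""
    return isinstance(value, (list, tuple)) and all(isinstance(w, str) for w in value)


def describe_pretokenized_input(value: Any) -> str:
    """Hypothetical helper: classify a pre-tokenized input as a single sequence or a pair."""
    if is_word_sequence(value):
        return "single"
    if (
        isinstance(value, (list, tuple))
        and len(value) == 2
        and all(is_word_sequence(part) for part in value)
    ):
        return "pair"
    raise TypeError(f"not a valid pre-tokenized input: {value!r}")


batch = [
    ["Hello", "world"],                                  # one sequence as a list of words
    ("Hello", "world"),                                  # one sequence as a tuple of words
    (["What", "is", "this", "?"], ["A", "tokenizer"]),   # pair given as a tuple
    [["First", "one"], ["Second", "one"]],               # pair given as a list
]
kinds = [describe_pretokenized_input(item) for item in batch]
print(kinds)
```

Note that a flat list of two strings counts as a single pre-tokenized sequence here, because the words have already been split; a pair needs one more level of nesting.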

EncodeInput

tokenizers.EncodeInput

Represents all the possible types of input for encoding. Can be a TextEncodeInput (when is_pretokenized=False) or a PreTokenizedEncodeInput (when is_pretokenized=True).

Alias of Union[str, Tuple[str, str], List[str], Tuple[str], Tuple[Union[List[str], Tuple[str]], Union[List[str], Tuple[str]]], List[Union[List[str], Tuple[str]]]].
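Because this alias is the union of the text and pre-tokenized forms, the same Python value can be read two ways depending on whether the input is treated as pre-tokenized. The sketch below illustrates that with a standard-library shape check. The matches_encode_input function and its is_pretokenized argument are assumptions modeled on the is_pretokenized flag of encode_batch(); the function is not part of tokenizers and does not reproduce the library's own validation.

```python
from typing import Any


def _is_words(value: Any) -> bool:
    """True for a list or tuple whose items are all strings."""
    return isinstance(value, (list, tuple)) and all(isinstance(w, str) for w in value)


def matches_encode_input(value: Any, is_pretokenized: bool = False) -> bool:
    """Hypothetical check: does value fit the documented shape for the given mode?"""
    if is_pretokenized:
        # One sequence of words, or a pair of such sequences.
        if _is_words(value):
            return True
        return (
            isinstance(value, (list, tuple))
            and len(value) == 2
            and all(_is_words(part) for part in value)
        )
    # Raw text: one string, or exactly two strings forming a pair.
    if isinstance(value, str):
        return True
    return _is_words(value) and len(value) == 2


checks = [
    matches_encode_input("Hello world"),                             # raw single sequence
    matches_encode_input(["Hello", "world"], is_pretokenized=True),  # words of one sequence
    matches_encode_input(["Hello", "world", "again"]),               # three raw strings: no fit
    matches_encode_input("Hello world", is_pretokenized=True),       # bare str is not pre-split
]
print(checks)
```

The value ["Hello", "world"] fits both modes: as raw text it is a pair of sequences, and as pre-tokenized input it is one sequence of two words. Stating the mode explicitly avoids that ambiguity.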