Tokenizers documentation
Encode Inputs
You are viewing version v0.13.0. A newer version, v0.20.3, is available.
These types represent all the different kinds of input that a Tokenizer accepts
when using encode_batch().
TextEncodeInput
tokenizers.TextEncodeInput
Represents a textual input for encoding. Can be either:
- A single sequence: TextInputSequence
- A pair of sequences:
  - A Tuple of two TextInputSequence
  - Or a List of TextInputSequence of size 2
alias of Union[str, Tuple[str, str], List[str]].
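As an illustration of the alias above, here is a minimal sketch that does not import tokenizers. It assumes TextInputSequence is plain str (as the alias shows) and uses a hypothetical helper, text_input_kind, that is not part of the library. The helper sorts a value into "single" or "pair" the way the alias describes.

```python
from typing import List, Tuple, Union

# Local mirror of the documented alias (assumption: matches tokenizers v0.13.0).
TextInputSequence = str
TextEncodeInput = Union[TextInputSequence, Tuple[str, str], List[str]]


def text_input_kind(value: TextEncodeInput) -> str:
    """Hypothetical helper: return "single" or "pair" for a TextEncodeInput."""
    if isinstance(value, str):
        return "single"
    # A pair may be a tuple or a list, but it must hold exactly two strings.
    if isinstance(value, (tuple, list)) and len(value) == 2 and all(
        isinstance(item, str) for item in value
    ):
        return "pair"
    raise TypeError(f"not a valid TextEncodeInput: {value!r}")


batch = [
    "Hello, y'all!",                               # single sequence
    ("How are you?", "I am fine."),                # pair given as a tuple
    ["What is this?", "A documentation page."],    # pair given as a list of size 2
]
print([text_input_kind(item) for item in batch])  # ['single', 'pair', 'pair']
```

A list of three strings is rejected here because the alias only allows a list of size 2 to stand for a pair.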
PreTokenizedEncodeInput
tokenizers.PreTokenizedEncodeInput
Represents a pre-tokenized input for encoding. Can be either:
- A single sequence: PreTokenizedInputSequence
- A pair of sequences:
  - A Tuple of two PreTokenizedInputSequence
  - Or a List of PreTokenizedInputSequence of size 2
alias of Union[List[str], Tuple[str], Tuple[Union[List[str], Tuple[str]], Union[List[str], Tuple[str]]], List[Union[List[str], Tuple[str]]]].
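For pre-tokenized input, one sequence is already a list or tuple of words, and a pair nests two of those. The sketch below mirrors the alias locally and uses a hypothetical helper, pretokenized_input_kind, that is not part of tokenizers. It assumes PreTokenizedInputSequence is Union[List[str], Tuple[str]], as the alias shows.

```python
from typing import List, Tuple, Union

# Local mirrors of the documented aliases (assumption: match tokenizers v0.13.0).
PreTokenizedInputSequence = Union[List[str], Tuple[str]]
PreTokenizedEncodeInput = Union[
    PreTokenizedInputSequence,
    Tuple[PreTokenizedInputSequence, PreTokenizedInputSequence],
    List[PreTokenizedInputSequence],
]


def _is_word_sequence(value) -> bool:
    """True for a list or tuple whose items are all strings (one pre-tokenized sequence)."""
    return isinstance(value, (list, tuple)) and all(isinstance(w, str) for w in value)


def pretokenized_input_kind(value: PreTokenizedEncodeInput) -> str:
    """Hypothetical helper: return "single" or "pair" for a PreTokenizedEncodeInput."""
    if _is_word_sequence(value):
        return "single"
    # A pair is a tuple or list holding exactly two word sequences.
    if (
        isinstance(value, (list, tuple))
        and len(value) == 2
        and all(_is_word_sequence(part) for part in value)
    ):
        return "pair"
    raise TypeError(f"not a valid PreTokenizedEncodeInput: {value!r}")


batch = [
    ["Hello", "world"],                            # single pre-tokenized sequence
    (["How", "are", "you", "?"], ["Fine", "."]),   # pair given as a tuple
    [["Left", "side"], ["right", "side"]],         # pair given as a list of size 2
]
print([pretokenized_input_kind(item) for item in batch])  # ['single', 'pair', 'pair']
```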
EncodeInput
tokenizers.EncodeInput
Represents all the possible types of input for encoding. Can be:
- When is_pretokenized=False: TextEncodeInput
- When is_pretokenized=True: PreTokenizedEncodeInput
alias of Union[str, Tuple[str, str], List[str], Tuple[str], Tuple[Union[List[str], Tuple[str]], Union[List[str], Tuple[str]]], List[Union[List[str], Tuple[str]]]].
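The is_pretokenized flag matters because the two aliases overlap. For example, a list of two strings is a valid TextEncodeInput (a pair of texts) and also a valid PreTokenizedEncodeInput (one sequence of two words). The sketch below uses a hypothetical helper, encode_input_kind, that is not the library's own validation logic. It shows how the flag decides which reading applies.

```python
def _all_str(items) -> bool:
    """True if every item is a str."""
    return all(isinstance(item, str) for item in items)


def encode_input_kind(value, is_pretokenized: bool = False) -> str:
    """Hypothetical helper: classify an EncodeInput as "single" or "pair".

    is_pretokenized selects which alias the value is checked against,
    mirroring the documented split between TextEncodeInput and
    PreTokenizedEncodeInput.
    """
    if not is_pretokenized:
        # TextEncodeInput: a str, or a tuple/list of exactly two str.
        if isinstance(value, str):
            return "single"
        if isinstance(value, (tuple, list)) and len(value) == 2 and _all_str(value):
            return "pair"
    else:
        # PreTokenizedEncodeInput: a list/tuple of str, or a tuple/list of two such sequences.
        if isinstance(value, (tuple, list)) and _all_str(value):
            return "single"
        if (
            isinstance(value, (tuple, list))
            and len(value) == 2
            and all(isinstance(part, (tuple, list)) and _all_str(part) for part in value)
        ):
            return "pair"
    raise TypeError(f"invalid input for is_pretokenized={is_pretokenized}: {value!r}")


words = ["Hello", "world"]
print(encode_input_kind(words, is_pretokenized=False))  # pair
print(encode_input_kind(words, is_pretokenized=True))   # single
```

The same two-element list is read as a pair of texts in one mode and as a single word sequence in the other, so the flag has to match how the data was prepared.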