When using m3e-base as the embedding model, is there a recommended setting for the text length limit?

#20

by demonai - opened Aug 8, 2023

Discussion

demonai

Aug 8, 2023

When using m3e-base as the embedding model, is there a recommended setting for the text length limit?

MokaHR

Moka HR SaSS org Aug 10, 2023

512 works well; that is the length inputs were truncated to during training.

06-mingming-Max

Jan 8, 2024

But this is what I see in m3e-base:
"clean_up_tokenization_spaces": true,
"cls_token": "[CLS]",
"do_lower_case": true,
"mask_token": "[MASK]",
"model_max_length": 1000000000000000019884624838656,
"pad_token": "[PAD]",
"sep_token": "[SEP]",
"strip_accents": null,
"tokenize_chinese_chars": true,
"tokenizer_class": "BertTokenizer",
"unk_token": "[UNK]"

Doesn't this mean the text length is essentially unlimited?
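
A note on the number above: 1000000000000000019884624838656 is exactly `int(1e30)`, which the transformers library uses as a placeholder when a tokenizer config does not set `model_max_length`. So it most likely means "no limit was recorded in the tokenizer", not "the model can handle any length". The sketch below shows one way to turn that into a usable limit. The helper name `effective_max_length` is hypothetical, and the default of 512 is an assumption taken from the truncation length mentioned earlier in this thread and from the usual position limit of BERT-style models; check `max_position_embeddings` in the model's own `config.json` before relying on it.

```python
# Value copied from the tokenizer config quoted in this thread.
REPORTED_MODEL_MAX_LENGTH = 1000000000000000019884624838656

# Placeholder that transformers stores when no maximum length is configured.
UNSET_SENTINEL = int(1e30)


def effective_max_length(model_max_length: int, position_limit: int = 512) -> int:
    """Return a practical token limit (hypothetical helper, not a library API).

    position_limit is an assumption: 512 is the training-time truncation
    length mentioned in this thread and the common BERT position limit.
    """
    if model_max_length >= UNSET_SENTINEL:
        # The tokenizer has no recorded limit, so fall back to the model's limit.
        return position_limit
    # Otherwise respect whichever limit is smaller.
    return min(model_max_length, position_limit)


if __name__ == "__main__":
    print(REPORTED_MODEL_MAX_LENGTH == UNSET_SENTINEL)
    print(effective_max_length(REPORTED_MODEL_MAX_LENGTH))
```

In practice the resulting value could be passed as `max_length` together with `truncation=True` when calling a Hugging Face tokenizer, or assigned to `max_seq_length` on a sentence-transformers model. Longer documents would then need to be split into chunks before encoding.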
