Simplify usage; integrate Sentence Transformers (+ LlamaIndex/LangChain, etc.)
#5 opened by tomaarsen (HF staff)
Hello!
Pull Request overview
- Simplify usage; no more batch manipulation.
- Integrate with Sentence Transformers
- Fix README typos
Details
This PR intends to do two things: simplify the usage and integrate Sentence Transformers. Let's start with the first one.
- https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct/commit/1796134627b451f27efff4c126ed4b5d81f54ace updates the `is_causal` default to `False`, as that seems to be what will be commonly used with this model. Then we don't have to specify it manually in our batch.
- https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct/commit/572ec2f075a6b03ef280bc20c0d6f9eabb99ed7e is the more extensive version of the same commit that I made for `e5-mistral-7b-instruct`. In essence, I add `"add_eos_token": true` to the tokenizer configuration. This is very simple for the `LlamaTokenizer`, but more complex for the `Qwen2Tokenizer(Fast)`, as those don't accept the `add_eos_token` argument out of the box. So, I subclassed the fast and normal Qwen2 tokenizers to add this parameter, basing my changes on the `LlamaTokenizer`, which does have the `add_eos_token` option. Then I added the `auto_map` option to the tokenizer config, so the subclassed tokenizers will now be loaded. As a result, the user only needs one line for the tokenization:
```python
batch_dict = tokenizer(input_texts, max_length=max_length, padding=True, truncation=True, return_tensors='pt')
```
Note that the performance is identical: the README script still yields `[[70.00666809082031, 8.184867858886719], [14.62420654296875, 77.71405792236328]]`.
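The idea behind the subclassed tokenizers can be sketched as follows. This is a minimal, dependency-free sketch: the method name mirrors the Hugging Face tokenizer hook for adding special tokens, but the base class, token ids, and class names here are hypothetical, not the PR's actual code.

```python
class BaseTokenizer:
    """Stand-in for a tokenizer that does not append EOS (like Qwen2Tokenizer)."""

    eos_token_id = 2  # hypothetical EOS token id

    def build_inputs_with_special_tokens(self, token_ids):
        # Base behavior: return the ids unchanged, no EOS appended.
        return list(token_ids)


class EosTokenizer(BaseTokenizer):
    """Subclass that adds an add_eos_token flag, mirroring LlamaTokenizer."""

    def __init__(self, add_eos_token=True):
        self.add_eos_token = add_eos_token

    def build_inputs_with_special_tokens(self, token_ids):
        ids = list(token_ids)
        if self.add_eos_token:
            ids.append(self.eos_token_id)  # append EOS, as the embedding model expects
        return ids


tok = EosTokenizer(add_eos_token=True)
print(tok.build_inputs_with_special_tokens([5, 6, 7]))  # [5, 6, 7, 2]
```

With `auto_map` pointing at such a subclass, `AutoTokenizer.from_pretrained(..., trust_remote_code=True)` loads it transparently, so callers never need to handle the EOS token themselves.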
- https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct/commit/7547f852dfd0a4633a3132e567e6e6cc335ef715 then integrates with Sentence Transformers. This is quite elementary once the tokenization works out of the box. The usage is quite simple, too, and it'll work out of the box for third parties that rely on Sentence Transformers (LangChain, LlamaIndex, etc.). I also added the web search prompt in `config_sentence_transformers.json` so users can use it without having to type out the whole prompt:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Alibaba-NLP/gte-Qwen1.5-7B-instruct", trust_remote_code=True)
# In case you want to reduce the maximum length:
model.max_seq_length = 8192

queries = [
    "how much protein should a female eat",
    "summit define",
]
documents = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. Check out the chart below to see how much protein you should be eating each day.",
    "Definition of summit for English Language Learners. : 1 the highest point of a mountain : the top of a mountain. : 2 the highest level. : 3 a meeting or series of meetings between the leaders of two or more governments.",
]

query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

scores = (query_embeddings @ document_embeddings.T) * 100
print(scores.tolist())
# [[70.00668334960938, 8.184843063354492], [14.62419319152832, 77.71407318115234]]
```
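For reference, the `prompt_name="query"` argument above resolves against the standard `prompts` mapping in `config_sentence_transformers.json`. A sketch of that file's shape (the instruction text below is illustrative, not necessarily the exact prompt added in this PR):

```json
{
  "prompts": {
    "query": "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: "
  },
  "default_prompt_name": null
}
```

Because only `"query"` is defined, queries get the instruction prefix while documents are encoded as-is, matching the asymmetric retrieval setup in the README.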
Lastly, I fixed some typos in the README and updated `flash_attention` to `flash_attn`, as the latter is the name of the Python package.
- Tom Aarsen
tomaarsen changed pull request status to open
zyznull
Thank you for your suggestion; the modified gte-qwen1.5-7b-instruct model is now much more user-friendly!

zyznull changed pull request status to merged