Simplify usage; integrate Sentence Transformers (+ LlamaIndex/LangChain, etc.)
#5 opened by tomaarsen (HF staff)
Hello!
Pull Request overview
- Simplify usage; no more batch manipulation.
- Integrate with Sentence Transformers
- Fix README typos
Details
This PR intends to do two things: simplify the usage and integrate Sentence Transformers. Let's start with the first one.
- https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct/commit/1796134627b451f27efff4c126ed4b5d81f54ace updates the `is_causal` default to `False`, as that seems to be what will be commonly used with this model. Then we don't have to specify it manually in our batch.
- https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct/commit/572ec2f075a6b03ef280bc20c0d6f9eabb99ed7e is the more extensive version of the same commit that I made for `e5-mistral-7b-instruct`. In essence, I add `"add_eos_token": true` to the tokenizer configuration. This is very simple for the `LlamaTokenizer`, but more complex for the `Qwen2Tokenizer(Fast)`, as those don't accept the `add_eos_token` argument out of the box. So, I subclassed the fast and normal Qwen2 tokenizers to add this parameter, basing my changes on the `LlamaTokenizer`, which does have the `add_eos_token` option. Then I added the `auto_map` option to the tokenizer config, so the subclassed tokenizers will now be loaded. As a result, the user only needs one line for the tokenization:
```python
batch_dict = tokenizer(input_texts, max_length=max_length, padding=True, truncation=True, return_tensors='pt')
```
Note that the performance is identical: the README script still yields `[[70.00666809082031, 8.184867858886719], [14.62420654296875, 77.71405792236328]]`.
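The idea behind the subclassed tokenizers can be sketched as follows. This is a minimal, dependency-free sketch: the method name mirrors the Hugging Face tokenizer hook for adding special tokens, but the base class, token ids, and class names here are hypothetical, not the PR's actual code.

```python
class BaseTokenizer:
    """Stand-in for a tokenizer that does not append EOS (like Qwen2Tokenizer)."""

    eos_token_id = 2  # hypothetical EOS token id

    def build_inputs_with_special_tokens(self, token_ids):
        # Base behavior: return the ids unchanged, no EOS appended.
        return list(token_ids)


class EosTokenizer(BaseTokenizer):
    """Subclass that adds an add_eos_token flag, mirroring LlamaTokenizer."""

    def __init__(self, add_eos_token=True):
        self.add_eos_token = add_eos_token

    def build_inputs_with_special_tokens(self, token_ids):
        ids = list(token_ids)
        if self.add_eos_token:
            ids.append(self.eos_token_id)  # append EOS, as the embedding model expects
        return ids


tok = EosTokenizer(add_eos_token=True)
print(tok.build_inputs_with_special_tokens([5, 6, 7]))  # [5, 6, 7, 2]
```

With `auto_map` pointing at such a subclass, `AutoTokenizer.from_pretrained(..., trust_remote_code=True)` loads it transparently, so callers never need to handle the EOS token themselves.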
- https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct/commit/7547f852dfd0a4633a3132e567e6e6cc335ef715 then integrates with Sentence Transformers. This is quite elementary once the tokenization works out of the box. The usage is quite simple, too, and it'll work out of the box for third parties that rely on Sentence Transformers (LangChain, LlamaIndex, etc.). I also added the web search prompt in `config_sentence_transformers.json` so users can use it without having to type out the whole prompt:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Alibaba-NLP/gte-Qwen1.5-7B-instruct", trust_remote_code=True)
# In case you want to reduce the maximum length:
model.max_seq_length = 8192

queries = [
    "how much protein should a female eat",
    "summit define",
]
documents = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. Check out the chart below to see how much protein you should be eating each day.",
    "Definition of summit for English Language Learners. : 1 the highest point of a mountain : the top of a mountain. : 2 the highest level. : 3 a meeting or series of meetings between the leaders of two or more governments.",
]

query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

scores = (query_embeddings @ document_embeddings.T) * 100
print(scores.tolist())
# [[70.00668334960938, 8.184843063354492], [14.62419319152832, 77.71407318115234]]
```
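For reference, the `prompt_name="query"` argument above resolves against the standard `prompts` mapping in `config_sentence_transformers.json`. A sketch of that file's shape (the instruction text below is illustrative, not necessarily the exact prompt added in this PR):

```json
{
  "prompts": {
    "query": "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: "
  },
  "default_prompt_name": null
}
```

Because only `"query"` is defined, queries get the instruction prefix while documents are encoded as-is, matching the asymmetric retrieval setup in the README.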
Lastly, I fixed some typos in the README and updated `flash_attention` to `flash_attn`, as the latter is the name of the Python package.
- Tom Aarsen
tomaarsen changed pull request status to open
zyznull
Thank you for your suggestion; the modified gte-qwen1.5-7b-instruct model is now much more user-friendly!

zyznull changed pull request status to merged