---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
---
# BGE-M3 in HuggingFace Transformers
> **This is not an official implementation of BGE-M3. The official implementation can be found in the [Flag Embedding](https://github.com/FlagOpen/FlagEmbedding) project.**
## Introduction
For the full introduction, please see the GitHub repository:
https://github.com/liuyanyi/transformers-bge-m3
## Use BGE-M3 in HuggingFace Transformers
```python
from transformers import AutoModel, AutoTokenizer

model_path = "path/to/this/repo"  # replace with this repository's local path or Hub id

# trust_remote_code is required because this repository ships custom modeling code
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)

input_str = "Hello, world!"
inputs = tokenizer(input_str, return_tensors="pt", padding=True, truncation=True)
output = model(**inputs, return_dict=True)

dense_output = output.dense_output      # requires normalization to align with the Flag Embedding project
colbert_output = output.colbert_output  # requires normalization to align with the Flag Embedding project
sparse_output = output.sparse_output
```
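The comments in the snippet note that the dense and ColBERT outputs need normalization to align with the Flag Embedding project. Below is a minimal sketch of that post-processing, assuming `dense_output` has shape `(batch, hidden)` and `colbert_output` has shape `(batch, seq_len, hidden)`; the variable names follow the snippet above, while the shapes and the dot-product scoring are assumptions for illustration, not taken from this card.

```python
import torch.nn.functional as F

# L2-normalize along the last dimension so that vector magnitudes
# match the normalized embeddings produced by the Flag Embedding project
dense_embedding = F.normalize(dense_output, p=2, dim=-1)
colbert_embedding = F.normalize(colbert_output, p=2, dim=-1)

# With normalized dense embeddings, the similarity between two encoded
# texts reduces to a dot product (cosine similarity), e.g.:
# score = dense_embedding[0] @ dense_embedding[1]
```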
## References
- [Official BGE-M3 Weight](https://huggingface.co/BAAI/bge-m3)
- [Flag Embedding](https://github.com/FlagOpen/FlagEmbedding)
- [HuggingFace Transformers](https://github.com/huggingface/transformers)