peterizsak committed on
Commit 2e43ad1 · 1 Parent(s): f393266

Update README.md

README.md CHANGED

---
license: mit
language:
- en
---

# BGE-large-en-v1.5-rag-int8-static

A quantized version of the [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) embedder, compatible with [Optimum-Intel](https://github.com/huggingface/optimum-intel) and [Intel® Neural Compressor](https://github.com/intel/neural-compressor).

The model can be used with the [Optimum-Intel](https://github.com/huggingface/optimum-intel) API and as an embedder/ranker model as part of [fastRAG](https://github.com/IntelLabs/fastRAG).

See the [original model page](https://huggingface.co/BAAI/bge-large-en-v1.5) for full details on the model architecture and training.

## Technical details

The model was quantized using post-training static quantization.

| | |
|---|:---:|
| Calibration set | [qasper](https://huggingface.co/datasets/allenai/qasper) (100 random samples) |
| Quantization tool | [Optimum-Intel](https://github.com/huggingface/optimum-intel) |
| Backend | `IPEX` |
| Original model | [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) |

Instructions for reproducing the quantized model can be found [here](https://github.com/IntelLabs/fastRAG/tree/main/scripts/optimizations/embedders).
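
As a rough sketch of how such a model can be produced with Optimum-Intel's `INCQuantizer` (the calibration text field `abstract`, the sequence length, and the save directory below are illustrative assumptions, not the exact settings of the linked script):

```python
from functools import partial

from neural_compressor.config import PostTrainingQuantConfig
from optimum.intel import INCQuantizer
from transformers import AutoModel, AutoTokenizer

model_id = "BAAI/bge-large-en-v1.5"
model = AutoModel.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

def preprocess_fn(examples, tokenizer):
    # Assumption: calibrate on the paper abstracts of qasper
    return tokenizer(examples["abstract"], padding="max_length", max_length=512, truncation=True)

quantizer = INCQuantizer.from_pretrained(model)
calibration_dataset = quantizer.get_calibration_dataset(
    "allenai/qasper",
    num_samples=100,
    dataset_split="train",
    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
)

# Post-training static quantization with the IPEX backend
quantization_config = PostTrainingQuantConfig(approach="static", backend="ipex")
quantizer.quantize(
    quantization_config=quantization_config,
    calibration_dataset=calibration_dataset,
    save_directory="bge-large-en-v1.5-rag-int8-static",
)
```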

## Evaluation - MTEB

| Task | `INT8` | `FP32` | % diff |
|---|:---:|:---:|:---:|
| Reranking | 0.5997 | 0.6003 | -0.108% |

## Usage

### Using with Optimum-Intel

See the [Optimum-Intel](https://github.com/huggingface/optimum-intel) installation page for instructions on how to install it, or run:

```sh
pip install -U optimum[neural-compressor] intel-extension-for-transformers
```

Loading the model:

```python
from optimum.intel import INCModel

model = INCModel.from_pretrained("Intel/bge-large-en-v1.5-rag-int8-static")
```

Running inference:

```python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Intel/bge-large-en-v1.5-rag-int8-static")

sentences = ["Example sentence to embed."]
inputs = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    # The sentence embedding is the vector of the [CLS] token
    embeddings = outputs[0][:, 0]
```
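
BGE-style embeddings are typically L2-normalized before computing similarities; a minimal follow-up, assuming the `embeddings` tensor from above:

```python
# L2-normalize, then score every sentence against every other
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
scores = embeddings @ embeddings.T
print(scores)
```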

### Using with a fastRAG RAG pipeline

Get started by installing [fastRAG](https://github.com/IntelLabs/fastRAG) as instructed [here](https://github.com/IntelLabs/fastRAG).

Below is an example of loading the model into a ranker node that embeds and re-ranks all the documents it receives as input in a pipeline.

```python
from fastrag.rankers import QuantizedBiEncoderRanker

ranker = QuantizedBiEncoderRanker("Intel/bge-large-en-v1.5-rag-int8-static")
```

and plugging it into a pipeline:

```python
from haystack import Pipeline

# `retriever` is assumed to be any Haystack retriever node defined earlier
p = Pipeline()
p.add_node(component=retriever, name="retriever", inputs=["Query"])
p.add_node(component=ranker, name="ranker", inputs=["retriever"])
```
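
An end-to-end run might then look as follows (the query and `top_k` values are arbitrary placeholders):

```python
results = p.run(
    query="What is post-training static quantization?",
    params={"retriever": {"top_k": 100}, "ranker": {"top_k": 10}},
)
for doc in results["documents"]:
    print(doc.score, doc.content)
```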

See a more complete example notebook [here](https://github.com/IntelLabs/fastRAG/blob/main/examples/optimized-embeddings.ipynb).