iamlemec committed
Commit fc12baa
1 Parent(s): c77ac9e

Update README.md
Files changed (1): README.md (+34 -0)

README.md
---
license: mit
---

<img src="https://raw.githubusercontent.com/CompendiumLabs/compendiumlabs.ai/main/images/logo_text_crop.png" alt="Compendium Labs" style="width: 500px;">

# bge-large-zh-v1.5-gguf

Source model: https://huggingface.co/BAAI/bge-large-zh-v1.5

Quantized and unquantized embedding models in GGUF format, for use with `llama.cpp`. A large speedup over `transformers` is almost guaranteed, while the benefit over ONNX varies by application; in practice this seems to give a large speedup on CPU and a modest speedup on GPU for larger models. Because these models are relatively small, quantization does not yield huge gains, but it does provide up to a 30% speedup on CPU with minimal loss in accuracy.
<br/>

# Files Available

<div style="width: 500px; margin: 0;">

| Filename | Quantization | Size |
|:-------- | ------------ | ---- |
| [bge-large-zh-v1.5-f32.gguf](https://huggingface.co/CompendiumLabs/bge-large-zh-v1.5-gguf/blob/main/bge-large-zh-v1.5-f32.gguf) | F32 | 1.3 GB |
| [bge-large-zh-v1.5-f16.gguf](https://huggingface.co/CompendiumLabs/bge-large-zh-v1.5-gguf/blob/main/bge-large-zh-v1.5-f16.gguf) | F16 | 620 MB |
| [bge-large-zh-v1.5-q8_0.gguf](https://huggingface.co/CompendiumLabs/bge-large-zh-v1.5-gguf/blob/main/bge-large-zh-v1.5-q8_0.gguf) | Q8_0 | 332 MB |
| [bge-large-zh-v1.5-q4_k_m.gguf](https://huggingface.co/CompendiumLabs/bge-large-zh-v1.5-gguf/blob/main/bge-large-zh-v1.5-q4_k_m.gguf) | Q4_K_M | 193 MB |

</div>
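If you would rather fetch a file programmatically than through the browser, here is a minimal sketch using `huggingface_hub`; the repo id and filename are taken from the table above, and you can swap in whichever quantization you want.

```python
# Sketch: download one of the GGUF files from this repo via huggingface_hub.
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="CompendiumLabs/bge-large-zh-v1.5-gguf",
    filename="bge-large-zh-v1.5-q8_0.gguf",  # or any other file from the table
)
print(gguf_path)  # local path, usable as the model path below
```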
<br/>

# Usage
These model files can be used with pure `llama.cpp` or with the `llama-cpp-python` Python bindings:
```python
from llama_cpp import Llama
gguf_path = "bge-large-zh-v1.5-q8_0.gguf"  # path to one of the GGUF files above
texts = ["First document", "Second document"]
model = Llama(gguf_path, embedding=True)
embed = model.embed(texts)  # returns the embedding vector(s) for `texts`
```
Here `texts` can be either a single string or a list of strings, and the return value is a list of embedding vectors. The inputs are grouped into batches automatically for efficient execution. There is also LangChain integration through `langchain_community.embeddings.LlamaCppEmbeddings`, sketched below.
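For the LangChain route, a minimal sketch, assuming the `langchain-community` package is installed and `model_path` points at one of the GGUF files from the table above:

```python
# Sketch: using the GGUF model through LangChain's LlamaCppEmbeddings wrapper.
from langchain_community.embeddings import LlamaCppEmbeddings

embedder = LlamaCppEmbeddings(model_path="bge-large-zh-v1.5-q8_0.gguf")
doc_vectors = embedder.embed_documents(["First document", "Second document"])
query_vector = embedder.embed_query("What does the second document say?")
```

Following the standard LangChain embeddings interface, `embed_documents` returns one vector per input text and `embed_query` returns a single vector.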