Commit 32722ab (verified) by k8si
1 Parent(s): 712d65a

Upload 2 files

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ ggml-sfr-embedding-mistral-f16.llamafile filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,78 @@
- ---
- license: cc-by-nc-4.0
- ---
+ ---
+ language:
+ - en
+ license: cc-by-nc-4.0
+ pipeline_tag: feature-extraction
+ tags:
+ - llamafile
+ library_name: llamafile
+ base_model:
+ - Salesforce/SFR-Embedding-Mistral
+ - dranger003/SFR-Embedding-Mistral-GGUF
+ model_creator: Salesforce
+ quantized_by: dranger003
+ ---
+ # SFR-Embedding-Mistral - llamafile
+
+ This repository contains executable weights (which we call [llamafiles](https://github.com/Mozilla-Ocho/llamafile)) that run on Linux, macOS, Windows, FreeBSD, OpenBSD, and NetBSD for AMD64 and ARM64.
+
+ - Model creator: [Salesforce](https://huggingface.co/Salesforce)
+ - Original model: [Salesforce/SFR-Embedding-Mistral](https://huggingface.co/Salesforce/SFR-Embedding-Mistral)
+ - GGUF weights: [dranger003/SFR-Embedding-Mistral-GGUF](https://huggingface.co/dranger003/SFR-Embedding-Mistral-GGUF)
+ - Built with [llamafile 0.8.4](https://github.com/Mozilla-Ocho/llamafile/releases/tag/0.8.4)
+
+ ## Quickstart
+
+ Running the following on a desktop OS will launch a server on `http://localhost:8080` to which you can send HTTP requests to get embeddings:
+
+ ```
+ chmod +x ggml-sfr-embedding-mistral-f16.llamafile
+ ./ggml-sfr-embedding-mistral-f16.llamafile --server --nobrowser --embedding
+ ```
+
+ Then, you can use your favorite HTTP client to call the server's `/embedding` endpoint:
+
+ ```
+ curl \
+ -X POST \
+ -H "Content-Type: application/json" \
+ -d '{"text": "Hello, world!"}' \
+ http://localhost:8080/embedding
+ ```
+
+ For further information, please see the [llamafile README](https://github.com/mozilla-ocho/llamafile/) and the [llamafile server docs](https://github.com/Mozilla-Ocho/llamafile/blob/main/llama.cpp/server/README.md).
+
+ Having **trouble?** See the ["Gotchas" section](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas) of the README or contact us on [Discord](https://discord.com/channels/1089876418936180786/1182689832057716778).
+
+ ## About llamafile
+
+ llamafile is a new format introduced by Mozilla Ocho on Nov 20th 2023.
+ It uses Cosmopolitan Libc to turn LLM weights into runnable llama.cpp
+ binaries that run on the stock installs of six OSes for both ARM64 and
+ AMD64.
+
+ ## About Quantization Formats
+
+ Your choice of quantization format depends on three things:
+
+ 1. Will it fit in RAM or VRAM?
+ 2. Is your use case reading (e.g. summarization) or writing (e.g. chatbot)?
+ 3. llamafiles bigger than 4.30 GB are hard to run on Windows (see [gotchas](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas))
+
+ Good quants for writing (eval speed) are Q5\_K\_M and Q4\_0. Text
+ generation is bounded by memory speed, so smaller quants help, but they
+ also cause the LLM to hallucinate more.
+
+ Good quants for reading (prompt eval speed) are BF16, F16, Q4\_0, and
+ Q8\_0 (ordered from fastest to slowest). Prompt evaluation is bounded by
+ computation speed (flops), so simpler quants help.
+
+ Note: BF16 is currently only supported on CPU.
+
+ See also: https://huggingface.co/docs/hub/en/gguf#quantization-types
+
+ ---
+
+ # Model Card
+
+ See [Salesforce/SFR-Embedding-Mistral](https://huggingface.co/Salesforce/SFR-Embedding-Mistral)
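Editor's note on the Quickstart added in the README.md diff above: the curl call could be reproduced from Python roughly as follows. This is a minimal sketch using only the standard library, not part of the commit. The function names `build_embedding_request` and `fetch_embedding` are hypothetical, the `"text"` field simply mirrors the curl example, and the shape of the JSON response is not described in the README, so the sketch returns it unparsed beyond JSON decoding.

```python
import json
import urllib.request


def build_embedding_request(text, base_url="http://localhost:8080"):
    """Build the same POST request as the README's curl example.

    Assumption: the JSON body uses the "text" field, copied from the curl
    call; check the llamafile server docs for the fields your build accepts.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embedding",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(text, base_url="http://localhost:8080", timeout=60.0):
    """Send the request to a running llamafile server and decode the JSON reply.

    The reply is returned as-is because the README does not document its layout.
    """
    request = build_embedding_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from the Quickstart running, `fetch_embedding("Hello, world!")` would issue the same request as the curl command. Splitting request construction from sending keeps the first function usable without a live server.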
ggml-sfr-embedding-mistral-f16.llamafile ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2410e72750d051bff42947008776b38a1ddb44664c686a2f9f2a22ac1504ef54
+ size 14514075836
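Editor's note: the `size` field in the LFS pointer can be checked against the README's remark that llamafiles bigger than 4.30 GB are hard to run on Windows. The arithmetic below is a rough sketch, not part of the commit. It assumes "GB" in the README means 10^9 bytes, and the helper name `describe_size` is hypothetical.

```python
# Both figures are copied from this commit: the LFS pointer's size field and
# the README's "4.30 GB" Windows note. Reading GB as 10**9 bytes is an assumption.
LLAMAFILE_SIZE_BYTES = 14_514_075_836
WINDOWS_LIMIT_BYTES = 4_300_000_000


def describe_size(size_bytes, limit_bytes=WINDOWS_LIMIT_BYTES):
    """Report a file size in decimal GB and binary GiB, and compare it to a limit."""
    return {
        "gb": round(size_bytes / 10**9, 2),
        "gib": round(size_bytes / 2**30, 2),
        "over_windows_limit": size_bytes > limit_bytes,
    }


report = describe_size(LLAMAFILE_SIZE_BYTES)
print(report)
```

The file is roughly 14.5 GB, so it is well above the Windows threshold mentioned in the README. The same number is a reasonable lower bound when asking whether the F16 weights will fit in RAM or VRAM.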