Upload 2 files

Files changed (3)

.gitattributes +1 -0
README.md +78 -3
ggml-sfr-embedding-mistral-f16.llamafile +3 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ggml-sfr-embedding-mistral-f16.llamafile filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,3 +1,78 @@
----
-license: cc-by-nc-4.0
----

+---
+language:
+- en
+license: cc-by-nc-4.0
+pipeline_tag: feature-extraction
+tags:
+- llamafile
+library_name: llamafile
+base_model:
+- Salesforce/SFR-Embedding-Mistral
+- dranger003/SFR-Embedding-Mistral-GGUF
+model_creator: Salesforce
+quantized_by: dranger003
+---
+# SFR-Embedding-Mistral - llamafile
+This repository contains executable weights (which we call [llamafiles](https://github.com/Mozilla-Ocho/llamafile)) that run on Linux, macOS, Windows, FreeBSD, OpenBSD, and NetBSD for AMD64 and ARM64.
+- Model creator: [Salesforce](https://huggingface.co/Salesforce)
+- Original model: [Salesforce/SFR-Embedding-Mistral](https://huggingface.co/Salesforce/SFR-Embedding-Mistral)
+- GGUF weights: [dranger003/SFR-Embedding-Mistral-GGUF](https://huggingface.co/dranger003/SFR-Embedding-Mistral-GGUF)
+- Built with [llamafile 0.8.4](https://github.com/Mozilla-Ocho/llamafile/releases/tag/0.8.4)
+## Quickstart
+Running the following on a desktop OS launches a server on `http://localhost:8080`, to which you can send HTTP requests to get embeddings:
+```
+chmod +x ggml-sfr-embedding-mistral-f16.llamafile
+./ggml-sfr-embedding-mistral-f16.llamafile --server --nobrowser --embedding
+```
+Then, you can use your favorite HTTP client to call the server's `/embedding` endpoint:
+```
+curl \
+-X POST \
+-H "Content-Type: application/json" \
+-d '{"content": "Hello, world!"}' \
+http://localhost:8080/embedding
+```
+For further information, please see the [llamafile README](https://github.com/mozilla-ocho/llamafile/) and the [llamafile server docs](https://github.com/Mozilla-Ocho/llamafile/blob/main/llama.cpp/server/README.md).
+Having **trouble?** See the ["Gotchas" section](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas) of the README or contact us on [Discord](https://discord.com/channels/1089876418936180786/1182689832057716778).
+## About llamafile
+llamafile is a new format introduced by Mozilla Ocho on Nov 20th 2023.
+It uses Cosmopolitan Libc to turn LLM weights into runnable llama.cpp
+binaries that run on the stock installs of six OSes for both ARM64 and
+AMD64.
+## About Quantization Formats
+Your choice of quantization format depends on three things:
+1. Will it fit in RAM or VRAM?
+2. Is your use case reading (e.g. summarization) or writing (e.g. chatbot)?
+3. Do you need to run on Windows? llamafiles bigger than 4.30 GB are hard to run there (see [gotchas](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas)).
+Good quants for writing (eval speed) are Q5\_K\_M and Q4\_0. Text
+generation is bounded by memory speed, so smaller quants help, but they
+also cause the LLM to hallucinate more.
+Good quants for reading (prompt eval speed) are BF16, F16, Q4\_0, and
+Q8\_0 (ordered from fastest to slowest). Prompt evaluation is bounded by
+computation speed (flops), so simpler quants help.
+Note: BF16 is currently only supported on CPU.
+See also: https://huggingface.co/docs/hub/en/gguf#quantization-types
+---
+# Model Card
+See [Salesforce/SFR-Embedding-Mistral](https://huggingface.co/Salesforce/SFR-Embedding-Mistral)
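
Note on the Quickstart in the README diff above: the `curl` call translates directly to a small client. The following is a minimal sketch, not part of the commit, using only the Python standard library. The function names are hypothetical. Two things are assumptions to check against the linked llamafile server docs: the JSON key that carries the input text (the `field` parameter, defaulting to `content` as named in the upstream llama.cpp server documentation) and the `embedding` key in the response.

```python
import json
import math
import urllib.request


def build_embedding_request(text, base_url="http://localhost:8080", field="content"):
    """Build a POST request for the server's /embedding endpoint.

    `field` is the JSON key that carries the input text. Its default is an
    assumption (taken from the llama.cpp server docs); adjust it if needed.
    """
    payload = json.dumps({field: text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embedding",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(text, **kwargs):
    """Send the request and return the vector (assumes an "embedding" key)."""
    with urllib.request.urlopen(build_embedding_request(text, **kwargs)) as resp:
        return json.load(resp)["embedding"]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Usage, with the llamafile server from the Quickstart already running:
#   a = fetch_embedding("Hello, world!")
#   b = fetch_embedding("Hi there, world!")
#   print(cosine_similarity(a, b))
```

Building the request separately from sending it keeps the payload easy to inspect without a running server.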

ggml-sfr-embedding-mistral-f16.llamafile ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2410e72750d051bff42947008776b38a1ddb44664c686a2f9f2a22ac1504ef54
+size 14514075836
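
Note on the pointer above: it records the SHA-256 digest and the byte size of the real file, so a downloaded copy can be checked against it. The following is a minimal sketch, not part of the commit; the helper names are hypothetical, and the pointer text is copied from the three added lines.

```python
import hashlib
import os


def parse_lfs_pointer(pointer_text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in pointer_text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path, pointer_text, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and SHA-256 in the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    if os.path.getsize(path) != int(fields["size"]):
        return False  # cheap check first; avoids hashing a truncated download
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so a multi-gigabyte file never sits in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_digest


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:2410e72750d051bff42947008776b38a1ddb44664c686a2f9f2a22ac1504ef54
size 14514075836
"""

# Usage, assuming the llamafile was downloaded to the current directory:
#   print(matches_pointer("ggml-sfr-embedding-mistral-f16.llamafile", POINTER))
```

Comparing the size before hashing rejects an incomplete download without reading roughly 14.5 GB from disk.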