# Bed - Int8 Quantized Static Embeddings for Semantic Search
Ultra-fast int8 quantized static embeddings model for semantic search. Optimized for the gobed Go library.
## Model Details

| Property   | Value                |
|------------|----------------------|
| Dimensions | 512                  |
| Precision  | int8 + scale factors |
| Vocabulary | 30,522 tokens        |
| Model Size | 15 MB                |
| Format     | safetensors          |
## Performance

- Embedding latency: 0.16 ms average
- Throughput: 6,200+ embeddings/sec
- Memory: 15 MB (7.9x smaller than the float32 original)
- Compression: 87.4% space reduction vs the original model
## Usage with gobed (Go)

```bash
go get github.com/lee101/gobed
```
```go
package main

import (
	"fmt"
	"log"

	"github.com/lee101/gobed"
)

func main() {
	// Load the quantized model and build an in-memory search engine.
	engine, err := gobed.NewAutoSearchEngine()
	if err != nil {
		log.Fatal(err)
	}
	defer engine.Close()

	// Index documents keyed by ID.
	docs := map[string]string{
		"doc1": "machine learning and neural networks",
		"doc2": "natural language processing",
	}
	engine.AddDocuments(docs)

	// Retrieve the top 3 documents most similar to the query.
	results, _, _ := engine.SearchWithMetadata("AI research", 3)
	for _, r := range results {
		fmt.Printf("[%.3f] %s\n", r.Similarity, r.Content)
	}
}
```
## Download Model Manually

```bash
# Clone the model repository
git clone https://huggingface.co/lee101/bed

# Or download specific files
wget https://huggingface.co/lee101/bed/resolve/main/modelint8_512dim.safetensors
wget https://huggingface.co/lee101/bed/resolve/main/tokenizer.json
```
## Using huggingface_hub (Python)

```python
from huggingface_hub import hf_hub_download

# Download the model weights
model_path = hf_hub_download(repo_id="lee101/bed", filename="modelint8_512dim.safetensors")

# Download the tokenizer
tokenizer_path = hf_hub_download(repo_id="lee101/bed", filename="tokenizer.json")
```
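The downloaded weights can then be inspected with the `safetensors` package. The tensor names inside the file aren't documented on this card, so this sketch simply lists whatever keys are present:

```python
from safetensors.numpy import load_file

# model_path comes from hf_hub_download above
tensors = load_file(model_path)
for name, tensor in tensors.items():
    print(name, tensor.dtype, tensor.shape)
```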
## Model Architecture

This model uses static embeddings with int8 quantization:

- Embedding layer: 30,522 x 512 int8 weights
- Scale factors: 30,522 float32 scale values (one per token)
- Tokenizer: WordPiece tokenizer (same as BERT)

Embeddings are computed by:

1. Tokenizing the input text
2. Looking up the int8 embeddings for each token
3. Multiplying by the scale factors to reconstruct float values
4. Mean pooling across tokens
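Here is a minimal numpy sketch of steps 2-4. It assumes the int8 table and scale vector have already been extracted from the safetensors file; the variable names and token IDs below are illustrative stand-ins, not the model's actual tensor keys:

```python
import numpy as np

def embed(token_ids, weights, scales):
    """weights: int8 (30522, 512); scales: float32 (30522,), one per token row."""
    rows = weights[token_ids].astype(np.float32)  # step 2: int8 row lookup
    rows *= scales[token_ids][:, None]            # step 3: rescale to floats
    return rows.mean(axis=0)                      # step 4: mean pool -> (512,)

# Toy stand-ins for the real tensors
weights = np.random.randint(-127, 128, size=(30522, 512), dtype=np.int8)
scales = np.random.rand(30522).astype(np.float32)
print(embed([101, 2535, 102], weights, scales).shape)  # (512,)
```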
## Quantization Details

- Original model: 30,522 x 1024 float32 (119 MB)
- Quantized model: 30,522 x 512 int8 + 30,522 float32 scales (15 MB)
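The size figures follow directly from those shapes. The snippet below works through the arithmetic and also sketches per-token symmetric (absmax) quantization, a common scheme consistent with the one-scale-per-token layout above; treat the scheme (and the separate 1024-to-512 dimension reduction, which is not shown) as assumptions rather than confirmed details of this model:

```python
import numpy as np

V, D_ORIG, D_QUANT = 30_522, 1024, 512
MB = 1024 * 1024

# Storage accounting for the figures above
original = V * D_ORIG * 4        # float32 weights: ~119.2 MB
quantized = V * D_QUANT + V * 4  # int8 weights + float32 scales: ~15.0 MB
print(f"{original / MB:.1f} MB -> {quantized / MB:.1f} MB "
      f"({1 - quantized / original:.1%} smaller)")

# Hypothetical per-token absmax quantization: one scale per row, chosen so
# the largest-magnitude weight in that row maps to +/-127.
def quantize_rows(weights):
    scales = np.abs(weights).max(axis=1) / 127.0
    q = np.round(weights / scales[:, None]).astype(np.int8)
    return q, scales.astype(np.float32)

rows = np.random.randn(4, D_QUANT).astype(np.float32)
q, scales = quantize_rows(rows)
recon = q.astype(np.float32) * scales[:, None]
print("max reconstruction error:", np.abs(rows - recon).max())
```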