Alpine Embeddings
Run embedding models locally on Alpine Linux in Docker using Node.js: zero Python, zero glibc, pure WASM inference.
What is this?
A lightweight REST API that generates text embeddings using transformer models. It runs entirely in a small Docker container on Alpine Linux with no native dependencies.
| Feature | Detail |
|---|---|
| Runtime | Node.js 20 on Alpine Linux |
| Inference | ONNX Runtime (WASM backend) via @xenova/transformers |
| Model | Xenova/bge-small-en-v1.5 (384 dimensions, 32MB) |
| API | REST (Express.js) |
| Image size | ~150MB |
| Startup | ~5s (model loads from cache) |
| Latency | ~50-100ms per embedding |
Quick Start
# Clone
git clone https://huggingface.co/asusf15/alpine-embeddings
cd alpine-embeddings
# Build
docker build -t embeddings .
# Run
docker run -p 3000:3000 embeddings
Server starts on http://localhost:3000. Model loads automatically on first boot (~5s).
API
POST /embed
Generate embeddings for text.
Request:
{"text": "Hello world"}
Batch request:
{"text": ["Hello world", "Another sentence", "Third one"]}
Response:
{
  "embeddings": [[0.0011, -0.0146, 0.0203, ...]],
  "dims": 384,
  "model": "Xenova/bge-small-en-v1.5",
  "elapsed_ms": 52
}
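Because `text` may be either a single string or an array, a handler has to normalize it before inference. The following is a minimal sketch of that step, not the repository's actual `server.js`: the helper name `normalizeTexts` and the error messages are hypothetical, and the cap of 128 texts is taken from the Limits section.

```javascript
// Hypothetical helper: normalize the `text` field of a /embed request body.
// MAX_TEXTS mirrors the documented limit of 128 texts per request.
const MAX_TEXTS = 128;

function normalizeTexts(body) {
  const raw = body && body.text;
  // Accept either a single string or an array of strings.
  const texts = typeof raw === 'string' ? [raw] : raw;
  if (!Array.isArray(texts) || texts.length === 0) {
    throw new Error('"text" must be a string or a non-empty array of strings');
  }
  if (!texts.every((t) => typeof t === 'string')) {
    throw new Error('every item in "text" must be a string');
  }
  if (texts.length > MAX_TEXTS) {
    throw new Error(`at most ${MAX_TEXTS} texts per request`);
  }
  return texts;
}
```

With this shape, a single-string request and a one-element batch request are treated identically, which matches the response format above, where `embeddings` is always an array of vectors.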
GET /health
Health check.
{"status": "ready", "model": "Xenova/bge-small-en-v1.5"}
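A client that starts the container and immediately sends requests may want to wait for this endpoint first. Below is a rough sketch of such a poll loop. It assumes the server either refuses connections or reports a status other than `ready` while the model is loading, which the source does not spell out; the helper name `waitForReady` and its options are hypothetical.

```javascript
// Poll GET /health until the server reports {"status": "ready"}.
// `fetchImpl` is injectable so the helper can be exercised without a live
// server; it defaults to the global fetch available in Node.js 18+.
async function waitForReady(baseUrl, { retries = 20, delayMs = 500, fetchImpl = fetch } = {}) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const res = await fetchImpl(`${baseUrl}/health`);
      if (res.ok) {
        const body = await res.json();
        if (body.status === 'ready') return body;
      }
    } catch (err) {
      // Connection errors are expected while the container is starting: retry.
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`server at ${baseUrl} not ready after ${retries} attempts`);
}
```

For example, `await waitForReady('http://localhost:3000')` would resolve with the health payload once the model has loaded.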
Usage Examples
cURL (Linux/Mac)
curl -X POST http://localhost:3000/embed \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world"}'
PowerShell (Windows)
Invoke-RestMethod -Uri http://localhost:3000/embed -Method Post -ContentType "application/json" -Body '{"text": "Hello world"}'
JavaScript (fetch)
const res = await fetch('http://localhost:3000/embed', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: ['Hello', 'World'] })
});
const { embeddings } = await res.json();
console.log(embeddings[0].length); // 384
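A common next step is comparing two returned vectors. The sketch below computes cosine similarity in plain JavaScript; it is an illustration of how the output might be used, not part of this repository, and the function name `cosineSimilarity` is hypothetical. It does not assume the vectors are already normalized.

```javascript
// Cosine similarity between two equal-length embedding vectors.
// Returns a value in [-1, 1]; higher means more similar direction.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

With the `embeddings` array from the fetch example, `cosineSimilarity(embeddings[0], embeddings[1])` would score how close 'Hello' and 'World' are.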
Python (requests)
import requests
r = requests.post('http://localhost:3000/embed', json={"text": "Hello world"})
embedding = r.json()["embeddings"][0] # 384-dim vector
Configuration
Set via environment variables:
| Variable | Default | Description |
|---|---|---|
| PORT | 3000 | Server port |
| MODEL | Xenova/bge-small-en-v1.5 | HuggingFace model ID |
# Use a different model
docker run -p 3000:3000 -e MODEL=Xenova/all-MiniLM-L6-v2 embeddings
# Different port
docker run -p 8080:8080 -e PORT=8080 embeddings
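On the server side, reading these two variables could look roughly like the following. This is a sketch under the documented defaults only; the function name `loadConfig` and the port validation are assumptions and may differ from what `server.js` actually does.

```javascript
// Hypothetical config loader: reads PORT and MODEL with the documented defaults.
// Takes the environment as a parameter so it can be tested without process.env.
function loadConfig(env = process.env) {
  const port = Number.parseInt(env.PORT ?? '3000', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid PORT: ${env.PORT}`);
  }
  // Fall back to the default model when MODEL is unset or empty.
  const model = env.MODEL || 'Xenova/bge-small-en-v1.5';
  return { port, model };
}
```

Note that changing `PORT` changes the port inside the container, which is why the second `docker run` example above maps `8080:8080` rather than `8080:3000`.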
Available Models
Any ONNX embedding (feature-extraction) model from the Xenova collection should work:
| Model | Dims | Size | Quality |
|---|---|---|---|
| Xenova/bge-small-en-v1.5 | 384 | 32MB | Best for English |
| Xenova/all-MiniLM-L6-v2 | 384 | 22MB | Good, smallest |
| Xenova/bge-base-en-v1.5 | 768 | 110MB | Higher quality |
| Xenova/multilingual-e5-small | 384 | 113MB | Multilingual |
How It Works
The key challenge: onnxruntime-node (native ONNX runtime) requires glibc, but Alpine uses musl libc. The solution:
- Install @xenova/transformers (which depends on onnxruntime-node)
- Stub onnxruntime-node to re-export onnxruntime-web instead
- onnxruntime-web uses WASM, so it runs on any OS/libc
- Set numThreads = 1 (WASM workers are not needed in a Node.js server)
- Copy the WASM binaries to where transformers.js expects them
This gives you the full transformer inference pipeline (tokenizer + model) running in pure WASM on Alpine.
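The stubbing step is the least obvious part, so here is one way it could be scripted. This is a sketch only: the actual Dockerfile may do it with shell commands instead, and the function name `writeOnnxStub`, the stub version string, and the file layout are assumptions.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical build step: replace the installed onnxruntime-node package with
// a tiny stub that re-exports onnxruntime-web, so no glibc-linked binary is
// ever loaded on Alpine.
function writeOnnxStub(nodeModulesDir) {
  const stubDir = path.join(nodeModulesDir, 'onnxruntime-node');
  // Remove the native package (if present) and recreate an empty directory.
  fs.rmSync(stubDir, { recursive: true, force: true });
  fs.mkdirSync(stubDir, { recursive: true });
  fs.writeFileSync(
    path.join(stubDir, 'package.json'),
    JSON.stringify({ name: 'onnxruntime-node', version: '0.0.0-stub', main: 'index.js' }, null, 2)
  );
  // Anything that requires 'onnxruntime-node' now receives onnxruntime-web.
  fs.writeFileSync(
    path.join(stubDir, 'index.js'),
    "module.exports = require('onnxruntime-web');\n"
  );
  return stubDir;
}
```

Running `writeOnnxStub('node_modules')` after `npm install` would cover the stub step. The single-thread setting would then typically be applied at runtime through the library's `env.backends.onnx.wasm.numThreads` option before the pipeline is created.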
Project Structure
├── Dockerfile       # Alpine + Node 20, WASM stubbing
├── package.json     # @xenova/transformers + onnxruntime-web + express
├── server.js        # Express API with /embed and /health
├── preload.js       # (Optional) Pre-download model during build
└── .dockerignore    # Exclude node_modules from context
Sharing
Push to Docker Hub
docker tag embeddings yourusername/alpine-embeddings:latest
docker push yourusername/alpine-embeddings:latest
Others can run it directly
docker pull yourusername/alpine-embeddings:latest
docker run -p 3000:3000 yourusername/alpine-embeddings:latest
Share as code (this repo)
git clone https://huggingface.co/asusf15/alpine-embeddings
cd alpine-embeddings
docker build -t embeddings .
docker run -p 3000:3000 embeddings
Docker Compose
version: '3.8'
services:
  embeddings:
    build: .
    ports:
      - "3000:3000"
    environment:
      - MODEL=Xenova/bge-small-en-v1.5
    restart: unless-stopped
docker compose up -d
Limits
- Max 128 texts per request
- Max 10MB request body
- Single-threaded WASM (~50-100ms per text)
- First request after cold start takes ~5s (model loading)
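To embed more than 128 texts, a client can split the input into batches and send them one after another. The sketch below shows one possible approach; the helper names `chunk` and `embedAll` are hypothetical, and sending the batches sequentially is a design assumption based on the single-threaded inference noted above.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size = 128) {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error('size must be a positive integer');
  }
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Embed any number of texts by sending one /embed request per chunk.
// `fetchImpl` is injectable for testing; it defaults to Node's global fetch.
async function embedAll(texts, baseUrl = 'http://localhost:3000', fetchImpl = fetch) {
  const all = [];
  for (const batch of chunk(texts)) {
    const res = await fetchImpl(`${baseUrl}/embed`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text: batch }),
    });
    if (!res.ok) {
      throw new Error(`embed request failed with status ${res.status}`);
    }
    const { embeddings } = await res.json();
    all.push(...embeddings);
  }
  return all;
}
```

The returned array keeps the input order, so `all[i]` is the embedding of `texts[i]`.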
License
MIT
Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern