gte-Qwen2-1.5B-instruct-GGUF
Original Model
Alibaba-NLP/gte-Qwen2-1.5B-instruct
Run with LlamaEdge
LlamaEdge version: v0.12.2 and above
Prompt template
- Prompt type: embedding

Context size: 32000
Run as LlamaEdge service
wasmedge --dir .:. --nn-preload default:GGML:AUTO:gte-Qwen2-1.5B-instruct-Q5_K_M.gguf \
  llama-api-server.wasm \
  --prompt-template embedding \
  --ctx-size 32000 \
  --model-name gte-Qwen2-1.5B-instruct
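Once the service is running, it can be queried for embeddings. The sketch below is a minimal client, not part of the original card. It assumes the server listens on `http://localhost:8080` (adjust this if you started it on another address or port) and exposes an OpenAI-style `/v1/embeddings` endpoint. The helper names `build_embedding_request`, `embed`, and `cosine_similarity` are hypothetical and exist only for illustration.

```python
import json
import math
import urllib.request

# Assumption: the server started above listens on this address.
BASE_URL = "http://localhost:8080"
# Matches the --model-name value passed to llama-api-server.wasm.
MODEL_NAME = "gte-Qwen2-1.5B-instruct"


def build_embedding_request(texts, base_url=BASE_URL, model=MODEL_NAME):
    """Build (but do not send) a POST request for the /v1/embeddings endpoint."""
    payload = {"model": model, "input": list(texts)}
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, base_url=BASE_URL, model=MODEL_NAME):
    """Send the request and return one vector per input, in input order."""
    request = build_embedding_request(texts, base_url, model)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumed OpenAI-style response shape:
    # {"data": [{"index": 0, "embedding": [...]}, ...]}
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

With the server up, `embed(["first text", "second text"])` should return two vectors, and `cosine_similarity` applied to them gives a rough relatedness score.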
Quantized GGUF Models
Name | Quant method | Bits | Size | Use case |
---|---|---|---|---|
gte-Qwen2-1.5B-instruct-Q2_K.gguf | Q2_K | 2 | 752 MB | smallest, significant quality loss - not recommended for most purposes |
gte-Qwen2-1.5B-instruct-Q3_K_L.gguf | Q3_K_L | 3 | 980 MB | small, substantial quality loss |
gte-Qwen2-1.5B-instruct-Q3_K_M.gguf | Q3_K_M | 3 | 924 MB | very small, high quality loss |
gte-Qwen2-1.5B-instruct-Q3_K_S.gguf | Q3_K_S | 3 | 861 MB | very small, high quality loss |
gte-Qwen2-1.5B-instruct-Q4_0.gguf | Q4_0 | 4 | 1.07 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
gte-Qwen2-1.5B-instruct-Q4_K_M.gguf | Q4_K_M | 4 | 1.12 GB | medium, balanced quality - recommended |
gte-Qwen2-1.5B-instruct-Q4_K_S.gguf | Q4_K_S | 4 | 1.07 GB | small, greater quality loss |
gte-Qwen2-1.5B-instruct-Q5_0.gguf | Q5_0 | 5 | 1.26 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
gte-Qwen2-1.5B-instruct-Q5_K_M.gguf | Q5_K_M | 5 | 1.28 GB | large, very low quality loss - recommended |
gte-Qwen2-1.5B-instruct-Q5_K_S.gguf | Q5_K_S | 5 | 1.26 GB | large, low quality loss - recommended |
gte-Qwen2-1.5B-instruct-Q6_K.gguf | Q6_K | 6 | 1.46 GB | very large, extremely low quality loss |
gte-Qwen2-1.5B-instruct-Q8_0.gguf | Q8_0 | 8 | 1.89 GB | very large, extremely low quality loss - not recommended |
gte-Qwen2-1.5B-instruct-f16.gguf | f16 | 16 | 3.56 GB | very large, extremely low quality loss - not recommended |
Quantized with llama.cpp b3259
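As a rough aid for choosing among the files above, the following sketch picks the largest non-legacy quantization whose file fits a given size budget. The sizes are copied from the table, with 1 GB taken as 1000 MB. The helpers `pick_quant` and `model_filename` are hypothetical. File size is only a lower bound on memory use, since the runtime also needs room for the context, so treat the result as a starting point rather than a guarantee.

```python
# File sizes from the table above, in MB (1 GB taken as 1000 MB).
QUANT_SIZES_MB = {
    "Q2_K": 752, "Q3_K_S": 861, "Q3_K_M": 924, "Q3_K_L": 980,
    "Q4_0": 1070, "Q4_K_S": 1070, "Q4_K_M": 1120,
    "Q5_0": 1260, "Q5_K_S": 1260, "Q5_K_M": 1280,
    "Q6_K": 1460, "Q8_0": 1890, "f16": 3560,
}

# Methods the table marks as "legacy"; skipped when choosing.
LEGACY = {"Q4_0", "Q5_0"}


def pick_quant(budget_mb, sizes=QUANT_SIZES_MB):
    """Return the largest non-legacy quant whose file fits the budget, or None."""
    candidates = [
        (size, name)
        for name, size in sizes.items()
        if size <= budget_mb and name not in LEGACY
    ]
    if not candidates:
        return None
    return max(candidates)[1]


def model_filename(quant):
    """File name for a quant method, following the naming used in the table."""
    return f"gte-Qwen2-1.5B-instruct-{quant}.gguf"
```

For example, `model_filename(pick_quant(1300))` yields the Q5_K_M file used in the service command above.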