FP8 version of the model, made for LocateAnything.
Made by https://github.com/WuNein/LocateAnything-vLLM
Serve it with vLLM:
vllm serve locate_qwen2_model --tensor-parallel-size 1 --max-model-len 16384 --gpu-memory-utilization 0.5 --kv-cache-dtype auto --max-num-seqs 32 --max-cudagraph-capture-size 32 --enable-prompt-embeds
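Once the server is running, it exposes vLLM's OpenAI-compatible HTTP API. The following is a minimal sketch of how a client might build a plain-text request for the `/v1/completions` endpoint. It rests on assumptions not stated here: the server listens on vLLM's default `http://localhost:8000`, and the served model name is the path passed to `vllm serve` (`locate_qwen2_model`). The helper names `build_completion_request` and `send` are hypothetical. It does not cover the prompt-embedding inputs that LocateAnything itself may need; for those, see the upstream repository.

```python
import json
import urllib.request

# Assumptions (not from the model card): default vLLM host/port, and the
# served model name defaulting to the path given to `vllm serve`.
BASE_URL = "http://localhost:8000"
MODEL_NAME = "locate_qwen2_model"


def build_completion_request(prompt, max_tokens=64, base_url=BASE_URL, model=MODEL_NAME):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send a prepared request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server up, `send(build_completion_request("Hello"))` should return a JSON object whose generated text sits under `choices[0]["text"]`. Building and sending are kept separate so the payload can be inspected without a running server.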