
Sanket Rai

sanketrai

AI & ML interests

NLP, CV, RL, Deep Learning, Gen AI, MLOps

Recent Activity

reacted to macadeliccc's post with 🔥 5 days ago
Save money on your compute bill by using LMCache to share prefix KV between two different vLLM instances.

By deploying the LMCache backend along with your vLLM containers, you can share a prefix KV cache between two different containers and models. It is very simple to add to your existing stack.

Step 1: Pull docker images

```
docker pull apostacyh/vllm:lmcache-0.1.0
```

Step 2: Start vLLM + LMCache

```
model=mistralai/Mistral-7B-Instruct-v0.2 # Replace with your model name
sudo docker run --runtime nvidia --gpus '"device=0"' \
    -v <Huggingface cache dir on your local machine>:/root/.cache/huggingface \
    -p 8000:8000 \
    --env "HF_TOKEN=<Your huggingface access token>" \
    --ipc=host \
    --network=host \
    apostacyh/vllm:lmcache-0.1.0 \
    --model $model --gpu-memory-utilization 0.6 --port 8000 \
    --lmcache-config-file /lmcache/LMCache/examples/example-local.yaml
```

You can add another vLLM instance, as long as it's on a separate GPU, by simply deploying another container:

```
# The second vLLM instance listens at port 8001
model=mistralai/Mistral-7B-Instruct-v0.2 # Replace with your model name
sudo docker run --runtime nvidia --gpus '"device=1"' \
    -v <Huggingface cache dir on your local machine>:/root/.cache/huggingface \
    -p 8001:8001 \
    --env "HF_TOKEN=<Your huggingface token>" \
    --ipc=host \
    --network=host \
    apostacyh/vllm:lmcache-0.1.0 \
    --model $model --gpu-memory-utilization 0.7 --port 8001 \
    --lmcache-config-file /lmcache/LMCache/examples/example.yaml
```

This method supports local, remote, or hybrid backends, so whichever vLLM deployment method you are already using should work with the LMCache container (excluding BentoML).

LMCache: https://github.com/LMCache/LMCache/tree/dev
vLLM: https://github.com/vllm-project/vllm
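To check the setup, one option is to send prompts that start with the same long prefix to both instances and compare response times. The sketch below is a hedged illustration, not part of the original post: it assumes both containers expose vLLM's OpenAI-compatible `/v1/completions` endpoint on `localhost` at the ports used above, and the helper names (`build_request`, `complete`, `main`) are hypothetical. Whether the second request actually reuses cached KV depends on the LMCache configuration files passed to each container.

```python
import json
import time
import urllib.request

# Ports come from the docker commands above; "localhost" is an assumption.
ENDPOINTS = ["http://localhost:8000", "http://localhost:8001"]
MODEL = "mistralai/Mistral-7B-Instruct-v0.2"


def build_request(base_url, shared_prefix, question, max_tokens=64):
    """Build a POST request whose prompt begins with the shared prefix."""
    payload = {
        "model": MODEL,
        "prompt": shared_prefix + question,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, shared_prefix, question):
    """Send the request and return (completion text, elapsed seconds)."""
    req = build_request(base_url, shared_prefix, question)
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    elapsed = time.perf_counter() - start
    return body["choices"][0]["text"], elapsed


def main():
    # A long, repeated context so that prefix reuse would be noticeable.
    shared_prefix = "Reference document:\n" + ("LMCache shares KV caches. " * 200)
    questions = ["\nQ: Summarize the document.\nA:", "\nQ: What is shared?\nA:"]
    for base_url, question in zip(ENDPOINTS, questions):
        text, elapsed = complete(base_url, shared_prefix, question)
        print(f"{base_url}: {elapsed:.2f}s -> {text.strip()[:60]!r}")
```

Call `main()` once both containers are running. Timing is only a rough signal, since the two instances run on different GPUs with different memory settings.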
updated a model about 2 months ago
sanketrai/modernbert-base-wnut17-english-ner
updated a model about 2 months ago
sanketrai/modernbert-base-conll2003-english-ner

Organizations

Nutanix