LLM Model Serving

Model Performance Optimization

  • Model Quantization (see the sketch after this list)
  • Model Pruning
  • Machine Learning Compilation (MLC-LLM)
  • Neural Magic (DeepSparse)
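
As an illustration of the quantization item above, here is a minimal sketch that loads a causal LM in 4-bit precision with Hugging Face Transformers and bitsandbytes. The model ID is a placeholder assumption; any compatible checkpoint could be substituted.

```python
# Minimal 4-bit quantization sketch (Transformers + bitsandbytes).
# The model ID is a placeholder assumption, not tied to this repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "facebook/opt-1.3b"  # placeholder checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # run compute in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

inputs = tokenizer("Model serving with quantized weights:", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0], skip_special_tokens=True))
```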

Model Serving on Bare-metal Server

Docker Container
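
A minimal sketch of container-based serving, using the Docker SDK for Python so the examples stay in one language. The image tag, model ID, and host port are assumptions; the equivalent `docker run` command works just as well.

```python
# Sketch: launch a Text Generation Inference (TGI) container via the Docker SDK.
# Image tag, model ID, and host port are assumptions; adjust to your environment.
import docker

client = docker.from_env()

container = client.containers.run(
    "ghcr.io/huggingface/text-generation-inference:latest",   # assumed image tag
    command=["--model-id", "facebook/opt-1.3b"],              # placeholder model
    ports={"80/tcp": 8080},                                   # serve on localhost:8080
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],  # all GPUs
    detach=True,
)
print(container.id)
```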

Model Serving on Kubernetes Cluster
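
A minimal sketch of deploying an inference server on a cluster, written with the official Kubernetes Python client. The deployment name, image, model, and port are assumptions; GPU resource requests and storage are omitted for brevity.

```python
# Sketch: create a Deployment for an inference-server container.
# Names, image, model, and port are assumptions.
from kubernetes import client, config

config.load_kube_config()  # assumes a local kubeconfig with cluster access

container = client.V1Container(
    name="llm-server",
    image="vllm/vllm-openai:latest",        # assumed serving image
    args=["--model", "facebook/opt-1.3b"],  # placeholder model
    ports=[client.V1ContainerPort(container_port=8000)],
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="llm-serving"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "llm-serving"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "llm-serving"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```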

Inference Servers

  • TorchServe
  • TGI
  • Triton
  • Seldon Core
  • vLLM (CPU/GPU)
  • llama.cpp
  • Ollama
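
To tie the list together, here is a minimal offline batched-inference sketch with vLLM, one of the servers listed above. The model ID and sampling settings are placeholder assumptions.

```python
# Sketch: offline batched inference with vLLM.
# Model ID and sampling parameters are placeholder assumptions.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-1.3b")  # placeholder checkpoint
params = SamplingParams(temperature=0.8, max_tokens=64)

prompts = [
    "Explain model quantization in one sentence.",
    "What is an inference server?",
]

for output in llm.generate(prompts, params):
    print(output.outputs[0].text.strip())
```

For online serving, recent vLLM releases expose the same engine as an OpenAI-compatible HTTP endpoint via the `vllm serve` command.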