
# Hydra Decoder (Codebase)


We currently support inference in the single-GPU, batch-size-1 setting, which matches Medusa and is also the most common setup for local model hosting.

**Model Weights.** We have uploaded three sets of model weights for users to try, listed alongside the original Medusa weights for reference.

| Base Model | Description | Hugging Face Repo |
| --- | --- | --- |
| Vicuna-7b | Medusa Model (Original) | FasterDecoding/medusa-vicuna-7b-v1.3 |
| Vicuna-7b | Medusa Model - 3 Head (Ours) | Rango2000/medusa-3h-vicuna-7b-v1.3 |
| Vicuna-7b | Hydra Model - 3 Head - 1 Decoding Layer (Ours) | shiqihe/hydra-decoder-1l-vicuna-7b-v1.3 |
| Vicuna-7b | Hydra Model - 3 Head - 2 Decoding Layer (Ours) | shiqihe/hydra-decoder-2l-vicuna-7b-v1.3 |

**Inference.** You can use the following command to launch a CLI interface:

```shell
# Optional: pin the process to a single GPU.
export CUDA_VISIBLE_DEVICES=0
# Launch the CLI.
python -m medusa.inference.cli --model [path/repo of hydra decoder]
```
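Any repo ID from the table above can be passed directly as the `--model` argument. As a convenience, the substitution can be sketched with a small hypothetical helper (the aliases and the `cli_command` function are our own illustration, not part of the codebase):

```python
# Hypothetical aliases mapping to the Hugging Face repo IDs listed above.
HYDRA_REPOS = {
    "medusa-3h": "Rango2000/medusa-3h-vicuna-7b-v1.3",
    "hydra-1l": "shiqihe/hydra-decoder-1l-vicuna-7b-v1.3",
    "hydra-2l": "shiqihe/hydra-decoder-2l-vicuna-7b-v1.3",
}

def cli_command(alias: str) -> str:
    """Build the shell command that launches the CLI for the chosen weights."""
    repo = HYDRA_REPOS[alias]
    return f"python -m medusa.inference.cli --model {repo}"

print(cli_command("hydra-1l"))
# → python -m medusa.inference.cli --model shiqihe/hydra-decoder-1l-vicuna-7b-v1.3
```

The same pattern applies to a local checkpoint directory: the CLI accepts either a Hugging Face repo ID or a filesystem path.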

License: MIT
