tianlecai committed on
Commit
0afd583
1 Parent(s): 80a830f

Create README.md

Files changed (1)
  1. README.md +37 -0
README.md ADDED
@@ -0,0 +1,37 @@
+ <div align="center"><img src="https://github.com/FasterDecoding/Medusa/blob/main/assets/logo.png?raw=true" alt="Medusa" width="100" align="center"></div>
+ <div align="center"><h1>&nbsp;Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads</h1></div>
+
+ <p align="center">
+ | <a href="https://sites.google.com/view/medusa-llm"><b>Blog</b></a> | <a href="https://github.com/FasterDecoding/Medusa"><b>Codebase</b></a> |
+ </p>
+
+ ---
+
+ ## Installation
+ ### Method 1: With pip
+ ```bash
+ pip install medusa-llm
+ ```
+ ### Method 2: From source
+ ```bash
+ git clone https://github.com/FasterDecoding/Medusa.git
+ cd Medusa
+ pip install -e .
+ ```
+
+ ### Model Weights
+ | Size | Chat Command | Hugging Face Repo |
+ | ---- | --------------------------------------------- | --------------------------------------------------------------------- |
+ | 7B | `python -m medusa.inference.cli --model FasterDecoding/medusa-vicuna-7b-v1.3` | [FasterDecoding/medusa-vicuna-7b-v1.3](https://huggingface.co/FasterDecoding/medusa-vicuna-7b-v1.3) |
+ | 13B | `python -m medusa.inference.cli --model FasterDecoding/medusa-vicuna-13b-v1.3` | [FasterDecoding/medusa-vicuna-13b-v1.3](https://huggingface.co/FasterDecoding/medusa-vicuna-13b-v1.3) |
+ | 33B | `python -m medusa.inference.cli --model FasterDecoding/medusa-vicuna-33b-v1.3` | [FasterDecoding/medusa-vicuna-33b-v1.3](https://huggingface.co/FasterDecoding/medusa-vicuna-33b-v1.3) |
+
+ ### Inference
+ We currently support single-GPU inference with batch size 1, which is the most common setup for local model hosting. We are actively working to extend Medusa's capabilities by integrating it into other inference frameworks; please don't hesitate to reach out if you are interested in contributing to this effort.
+
+ You can use the following command to launch the CLI:
+ ```bash
+ python -m medusa.inference.cli --model [path of medusa model]
+ ```
+ You can also pass `--load-in-8bit` or `--load-in-4bit` to load the base model in quantized format.
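
If you want to launch the CLI from a script rather than typing the command by hand, the snippet below is a minimal sketch of one way to assemble the argument list. It relies only on the `--model`, `--load-in-8bit`, and `--load-in-4bit` flags described above; the helper name `build_medusa_cli_command` and its `quantization` parameter are hypothetical conveniences for illustration and are not part of the Medusa package.

```python
import sys
from typing import List, Optional


def build_medusa_cli_command(model: str, quantization: Optional[str] = None) -> List[str]:
    """Build the argument list for `python -m medusa.inference.cli`.

    `quantization` is a hypothetical parameter of this helper (not a Medusa
    API): None, "8bit", or "4bit". It maps onto the README's
    --load-in-8bit / --load-in-4bit flags.
    """
    # Use the current interpreter so the command runs in the same environment.
    command = [sys.executable, "-m", "medusa.inference.cli", "--model", model]
    if quantization is None:
        return command
    if quantization not in ("8bit", "4bit"):
        raise ValueError(f"quantization must be None, '8bit', or '4bit', got {quantization!r}")
    command.append(f"--load-in-{quantization}")
    return command


if __name__ == "__main__":
    cmd = build_medusa_cli_command(
        "FasterDecoding/medusa-vicuna-7b-v1.3", quantization="8bit"
    )
    # Print the command for inspection; launching it requires Medusa to be installed.
    print(" ".join(cmd))
    # To actually start the CLI, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list (rather than a single shell string) avoids quoting problems when the model argument is a local path containing spaces.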