Use merged Qwen checkpoint by default
Files changed:
- README.md (+6, -33)
- __pycache__/app.cpython-313.pyc (binary)
- app.py (+1, -1)
README.md
CHANGED
@@ -20,56 +20,29 @@ endpoint via the `HF_ROUTER_API` environment variable.
 
 | File | Purpose |
 | ---- | ------- |
-| `app.py` | Loads the merged checkpoint on demand, exposes a `/v1/generate` API, and ships an interactive Gradio UI for manual testing. |
+| `app.py` | Loads the merged checkpoint on demand (defaults to `Alovestocode/router-qwen3-32b-merged`), exposes a `/v1/generate` API, and ships an interactive Gradio UI for manual testing. |
 | `requirements.txt` | Minimal dependency set (transformers, bitsandbytes, torch, gradio, fastapi). |
 | `.huggingface/spaces.yml` | Configures the Space for ZeroGPU hardware and disables automatic sleep. |
 
 ## Deployment Steps
 
-1. **Merge the adapter**
-   ```python
-   from peft import PeftModel
-   from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
-   import torch
-
-   BASE = "Qwen/Qwen3-32B"
-   ADAPTER = "CourseGPT-Pro-DSAI-Lab-Group-6/router-qwen3-32b-peft"
-
-   quant_cfg = BitsAndBytesConfig(load_in_4bit=True,
-                                  bnb_4bit_compute_dtype=torch.bfloat16)
-
-   tok = AutoTokenizer.from_pretrained(BASE, use_fast=False)
-   base = AutoModelForCausalLM.from_pretrained(
-       BASE,
-       quantization_config=quant_cfg,
-       device_map="auto",
-       trust_remote_code=True,
-   )
-
-   merged = PeftModel.from_pretrained(base, ADAPTER).merge_and_unload()
-   save_dir = "router-qwen3-32b-4bit"
-   merged.save_pretrained(save_dir)
-   tok.save_pretrained(save_dir)
-   ```
-   Upload `router-qwen3-32b-4bit/` to a new model repo (e.g. `Alovestocode/router-qwen3-32b-4bit`).
-
-2. **Create the Space**
+1. **Create the Space**
 ```bash
 huggingface-cli repo create router-router-zero \
   --type space --sdk gradio --hardware zerogpu --yes
 ```
 
-
+2. **Publish the code**
 ```bash
 cd Milestone-6/router-agent/zero-gpu-space
 huggingface-cli upload . Alovestocode/router-router-zero --repo-type space
 ```
 
-
-- `MODEL_REPO` – defaults to `Alovestocode/router-qwen3-32b-4bit`
+3. **Configure secrets**
+- `MODEL_REPO` – defaults to `Alovestocode/router-qwen3-32b-merged`
 - `HF_TOKEN` – token with read access to the merged model
 
-
+4. **Connect the main router UI**
 ```bash
 export HF_ROUTER_API=https://Alovestocode-router-router-zero.hf.space/v1/generate
 ```
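
A note on step 3: the secrets can also be set programmatically instead of through the Space settings page. A minimal sketch using `huggingface_hub` (assumes you are logged in and that `HfApi.add_space_secret` is available in your installed version):

```python
from huggingface_hub import HfApi

api = HfApi()
space = "Alovestocode/router-router-zero"

# Secrets become environment variables inside the Space; app.py reads
# MODEL_REPO and HF_TOKEN at startup.
api.add_space_secret(space, "MODEL_REPO", "Alovestocode/router-qwen3-32b-merged")
api.add_space_secret(space, "HF_TOKEN", "hf_...")  # read token for the model repo
```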
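
Once step 4 is done, it is worth smoke-testing the endpoint before pointing the main router UI at it. A minimal sketch, assuming `/v1/generate` accepts a JSON body with a `prompt` field and returns JSON (the actual request schema is defined in `app.py`):

```python
import os
import requests

# Same env var the router UI reads (see step 4 above).
url = os.environ.get(
    "HF_ROUTER_API",
    "https://Alovestocode-router-router-zero.hf.space/v1/generate",
)

payload = {"prompt": "ping"}  # field name is an assumption; check app.py
resp = requests.post(url, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json())
```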
__pycache__/app.cpython-313.pyc
CHANGED
Binary files a/__pycache__/app.cpython-313.pyc and b/__pycache__/app.cpython-313.pyc differ
app.py
CHANGED
@@ -26,7 +26,7 @@ except Exception: # pragma: no cover
 load_dotenv()
 
 
-MODEL_ID = os.environ.get("MODEL_REPO", "Alovestocode/router-qwen3-32b-4bit")
+MODEL_ID = os.environ.get("MODEL_REPO", "Alovestocode/router-qwen3-32b-merged")
 MAX_NEW_TOKENS = int(os.environ.get("MAX_NEW_TOKENS", "600"))
 DEFAULT_TEMPERATURE = float(os.environ.get("DEFAULT_TEMPERATURE", "0.2"))
 DEFAULT_TOP_P = float(os.environ.get("DEFAULT_TOP_P", "0.9"))
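
The `MODEL_REPO` default above is what the on-demand loader consumes. For context, a minimal sketch of such a lazy loader, assuming the transformers + bitsandbytes stack listed in `requirements.txt` (the real implementation lives in `app.py` and may differ):

```python
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = os.environ.get("MODEL_REPO", "Alovestocode/router-qwen3-32b-merged")

_model = None
_tok = None

def get_model():
    """Load the merged checkpoint on first use, quantised to 4-bit."""
    global _model, _tok
    if _model is None:
        quant_cfg = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        )
        _tok = AutoTokenizer.from_pretrained(MODEL_ID)
        _model = AutoModelForCausalLM.from_pretrained(
            MODEL_ID,
            quantization_config=quant_cfg,
            device_map="auto",
        )
    return _model, _tok
```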