Alovestocode committed on
Commit 335d4f5 · verified · 1 parent: 34fbe37

Use merged Qwen checkpoint by default

Files changed (3)
  1. README.md +6 -33
  2. __pycache__/app.cpython-313.pyc +0 -0
  3. app.py +1 -1
README.md CHANGED
@@ -20,56 +20,29 @@ endpoint via the `HF_ROUTER_API` environment variable.
 
 | File | Purpose |
 | ---- | ------- |
-| `app.py` | Loads the merged checkpoint on demand, exposes a `/v1/generate` API, and ships an interactive Gradio UI for manual testing. |
+| `app.py` | Loads the merged checkpoint on demand (defaults to `Alovestocode/router-qwen3-32b-merged`), exposes a `/v1/generate` API, and ships an interactive Gradio UI for manual testing. |
 | `requirements.txt` | Minimal dependency set (transformers, bitsandbytes, torch, gradio, fastapi). |
 | `.huggingface/spaces.yml` | Configures the Space for ZeroGPU hardware and disables automatic sleep. |
 
 ## Deployment Steps
 
-1. **Merge and upload the router adapter**
-   ```python
-   from peft import PeftModel
-   from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
-   import torch
-
-   BASE = "Qwen/Qwen3-32B"
-   ADAPTER = "CourseGPT-Pro-DSAI-Lab-Group-6/router-qwen3-32b-peft"
-
-   quant_cfg = BitsAndBytesConfig(load_in_4bit=True,
-                                  bnb_4bit_compute_dtype=torch.bfloat16)
-
-   tok = AutoTokenizer.from_pretrained(BASE, use_fast=False)
-   base = AutoModelForCausalLM.from_pretrained(
-       BASE,
-       quantization_config=quant_cfg,
-       device_map="auto",
-       trust_remote_code=True,
-   )
-
-   merged = PeftModel.from_pretrained(base, ADAPTER).merge_and_unload()
-   save_dir = "router-qwen3-32b-4bit"
-   merged.save_pretrained(save_dir)
-   tok.save_pretrained(save_dir)
-   ```
-   Upload `router-qwen3-32b-4bit/` to a new model repo (e.g. `Alovestocode/router-qwen3-32b-4bit`).
-
-2. **Create the Space**
+1. **Create the Space**
    ```bash
    huggingface-cli repo create router-router-zero \
      --type space --sdk gradio --hardware zerogpu --yes
    ```
 
-3. **Publish the code**
+2. **Publish the code**
    ```bash
    cd Milestone-6/router-agent/zero-gpu-space
    huggingface-cli upload . Alovestocode/router-router-zero --repo-type space
    ```
 
-4. **Configure secrets**
+3. **Configure secrets**
-   - `MODEL_REPO` – defaults to `Alovestocode/router-qwen3-32b-4bit`
+   - `MODEL_REPO` – defaults to `Alovestocode/router-qwen3-32b-merged`
    - `HF_TOKEN` – token with read access to the merged model
 
-5. **Connect the main router UI**
+4. **Connect the main router UI**
    ```bash
    export HF_ROUTER_API=https://Alovestocode-router-router-zero.hf.space/v1/generate
    ```
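The README wires the main router UI to this Space through `HF_ROUTER_API`, which points at the `/v1/generate` route served by `app.py`. A minimal client sketch follows; the JSON field names (`prompt`, `temperature`, `top_p`) are assumptions mirroring the defaults in `app.py`, not a schema confirmed by this diff, so check the FastAPI route before relying on them:

```python
import json
import os
import urllib.request

# Space endpoint, as exported in the README's last step.
API_URL = os.environ.get(
    "HF_ROUTER_API",
    "https://Alovestocode-router-router-zero.hf.space/v1/generate",
)


def build_request(prompt: str, temperature: float = 0.2, top_p: float = 0.9):
    """Build a POST request for the router Space.

    The payload field names here are assumptions; verify them against the
    /v1/generate handler in app.py.
    """
    body = json.dumps(
        {"prompt": prompt, "temperature": temperature, "top_p": top_p}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("Route this query to the right course agent.")
# urllib.request.urlopen(req) would send it; omitted so the sketch stays offline.
```

Keeping the URL behind `HF_ROUTER_API` means the same client works against a renamed Space without code changes.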
__pycache__/app.cpython-313.pyc CHANGED
Binary files a/__pycache__/app.cpython-313.pyc and b/__pycache__/app.cpython-313.pyc differ
 
app.py CHANGED
@@ -26,7 +26,7 @@ except Exception:  # pragma: no cover
 load_dotenv()
 
 
-MODEL_ID = os.environ.get("MODEL_REPO", "Alovestocode/router-qwen3-32b-4bit")
+MODEL_ID = os.environ.get("MODEL_REPO", "Alovestocode/router-qwen3-32b-merged")
 MAX_NEW_TOKENS = int(os.environ.get("MAX_NEW_TOKENS", "600"))
 DEFAULT_TEMPERATURE = float(os.environ.get("DEFAULT_TEMPERATURE", "0.2"))
 DEFAULT_TOP_P = float(os.environ.get("DEFAULT_TOP_P", "0.9"))
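The one-line change above only swaps the fallback: a `MODEL_REPO` secret set on the Space still takes precedence over the new merged-checkpoint default. A small sketch of that precedence (the helper `resolve_model_id` is illustrative, not part of `app.py`):

```python
def resolve_model_id(env: dict) -> str:
    """Mirror app.py's lookup: an explicit MODEL_REPO entry overrides the
    merged-checkpoint default introduced by this commit."""
    return env.get("MODEL_REPO", "Alovestocode/router-qwen3-32b-merged")


# With no override, the new merged checkpoint is loaded.
print(resolve_model_id({}))  # -> Alovestocode/router-qwen3-32b-merged
# A Space secret still wins, e.g. to pin the older 4-bit repo.
print(resolve_model_id({"MODEL_REPO": "Alovestocode/router-qwen3-32b-4bit"}))
```

This is why the commit needs no secret changes: Spaces without `MODEL_REPO` set simply pick up the merged checkpoint on the next restart.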