Use merged Qwen checkpoint by default
Files changed:
- README.md (+6, -33)
- __pycache__/app.cpython-313.pyc (binary)
- app.py (+1, -1)
README.md
CHANGED
@@ -20,56 +20,29 @@ endpoint via the `HF_ROUTER_API` environment variable.
 
 | File | Purpose |
 | ---- | ------- |
-| `app.py` | Loads the merged checkpoint on demand, exposes a `/v1/generate` API, and ships an interactive Gradio UI for manual testing. |
+| `app.py` | Loads the merged checkpoint on demand (defaults to `Alovestocode/router-qwen3-32b-merged`), exposes a `/v1/generate` API, and ships an interactive Gradio UI for manual testing. |
 | `requirements.txt` | Minimal dependency set (transformers, bitsandbytes, torch, gradio, fastapi). |
 | `.huggingface/spaces.yml` | Configures the Space for ZeroGPU hardware and disables automatic sleep. |
 
 ## Deployment Steps
 
-1. **Merge the adapter**
-   ```python
-   from peft import PeftModel
-   from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
-   import torch
-
-   BASE = "Qwen/Qwen3-32B"
-   ADAPTER = "CourseGPT-Pro-DSAI-Lab-Group-6/router-qwen3-32b-peft"
-
-   quant_cfg = BitsAndBytesConfig(load_in_4bit=True,
-                                  bnb_4bit_compute_dtype=torch.bfloat16)
-
-   tok = AutoTokenizer.from_pretrained(BASE, use_fast=False)
-   base = AutoModelForCausalLM.from_pretrained(
-       BASE,
-       quantization_config=quant_cfg,
-       device_map="auto",
-       trust_remote_code=True,
-   )
-
-   merged = PeftModel.from_pretrained(base, ADAPTER).merge_and_unload()
-   save_dir = "router-qwen3-32b-4bit"
-   merged.save_pretrained(save_dir)
-   tok.save_pretrained(save_dir)
-   ```
-   Upload `router-qwen3-32b-4bit/` to a new model repo (e.g. `Alovestocode/router-qwen3-32b-4bit`).
-
-2. **Create the Space**
+1. **Create the Space**
 ```bash
 huggingface-cli repo create router-router-zero \
   --type space --sdk gradio --hardware zerogpu --yes
 ```
 
-
+2. **Publish the code**
 ```bash
 cd Milestone-6/router-agent/zero-gpu-space
 huggingface-cli upload . Alovestocode/router-router-zero --repo-type space
 ```
 
-
-- `MODEL_REPO` – defaults to `Alovestocode/router-qwen3-32b-4bit`
+3. **Configure secrets**
+- `MODEL_REPO` – defaults to `Alovestocode/router-qwen3-32b-merged`
 - `HF_TOKEN` – token with read access to the merged model
 
-
+4. **Connect the main router UI**
 ```bash
 export HF_ROUTER_API=https://Alovestocode-router-router-zero.hf.space/v1/generate
 ```
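
A note on step 3: the secrets can also be set programmatically instead of through the Space settings page. A minimal sketch using `huggingface_hub` (assumes you are logged in and that `HfApi.add_space_secret` is available in your installed version):

```python
from huggingface_hub import HfApi

api = HfApi()
space = "Alovestocode/router-router-zero"

# Secrets become environment variables inside the Space; app.py reads
# MODEL_REPO and HF_TOKEN at startup.
api.add_space_secret(space, "MODEL_REPO", "Alovestocode/router-qwen3-32b-merged")
api.add_space_secret(space, "HF_TOKEN", "hf_...")  # read token for the model repo
```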
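
Once step 4 is done, it is worth smoke-testing the endpoint before pointing the main router UI at it. A minimal sketch, assuming `/v1/generate` accepts a JSON body with a `prompt` field and returns JSON (the actual request schema is defined in `app.py`):

```python
import os
import requests

# Same env var the router UI reads (see step 4 above).
url = os.environ.get(
    "HF_ROUTER_API",
    "https://Alovestocode-router-router-zero.hf.space/v1/generate",
)

payload = {"prompt": "ping"}  # field name is an assumption; check app.py
resp = requests.post(url, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json())
```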
__pycache__/app.cpython-313.pyc
CHANGED
Binary files a/__pycache__/app.cpython-313.pyc and b/__pycache__/app.cpython-313.pyc differ
app.py
CHANGED
@@ -26,7 +26,7 @@ except Exception: # pragma: no cover
 load_dotenv()
 
 
-MODEL_ID = os.environ.get("MODEL_REPO", "Alovestocode/router-qwen3-32b-4bit")
+MODEL_ID = os.environ.get("MODEL_REPO", "Alovestocode/router-qwen3-32b-merged")
 MAX_NEW_TOKENS = int(os.environ.get("MAX_NEW_TOKENS", "600"))
 DEFAULT_TEMPERATURE = float(os.environ.get("DEFAULT_TEMPERATURE", "0.2"))
 DEFAULT_TOP_P = float(os.environ.get("DEFAULT_TOP_P", "0.9"))
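
The `MODEL_REPO` default above is what the on-demand loader consumes. For context, a minimal sketch of such a lazy loader, assuming the transformers + bitsandbytes stack listed in `requirements.txt` (the real implementation lives in `app.py` and may differ):

```python
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = os.environ.get("MODEL_REPO", "Alovestocode/router-qwen3-32b-merged")

_model = None
_tok = None

def get_model():
    """Load the merged checkpoint on first use, quantised to 4-bit."""
    global _model, _tok
    if _model is None:
        quant_cfg = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        )
        _tok = AutoTokenizer.from_pretrained(MODEL_ID)
        _model = AutoModelForCausalLM.from_pretrained(
            MODEL_ID,
            quantization_config=quant_cfg,
            device_map="auto",
        )
    return _model, _tok
```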