massaindustries committed on
Commit 7cf07c0 · verified · 1 Parent(s): d02f685

Upload README.md with huggingface_hub

Files changed (1): README.md (+78 −3)
README.md CHANGED
@@ -1,3 +1,78 @@
- ---
- license: cc-by-nc-4.0
- ---
+ ---
+ license: apache-2.0
+ base_model: Qwen/Qwen3.5-0.8B
+ tags:
+ - peft
+ - lora
+ - complexity-classification
+ - llm-routing
+ - query-difficulty
+ - brick
+ datasets:
+ - regolo/brick-complexity-extractor
+ library_name: peft
+ pipeline_tag: text-classification
+ language:
+ - en
+ ---
+
+ # Brick Complexity Extractor
+
+ LoRA fine-tune of [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) for query complexity classification (easy / medium / hard).
+
+ Used in the **Brick** LLM routing system to decide which model tier should handle a query.
+
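As a loose illustration of the routing idea — the tier names, the confidence threshold, and the `route` helper below are all hypothetical, not part of the actual Brick system:

```python
# Hypothetical routing sketch: tier names, threshold, and this helper are
# illustrative only, not the Brick system's real configuration.
TIERS = {"easy": "small-tier", "medium": "mid-tier", "hard": "large-tier"}

def route(label: str, confidence: float,
          fallback: str = "large-tier", min_confidence: float = 0.5) -> str:
    """Map a predicted complexity class to a model tier, falling back
    to the largest tier when the classifier's confidence is low."""
    return TIERS[label] if confidence >= min_confidence else fallback

print(route("easy", 0.93))  # small-tier
print(route("hard", 0.42))  # large-tier (low-confidence fallback)
```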
+ ## Training
+
+ - **Base model**: Qwen3.5-0.8B
+ - **Method**: LoRA (r=16, alpha=32, dropout=0.05)
+ - **Dataset**: [regolo/brick-complexity-extractor](https://huggingface.co/datasets/regolo/brick-complexity-extractor) — 65K samples labeled by Qwen3.5-122B as an LLM judge
+ - **Epochs**: 3, **LR**: 2e-4 (cosine), **Batch**: 32
+ - **Hardware**: NVIDIA H200 141GB, bf16
+
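The cosine schedule can be sketched as a function of training progress — a minimal sketch assuming decay from the 2e-4 peak to zero with no warmup, which the card does not specify:

```python
import math

PEAK_LR = 2e-4  # peak learning rate from the training config above

def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Cosine decay: peak_lr at step 0, ~0 at total_steps (no warmup shown)."""
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * step / total_steps))

print(cosine_lr(0, 1000))    # 0.0002 (peak)
print(cosine_lr(500, 1000))  # ~0.0001 (half the peak at mid-training)
```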
+ ## Evaluation (test set, 3841 samples)
+
+ | Class | Precision | Recall | F1 |
+ |-------|-----------|--------|----|
+ | easy | 81.3% | 80.4% | 80.8% |
+ | medium | 77.6% | 80.8% | 79.2% |
+ | hard | 72.7% | 65.1% | 68.7% |
+ | **accuracy** | | | **78.1%** |
+ | **macro avg** | 77.2% | 75.4% | 76.2% |
+
+ Average confidence: 91.7%
+
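The macro averages in the table are the unweighted means of the three per-class scores, which can be checked directly:

```python
# Per-class scores from the evaluation table above (percent)
precision = {"easy": 81.3, "medium": 77.6, "hard": 72.7}
recall    = {"easy": 80.4, "medium": 80.8, "hard": 65.1}
f1        = {"easy": 80.8, "medium": 79.2, "hard": 68.7}

def macro_avg(scores: dict[str, float]) -> float:
    """Unweighted mean over classes, rounded to one decimal place."""
    return round(sum(scores.values()) / len(scores), 1)

print(macro_avg(precision), macro_avg(recall), macro_avg(f1))  # 77.2 75.4 76.2
```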
+ ## Usage
+
+ ```python
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+ import torch.nn.functional as F
+
+ # Load the base model, attach the LoRA adapter, and switch to eval mode
+ base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-0.8B", torch_dtype=torch.bfloat16, trust_remote_code=True)
+ model = PeftModel.from_pretrained(base, "regolo/brick-complexity-extractor").eval().cuda()
+ tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-0.8B", trust_remote_code=True)
+
+ # Classification via logit extraction: compare the next-token logits
+ # of the three label tokens instead of generating text
+ LABELS = ["easy", "medium", "hard"]
+ label_ids = {l: tokenizer.encode(l, add_special_tokens=False)[0] for l in LABELS}
+
+ messages = [
+     {"role": "system", "content": "<system prompt from training_metadata.json>"},
+     {"role": "user", "content": "Classify: Design a lock-free concurrent skip-list with MVCC"},
+ ]
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")
+
+ with torch.no_grad():
+     logits = model(**inputs).logits[0, -1, :]  # logits for the next token
+
+ # Softmax over the three label-token logits only
+ probs = F.softmax(torch.stack([logits[label_ids[l]] for l in LABELS]).float(), dim=0)
+ label = LABELS[probs.argmax()]
+ confidence = probs.max().item()
+ print(f"{label} ({confidence:.2%})")  # hard (94.12%)
+ ```
+
+ ## License
+
+ Apache 2.0