Instructions for using PranjalZetsu/Fab_Yield_Agent_Qwen-q4 with libraries, inference providers, notebooks, and local apps.
- Libraries
- PEFT
How to use PranjalZetsu/Fab_Yield_Agent_Qwen-q4 with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen2.5-7B-Instruct-bnb-4bit")
model = PeftModel.from_pretrained(base_model, "PranjalZetsu/Fab_Yield_Agent_Qwen-q4")
```

- Transformers
How to use PranjalZetsu/Fab_Yield_Agent_Qwen-q4 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="PranjalZetsu/Fab_Yield_Agent_Qwen-q4")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("PranjalZetsu/Fab_Yield_Agent_Qwen-q4", dtype="auto")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use PranjalZetsu/Fab_Yield_Agent_Qwen-q4 with vLLM:

Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "PranjalZetsu/Fab_Yield_Agent_Qwen-q4"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "PranjalZetsu/Fab_Yield_Agent_Qwen-q4",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker:

```shell
docker model run hf.co/PranjalZetsu/Fab_Yield_Agent_Qwen-q4
```
- SGLang
How to use PranjalZetsu/Fab_Yield_Agent_Qwen-q4 with SGLang:

Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "PranjalZetsu/Fab_Yield_Agent_Qwen-q4" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "PranjalZetsu/Fab_Yield_Agent_Qwen-q4",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker images:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "PranjalZetsu/Fab_Yield_Agent_Qwen-q4" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "PranjalZetsu/Fab_Yield_Agent_Qwen-q4",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

- Unsloth Studio
How to use PranjalZetsu/Fab_Yield_Agent_Qwen-q4 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL):

```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for PranjalZetsu/Fab_Yield_Agent_Qwen-q4 to start chatting
```

Install Unsloth Studio (Windows):

```shell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for PranjalZetsu/Fab_Yield_Agent_Qwen-q4 to start chatting
```

Use Hugging Face Spaces for Unsloth:

```shell
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for PranjalZetsu/Fab_Yield_Agent_Qwen-q4 to start chatting
```

Load the model with FastModel:

```python
# pip install unsloth
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="PranjalZetsu/Fab_Yield_Agent_Qwen-q4",
    max_seq_length=2048,
)
```

- Docker Model Runner
How to use PranjalZetsu/Fab_Yield_Agent_Qwen-q4 with Docker Model Runner:
```shell
docker model run hf.co/PranjalZetsu/Fab_Yield_Agent_Qwen-q4
```
Model Details
Developed by: PranjalZetsu
Model type: Causal Language Model fine-tuned via RL (GRPO)
Language(s): English
License: Apache 2.0
Finetuned from model: Qwen/Qwen2.5-7B-Instruct
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-7B |
| Adapter Type | LoRA (PEFT) |
| Quantization | 4-bit (bitsandbytes, NF4) |
| Training Framework | Unsloth + TRL (GRPO) |
| PEFT Version | 0.19.1 |
| Task | Text Generation / Agentic Reasoning |
| Domain | Semiconductor Fabrication, Yield Engineering |
Model Description
Fab_Yield_Agent_Qwen-q4 is a domain-specialized language model fine-tuned for semiconductor fabrication yield analysis and optimization. Built on the Qwen2.5-7B-Instruct architecture and trained using Reinforcement Learning from verifiable rewards (GRPO), this model learns to reason through complex statistical and materials science problems with structured, step-by-step thinking.
The Problem: Semiconductor Yield Loss
Semiconductor manufacturing is one of the most complex industrial processes on Earth. A modern chip fab runs wafers through hundreds of steps, each controlled by multiple physical parameters (temperatures, pressures, gas flows). Even tiny deviations cascade into yield loss. This agent is trained to act as a process integration engineer:
- Analyzing yield data and identifying root causes.
- Navigating the physics of a 15-parameter process space.
- Converging on optimal manufacturing recipes within a limited experiment budget.
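The yield analysis above rests on standard defect-limited yield models. As a minimal sketch, the classic Poisson yield model relates die yield to defect density and critical area; the numbers below are illustrative, not taken from this card:

```python
import math

def poisson_yield(defect_density: float, critical_area: float) -> float:
    """Poisson yield model: Y = exp(-D0 * A), where D0 is defect density
    (defects/cm^2) and A is the die's critical area (cm^2)."""
    return math.exp(-defect_density * critical_area)

# Illustrative numbers: D0 = 0.08 defects/cm^2, A = 1.2 cm^2 per die
y = poisson_yield(0.08, 1.2)
print(f"Predicted die yield: {y:.1%}")
```

Even small changes in defect density move yield exponentially, which is why root-cause attribution matters so much in this domain.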
The Key Insight: RL Emergent Intelligence
During Reinforcement Learning, the model was rewarded purely on the correctness of final answers, not on intermediate reasoning steps. Spontaneously, it developed:
- Deeper statistical reasoning: Proactive use of Cpk, Poisson models, and control chart logic.
- Material science grounding: Reasoning through etch selectivity, deposition uniformity, and diffusion profiles.
- Structured problem decomposition: Breaking queries into logical sub-tasks before synthesizing conclusions.
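As a point of reference for the first bullet, the Cpk statistic the model learned to invoke is a one-line calculation. The sketch below shows the standard definition; the spec limits and sigma are made-up illustrative values:

```python
def cpk(mu: float, sigma: float, lsl: float, usl: float) -> float:
    """Process capability index: distance from the process mean to the
    nearest spec limit, in units of 3 sigma."""
    return min(usl - mu, mu - lsl) / (3 * sigma)

# Illustrative: etch depth spec 95-105 nm, process centered at 101 nm, sigma 1.2 nm
print(round(cpk(101.0, 1.2, 95.0, 105.0), 2))  # 1.11
```

A Cpk below about 1.33 is conventionally taken as a signal that the process window is too tight for stable production.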
Uses
Direct Use
- Fab yield triage: Rapidly analyze incoming yield data and identify likely root causes.
- Process window analysis: Evaluate margin sensitivity across interconnected process steps.
- Statistical process control (SPC): Interpret control charts and flag out-of-control signals.
- Material selection reasoning: Assess tradeoffs between material properties and process compatibility.
Out-of-Scope Use
- General purpose QA or creative writing.
- Safety-critical sign-off without human expert verification.
- Use as a replacement for certified TCAD/SPC simulation software.
Bias, Risks, and Limitations
- Not a simulation substitute: This model should not replace calibrated simulation tools like TCAD or certified SPC software.
- 4-bit quantization: efficient, but may carry minor accuracy tradeoffs compared to full-precision weights.
- Domain Focus: Performance on non-semiconductor tasks is not optimized.
How to Get Started
Quick Inference
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model = "unsloth/Qwen2.5-7B-Instruct-bnb-4bit"
adapter = "PranjalZetsu/Fab_Yield_Agent_Qwen-q4"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, load_in_4bit=True, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

prompt = """You are a semiconductor yield engineer.
A 300mm fab sees 12% yield loss on a 7nm logic layer.
Defect density: 0.08 defects/cm2. Critical area: 150 cm2.
Perform a yield analysis and recommend corrective actions."""

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Local Setup (Environment & API)
```shell
# Clone the repository
git clone https://github.com/pranjalyt/fab-yield-agent.git
cd fab-yield-agent

# Install dependencies
pip install -r requirements.txt

# Run the API server
uvicorn server:app --host 0.0.0.0 --port 7860
```
System Architecture
The project includes a production-grade RL environment (FabYieldEnv) and a Response Surface Model (RSM) simulator.
RSMSimulator
A hidden ground truth engine that implements a second-order Response Surface Model. It captures non-linear interactions between 15 process parameters (Temperature, Etch Time, Pressure, etc.).
- Normalization: All parameters are mapped to [-1, 1] for scale-invariant modeling.
- Interaction Terms: Simulates how variables like Pressure and Gas Flow interact to control plasma density.
- Dynamic Physics: Every episode generates a fresh set of coefficients, forcing the agent to generalize its optimization strategy.
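The actual RSMSimulator code is not reproduced here, but a toy second-order response surface with the three properties above (normalization to [-1, 1], quadratic interaction terms, fresh coefficients per episode) can be sketched as follows. All class/function names and coefficient scales are assumptions for illustration:

```python
import numpy as np

def normalize(x: float, lo: float, hi: float) -> float:
    """Map a parameter from its physical range [lo, hi] to [-1, 1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

class ToyRSM:
    """Minimal second-order response surface: y = b0 + b.x + x'Qx.
    Drawing fresh coefficients per instance mimics per-episode physics."""
    def __init__(self, n_params: int = 15, seed=None):
        rng = np.random.default_rng(seed)
        self.b0 = rng.normal(85.0, 3.0)              # baseline yield (%)
        self.b = rng.normal(0.0, 2.0, n_params)      # linear effects
        q = rng.normal(0.0, 0.5, (n_params, n_params))
        self.Q = (q + q.T) / 2                       # symmetric quadratic/interaction terms

    def yield_pct(self, x) -> float:
        x = np.asarray(x, dtype=float)
        return float(np.clip(self.b0 + self.b @ x + x @ self.Q @ x, 0.0, 100.0))

sim = ToyRSM(seed=0)
# e.g. temperature set to 450 within an assumed physical range of [300, 600]
recipe = [normalize(450.0, 300.0, 600.0)] + [0.0] * 14
print(sim.yield_pct(recipe))
```

The off-diagonal entries of Q are what make two parameters (e.g. pressure and gas flow) interact rather than act independently.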
Defect Classification
The simulator produces physically motivated defect signatures:
- Edge Ring: Non-uniform plasma at wafer edges (linked to Pressure/Gas flow).
- Center Spot: Thermal hotspot at wafer center (linked to Temp/RF power).
- Random Scatter: Chemical contamination (linked to Dopant levels).
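A rule-of-thumb classifier for these three signatures might look only at where defects sit radially on the wafer. The heuristic below is purely illustrative (the thresholds and function name are assumptions, not the simulator's logic):

```python
def classify_defect_signature(defect_xy, wafer_radius: float = 150.0) -> str:
    """Toy heuristic: classify a wafer map by the mean radial position of
    its defects. defect_xy is a list of (x, y) coordinates in mm from center."""
    if not defect_xy:
        return "clean"
    radii = [(x ** 2 + y ** 2) ** 0.5 for x, y in defect_xy]
    mean_r = sum(radii) / len(radii)
    if mean_r > 0.8 * wafer_radius:
        return "edge_ring"       # defects hug the wafer edge
    if mean_r < 0.2 * wafer_radius:
        return "center_spot"     # defects cluster at the center
    return "random_scatter"      # defects spread across the wafer

print(classify_defect_signature([(140, 10), (0, 145), (-138, -30)]))  # edge_ring
```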
Senior Engineer Reviewer
A multi-agent layer that simulates the human approval gate. It enforces episode-varying qualification constraints (min yield, max variance, forbidden ranges).
Training Details
Training Pipeline
- Supervised Fine-Tuning (SFT): Initial training on semiconductor reports, SPC problem sets, and material science Q&A.
- Reinforcement Learning (GRPO): Training via Group Relative Policy Optimization, rewarding answer correctness, structured thinking traces, and numerical precision.
Reward System (Four-Component Design)
- Yield Reward (50%): Continuous signal based on yield improvement.
- Efficiency Reward (20%): Sparse reward for hitting targets within the 12-experiment budget.
- Causal Attribution (15%): Reward for correctly identifying the primary bottleneck parameter in natural language.
- Stability Reward (15%): Reward for submitted recipes that show low lot-to-lot variance.
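The four components above can be combined as a weighted sum. The sketch below is a guess at the shape of such a function; the component formulas, clipping, and variance tolerance are assumptions for illustration, not the actual training code:

```python
def total_reward(yield_gain: float, hit_target: bool, correct_attribution: bool,
                 lot_variance: float, budget_used: int,
                 budget: int = 12, var_tolerance: float = 0.5) -> float:
    """Toy weighted combination mirroring the four-component design above."""
    r_yield = 0.50 * max(0.0, min(1.0, yield_gain))        # continuous, clipped to [0, 1]
    r_eff = 0.20 * (1.0 if hit_target and budget_used <= budget else 0.0)  # sparse
    r_causal = 0.15 * (1.0 if correct_attribution else 0.0)
    r_stable = 0.15 * max(0.0, 1.0 - lot_variance / var_tolerance)
    return r_yield + r_eff + r_causal + r_stable

print(total_reward(yield_gain=0.6, hit_target=True, correct_attribution=True,
                   lot_variance=0.1, budget_used=9))
```

Weighting the dense yield signal at 50% keeps learning stable early on, while the sparse efficiency term only pays out inside the 12-experiment budget.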
Evaluation: Emergent Capabilities
Statistical Reasoning
| Capability | Before RL (SFT only) | After RL (GRPO) |
|---|---|---|
| Yield estimation | States a number | Derives from defect density + critical area |
| Process capability | Rarely mentions | Calculates Cpk from spec limits + sigma |
| Confidence intervals | Absent | Appears spontaneously in reasoning traces |
Materials Science Reasoning
| Capability | Before RL (SFT only) | After RL (GRPO) |
|---|---|---|
| Film uniformity | Generic description | Linked to deposition mechanism physically |
| Etch selectivity | Surface-level | Reasoned from underlying chemistry |
| Defect root cause | Names defect types | Traces cause to process physics |
Research Theme Alignment
This work addresses several frontier AI research themes for OpenEnv India 2026:
- World Modeling (Theme 3.1): Modeling the partially observable professional world of a fab with physical and statistical constraints.
- Long-Horizon Planning (Theme 2): Decomposing multi-step yield investigations with sparse, outcome-based rewards across a 12-step budget.
- Self-Improvement (Theme 4): Emergent capability growth where the model discovers complex analytical tools to maximize its reward signal.
Glossary
| Term | Definition |
|---|---|
| Wafer | A thin silicon disc on which chips are fabricated simultaneously |
| Yield | Percentage of working chips per wafer |
| RSM | Response Surface Methodology - statistical technique mapping inputs to outputs |
| DoE | Design of Experiments - systematic approach to planning experiments |
| Lot | A batch of typically 25 wafers processed together |
| CMP | Chemical-Mechanical Planarization - polishing process to flatten surfaces |
Citation
```bibtex
@misc{fab_yield_agent_qwen_q4,
  author    = {PranjalZetsu},
  title     = {Fab_Yield_Agent_Qwen-q4: RL-Trained Semiconductor Yield Reasoning Agent},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/PranjalZetsu/Fab_Yield_Agent_Qwen-q4}
}
```
Contact
For questions or feedback, open a discussion on the Hugging Face Community tab.