Instructions for using PranjalZetsu/Fab_Yield_Agent_Qwen-q4 with libraries and local apps.

Libraries

How to use PranjalZetsu/Fab_Yield_Agent_Qwen-q4 with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen2.5-7B-Instruct-bnb-4bit")
model = PeftModel.from_pretrained(base_model, "PranjalZetsu/Fab_Yield_Agent_Qwen-q4")

Transformers

How to use PranjalZetsu/Fab_Yield_Agent_Qwen-q4 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="PranjalZetsu/Fab_Yield_Agent_Qwen-q4")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly (this repo holds a LoRA adapter, so peft must be installed)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("PranjalZetsu/Fab_Yield_Agent_Qwen-q4", dtype="auto")

Local Apps

vLLM

How to use PranjalZetsu/Fab_Yield_Agent_Qwen-q4 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "PranjalZetsu/Fab_Yield_Agent_Qwen-q4"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PranjalZetsu/Fab_Yield_Agent_Qwen-q4",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/PranjalZetsu/Fab_Yield_Agent_Qwen-q4

SGLang

How to use PranjalZetsu/Fab_Yield_Agent_Qwen-q4 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "PranjalZetsu/Fab_Yield_Agent_Qwen-q4" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PranjalZetsu/Fab_Yield_Agent_Qwen-q4",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "PranjalZetsu/Fab_Yield_Agent_Qwen-q4" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PranjalZetsu/Fab_Yield_Agent_Qwen-q4",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use PranjalZetsu/Fab_Yield_Agent_Qwen-q4 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for PranjalZetsu/Fab_Yield_Agent_Qwen-q4 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for PranjalZetsu/Fab_Yield_Agent_Qwen-q4 to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for PranjalZetsu/Fab_Yield_Agent_Qwen-q4 to start chatting

Load model with FastModel

# Install first (in a shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="PranjalZetsu/Fab_Yield_Agent_Qwen-q4",
    max_seq_length=2048,
)

Docker Model Runner
How to use PranjalZetsu/Fab_Yield_Agent_Qwen-q4 with Docker Model Runner:
```
docker model run hf.co/PranjalZetsu/Fab_Yield_Agent_Qwen-q4
```

Model Details

Developed by: PranjalZetsu

Model type: Causal Language Model fine-tuned via RL (GRPO)
Language(s): English
License: Apache 2.0
Finetuned from model: Qwen/Qwen2.5-7B-Instruct

Property	Value
Base Model	Qwen/Qwen2.5-7B-Instruct (loaded as unsloth/Qwen2.5-7B-Instruct-bnb-4bit)
Adapter Type	LoRA (PEFT)
Quantization	4-bit (bitsandbytes, NF4)
Training Framework	Unsloth + TRL (GRPO)
PEFT Version	0.19.1
Task	Text Generation / Agentic Reasoning
Domain	Semiconductor Fabrication, Yield Engineering

Model Description

Fab_Yield_Agent_Qwen-q4 is a domain-specialized language model fine-tuned for semiconductor fabrication yield analysis and optimization. Built on the Qwen2.5-7B-Instruct architecture and trained using Reinforcement Learning from verifiable rewards (GRPO), this model learns to reason through complex statistical and materials science problems with structured, step-by-step thinking.

The Problem: Semiconductor Yield Loss

Semiconductor manufacturing is one of the most complex industrial processes on Earth. A modern chip fab runs wafers through hundreds of steps, each controlled by multiple physical parameters (temperatures, pressures, gas flows). Even tiny deviations cascade into yield loss. This agent is trained to act as a process integration engineer:

Analyzing yield data and identifying root causes.
Navigating the physics of a 15-parameter process space.
Converging on optimal manufacturing recipes within a limited experiment budget.

The Key Insight: RL Emergent Intelligence

During reinforcement learning, the model was rewarded purely on the correctness of final answers, not on intermediate reasoning steps. Without being explicitly trained to, it developed:

Deeper statistical reasoning: Proactive use of Cpk, Poisson models, and control chart logic.
Material science grounding: Reasoning through etch selectivity, deposition uniformity, and diffusion profiles.
Structured problem decomposition: Breaking queries into logical sub-tasks before synthesizing conclusions.
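
As a concrete reference for the statistical tools named above, here is a minimal sketch of the standard Poisson yield model and the Cpk index. The function names and the input numbers are illustrative assumptions, not taken from the model's training data.

```python
import math

def poisson_yield(defect_density: float, critical_area: float) -> float:
    """Poisson yield model: Y = exp(-D0 * A), with D0 in defects/cm^2 and A in cm^2."""
    return math.exp(-defect_density * critical_area)

def cpk(mean: float, sigma: float, lsl: float, usl: float) -> float:
    """Process capability index: min(USL - mean, mean - LSL) / (3 * sigma)."""
    return min(usl - mean, mean - lsl) / (3.0 * sigma)

# Illustrative inputs only (assumed for this example).
y = poisson_yield(defect_density=0.08, critical_area=1.5)
print(f"Poisson yield: {y:.3f}")  # 0.887
print(f"Cpk: {cpk(mean=50.0, sigma=1.0, lsl=45.0, usl=56.0):.2f}")  # 1.67
```

A Cpk below roughly 1.33 is commonly read as a process with little margin to its spec limits, which is the kind of judgment the model is expected to reason about.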

Uses

Direct Use

Fab yield triage: Rapidly analyze incoming yield data and identify likely root causes.
Process window analysis: Evaluate margin sensitivity across interconnected process steps.
Statistical process control (SPC): Interpret control charts and flag out-of-control signals.
Material selection reasoning: Assess tradeoffs between material properties and process compatibility.
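
To illustrate the SPC use case, the sketch below flags out-of-control points with the basic 3-sigma rule. It is a simplification: it estimates sigma from the sample standard deviation of a baseline, whereas production control charts often use moving ranges and additional run rules. The readings are hypothetical.

```python
from statistics import mean, stdev

def control_limits(baseline):
    """Return (LCL, center, UCL) as mean +/- 3 sample standard deviations."""
    center = mean(baseline)
    sigma = stdev(baseline)
    return center - 3 * sigma, center, center + 3 * sigma

def out_of_control(points, lcl, ucl):
    """Indices of points that fall outside the control limits."""
    return [i for i, x in enumerate(points) if x < lcl or x > ucl]

# Hypothetical in-control film thickness readings used to set the limits.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
lcl, center, ucl = control_limits(baseline)

new_points = [10.1, 9.9, 11.2, 10.0]
flagged = out_of_control(new_points, lcl, ucl)
print(flagged)  # [2]
```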

Out-of-Scope Use

General purpose QA or creative writing.
Safety-critical sign-off without human expert verification.
Use as a replacement for certified TCAD/SPC simulation software.

Bias, Risks, and Limitations

Not a simulation substitute: This model should not replace calibrated simulation tools like TCAD or certified SPC software.
4-bit quantization: While efficient, there may be minor accuracy tradeoffs compared to full-precision models.
Domain Focus: Performance on non-semiconductor tasks is not optimized.

How to Get Started

Quick Inference

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model = "unsloth/Qwen2.5-7B-Instruct-bnb-4bit"
adapter    = "PranjalZetsu/Fab_Yield_Agent_Qwen-q4"

tokenizer = AutoTokenizer.from_pretrained(base_model)
# The base checkpoint is already quantized to 4-bit with bitsandbytes, so no extra quantization flag is needed.
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

prompt = """You are a semiconductor yield engineer.
A 300mm fab sees 12% yield loss on a 7nm logic layer.
Defect density: 0.08 defects/cm2. Critical area: 1.5 cm2.
Perform a yield analysis and recommend corrective actions."""

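# Qwen2.5-Instruct models are chat-tuned, so format the prompt with the chat template.
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.6, do_sample=True)
# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))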

Local Setup (Environment & API)

# Clone the repository
git clone https://github.com/pranjalyt/fab-yield-agent.git
cd fab-yield-agent

# Install dependencies
pip install -r requirements.txt

# Run the API server
uvicorn server:app --host 0.0.0.0 --port 7860

System Architecture

The project includes a production-grade RL environment (FabYieldEnv) and a Response Surface Model (RSM) simulator.

RSMSimulator

A hidden ground-truth engine that implements a second-order Response Surface Model. It captures non-linear interactions between 15 process parameters (Temperature, Etch Time, Pressure, etc.). The agent never sees the coefficients, only the yield that results from each experiment.

Normalization: All parameters are mapped to [-1, 1] for scale-invariant modeling.
Interaction Terms: Simulates how variables like Pressure and Gas Flow interact to control plasma density.
Dynamic Physics: Every episode generates a fresh set of coefficients, forcing the agent to generalize its optimization strategy.
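
The three properties above can be sketched in a few lines. This is a toy illustration, not the project's RSMSimulator: the class name ToyRSM, the coefficient ranges, and the 0.8 intercept are all assumptions made for the example.

```python
import random

def normalize(value, low, high):
    """Map a physical value in [low, high] to the coded range [-1, 1]."""
    return 2.0 * (value - low) / (high - low) - 1.0

class ToyRSM:
    """Second-order surface: y = b0 + sum(b_i * x_i) + sum over i <= j of b_ij * x_i * x_j."""

    def __init__(self, n_params, seed):
        rng = random.Random(seed)  # a new seed per episode yields new coefficients
        self.b0 = 0.8  # assumed baseline yield at the center of the design space
        self.linear = [rng.uniform(-0.05, 0.05) for _ in range(n_params)]
        # Keys (i, i) are squared terms; keys (i, j) with i < j are interaction terms.
        self.quadratic = {
            (i, j): rng.uniform(-0.02, 0.02)
            for i in range(n_params)
            for j in range(i, n_params)
        }

    def predict(self, x):
        y = self.b0 + sum(b * xi for b, xi in zip(self.linear, x))
        y += sum(b * x[i] * x[j] for (i, j), b in self.quadratic.items())
        return min(1.0, max(0.0, y))  # clamp to a valid yield fraction

sim = ToyRSM(n_params=15, seed=0)
x = [normalize(375.0, 300.0, 400.0)] + [0.0] * 14  # one parameter at coded value 0.5
print(sim.predict(x))
```

With 15 parameters the surface has 15 linear terms and 120 second-order terms, which is why a 12-experiment budget cannot identify it by brute force.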

Defect Classification

The simulator produces physically motivated defect signatures:

Edge Ring: Non-uniform plasma at wafer edges (linked to Pressure/Gas flow).
Center Spot: Thermal hotspot at wafer center (linked to Temp/RF power).
Random Scatter: Chemical contamination (linked to Dopant levels).
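
One way to picture the mapping from parameters to signatures is a lookup keyed on which parameter group deviates most. The sketch below is a hypothetical rule for illustration only; the parameter names and the scoring rule are assumptions, and the actual simulator logic may differ.

```python
# Signature -> parameters it is linked to, following the list above.
DEFECT_SIGNATURES = {
    "edge_ring": ("pressure", "gas_flow"),
    "center_spot": ("temperature", "rf_power"),
    "random_scatter": ("dopant_level",),
}

def classify_defect(deviations):
    """Pick the signature whose linked parameters show the largest mean absolute deviation.

    `deviations` maps parameter names to their offset from nominal in normalized units.
    """
    def score(params):
        return sum(abs(deviations.get(p, 0.0)) for p in params) / len(params)

    return max(DEFECT_SIGNATURES, key=lambda name: score(DEFECT_SIGNATURES[name]))

print(classify_defect({"pressure": 0.9, "gas_flow": 0.7, "temperature": 0.1}))  # edge_ring
```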

Senior Engineer Reviewer

A multi-agent layer that simulates the human approval gate. It enforces episode-varying qualification constraints (min yield, max variance, forbidden ranges).
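
A minimal sketch of such an approval gate follows. The class and field names are hypothetical and the thresholds are made up; it only shows how the three constraint types named above could be checked against a submitted recipe.

```python
from dataclasses import dataclass, field

@dataclass
class QualificationConstraints:
    """Hypothetical per-episode constraints; field names are illustrative."""
    min_yield: float
    max_variance: float
    # Parameter name -> (low, high) interval in normalized units that a recipe must avoid.
    forbidden_ranges: dict = field(default_factory=dict)

def review(recipe, predicted_yield, variance, constraints):
    """Return (approved, reasons) for a submitted recipe."""
    reasons = []
    if predicted_yield < constraints.min_yield:
        reasons.append(f"yield {predicted_yield:.2f} below minimum {constraints.min_yield:.2f}")
    if variance > constraints.max_variance:
        reasons.append(f"variance {variance:.3f} above maximum {constraints.max_variance:.3f}")
    for name, (low, high) in constraints.forbidden_ranges.items():
        if name in recipe and low <= recipe[name] <= high:
            reasons.append(f"{name}={recipe[name]} falls in forbidden range [{low}, {high}]")
    return (not reasons, reasons)

constraints = QualificationConstraints(
    min_yield=0.85, max_variance=0.01, forbidden_ranges={"pressure": (0.8, 1.0)}
)
approved, reasons = review({"pressure": 0.9, "temperature": 0.1}, 0.90, 0.005, constraints)
print(approved, reasons)
```

Returning the reasons alongside the verdict lets the agent see which constraint to fix on its next submission.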

Training Details

Training Pipeline

Supervised Fine-Tuning (SFT): Initial training on semiconductor reports, SPC problem sets, and material science Q&A.
Reinforcement Learning (GRPO): Training via Group Relative Policy Optimization, rewarding answer correctness, structured thinking traces, and numerical precision.
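
The core idea of GRPO is that each sampled completion is scored relative to the other completions for the same prompt, so no separate value model is needed. The sketch below shows that normalization step only; it uses the population standard deviation, and implementations differ on that detail and on the epsilon value.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward by the mean and standard deviation of its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Rewards for four completions sampled from the same prompt (illustrative values).
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print([round(a, 2) for a in advantages])
```

Completions that beat the group average get a positive advantage and are reinforced; if every completion in a group earns the same reward, all advantages are zero and that group contributes no gradient signal.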

Reward System (Four-Component Design)

Yield Reward (50%): Continuous signal based on yield improvement.
Efficiency Reward (20%): Sparse reward for hitting targets within the 12-experiment budget.
Causal Attribution (15%): Reward for correctly identifying the primary bottleneck parameter in natural language.
Stability Reward (15%): Reward for submitted recipes that show low lot-to-lot variance.
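
The four components combine as a weighted sum. The sketch below assumes each component score is already scaled to [0, 1], which the description above does not state; the key names are also illustrative.

```python
# Weights from the four-component design above.
WEIGHTS = {"yield": 0.50, "efficiency": 0.20, "attribution": 0.15, "stability": 0.15}

def total_reward(components):
    """Weighted sum of the four component scores (each assumed to lie in [0, 1])."""
    missing = set(WEIGHTS) - set(components)
    if missing:
        raise ValueError(f"missing reward components: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)

episode = {"yield": 0.6, "efficiency": 1.0, "attribution": 1.0, "stability": 0.5}
print(f"{total_reward(episode):.3f}")  # 0.725
```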

Evaluation: Emergent Capabilities

Statistical Reasoning

Capability	Before RL (SFT only)	After RL (GRPO)
Yield estimation	States a number	Derives from defect density + critical area
Process capability	Rarely mentions	Calculates Cpk from spec limits + sigma
Confidence intervals	Absent	Appears spontaneously in reasoning traces

Materials Science Reasoning

Capability	Before RL (SFT only)	After RL (GRPO)
Film uniformity	Generic description	Linked to deposition mechanism physically
Etch selectivity	Surface-level	Reasoned from underlying chemistry
Defect root cause	Names defect types	Traces cause to process physics

Research Theme Alignment

This work addresses several frontier AI research themes for OpenEnv India 2026:

World Modeling (Theme 3.1): Modeling the partially observable professional world of a fab with physical and statistical constraints.
Long-Horizon Planning (Theme 2): Decomposing multi-step yield investigations with sparse, outcome-based rewards across a 12-step budget.
Self-Improvement (Theme 4): Emergent capability growth where the model discovers complex analytical tools to maximize its reward signal.

Glossary

Term	Definition
Wafer	A thin silicon disc on which chips are fabricated simultaneously
Yield	Percentage of working chips per wafer
RSM	Response Surface Methodology - statistical technique mapping process inputs to outputs; the fitted result is a Response Surface Model
DoE	Design of Experiments - systematic approach to planning experiments
Lot	A batch of typically 25 wafers processed together
CMP	Chemical-Mechanical Planarization - polishing process to flatten surfaces

Citation

@misc{fab_yield_agent_qwen_q4,
  author    = {PranjalZetsu},
  title     = {Fab_Yield_Agent_Qwen-q4: RL-Trained Semiconductor Yield Reasoning Agent},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/PranjalZetsu/Fab_Yield_Agent_Qwen-q4}
}

Contact

For questions or feedback, open a discussion on the Hugging Face Community tab.

Model tree for PranjalZetsu/Fab_Yield_Agent_Qwen-q4

Base model: Qwen/Qwen2.5-7B
Finetuned: Qwen/Qwen2.5-7B-Instruct
Quantized: unsloth/Qwen2.5-7B-Instruct-bnb-4bit
Adapter: this model (PranjalZetsu/Fab_Yield_Agent_Qwen-q4)