Model Details

Developed by: PranjalZetsu

  • Model type: Causal Language Model fine-tuned via RL (GRPO)

  • Language(s): English

  • License: Apache 2.0

  • Finetuned from model: Qwen/Qwen2.5-7B-Instruct

  • Property Value
    Base Model Qwen/Qwen2.5-7B
    Adapter Type LoRA (PEFT)
    Quantization 4-bit (bitsandbytes, NF4)
    Training Framework Unsloth + TRL (GRPO)
    PEFT Version 0.19.1
    Task Text Generation / Agentic Reasoning
    Domain Semiconductor Fabrication, Yield Engineering

Model Description

Fab_Yield_Agent_Qwen-q4 is a domain-specialized language model fine-tuned for semiconductor fabrication yield analysis and optimization. Built on the Qwen2.5-7B-Instruct architecture and trained using Reinforcement Learning from verifiable rewards (GRPO), this model learns to reason through complex statistical and materials science problems with structured, step-by-step thinking.

The Problem: Semiconductor Yield Loss

Semiconductor manufacturing is one of the most complex industrial processes on Earth. A modern chip fab runs wafers through hundreds of steps, each controlled by multiple physical parameters (temperatures, pressures, gas flows). Even tiny deviations cascade into yield loss. This agent is trained to act as a process integration engineer:

  • Analyzing yield data and identifying root causes.
  • Navigating the physics of a 15-parameter process space.
  • Converging on optimal manufacturing recipes within a limited experiment budget.

The Key Insight: RL Emergent Intelligence

During Reinforcement Learning, the model was rewarded purely on the correctness of final answersโ€”not on intermediate reasoning steps. Spontaneously, it developed:

  • Deeper statistical reasoning: Proactive use of Cpk, Poisson models, and control chart logic.
  • Material science grounding: Reasoning through etch selectivity, deposition uniformity, and diffusion profiles.
  • Structured problem decomposition: Breaking queries into logical sub-tasks before synthesizing conclusions.

Uses

Direct Use

  • Fab yield triage: Rapidly analyze incoming yield data and identify likely root causes.
  • Process window analysis: Evaluate margin sensitivity across interconnected process steps.
  • Statistical process control (SPC): Interpret control charts and flag out-of-control signals.
  • Material selection reasoning: Assess tradeoffs between material properties and process compatibility.

Out-of-Scope Use

  • General purpose QA or creative writing.
  • Safety-critical sign-off without human expert verification.
  • Use as a replacement for certified TCAD/SPC simulation software.

Bias, Risks, and Limitations

  • Not a simulation substitute: This model should not replace calibrated simulation tools like TCAD or certified SPC software.
  • 4-bit quantization: While efficient, there may be minor accuracy tradeoffs compared to full-precision models.
  • Domain Focus: Performance on non-semiconductor tasks is not optimized.

How to Get Started

Quick Inference

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model = "unsloth/Qwen2.5-7B-Instruct-bnb-4bit"
adapter    = "PranjalZetsu/Fab_Yield_Agent_Qwen-q4"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, load_in_4bit=True, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

prompt = """You are a semiconductor yield engineer.
A 300mm fab sees 12% yield loss on a 7nm logic layer.
Defect density: 0.08 defects/cm2. Critical area: 150 cm2.
Perform a yield analysis and recommend corrective actions."""

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Local Setup (Environment & API)

# Clone the repository
git clone https://github.com/pranjalyt/fab-yield-agent.git
cd fab-yield-agent

# Install dependencies
pip install -r requirements.txt

# Run the API server
uvicorn server:app --host 0.0.0.0 --port 7860

System Architecture

The project includes a production-grade RL environment (FabYieldEnv) and a Response Surface Model (RSM) simulator.

RSMSimulator

A hidden ground truth engine that implements a second-order Response Surface Model. It captures non-linear interactions between 15 process parameters (Temperature, Etch Time, Pressure, etc.).

  • Normalization: All parameters are mapped to [-1, 1] for scale-invariant modeling.
  • Interaction Terms: Simulates how variables like Pressure and Gas Flow interact to control plasma density.
  • Dynamic Physics: Every episode generates a fresh set of coefficients, forcing the agent to generalize its optimization strategy.

Defect Classification

The simulator produces physically motivated defect signatures:

  • Edge Ring: Non-uniform plasma at wafer edges (linked to Pressure/Gas flow).
  • Center Spot: Thermal hotspot at wafer center (linked to Temp/RF power).
  • Random Scatter: Chemical contamination (linked to Dopant levels).

Senior Engineer Reviewer

A multi-agent layer that simulates the human approval gate. It enforces episode-varying qualification constraints (min yield, max variance, forbidden ranges).


Training Details

Training Pipeline

  1. Supervised Fine-Tuning (SFT): Initial training on semiconductor reports, SPC problem sets, and material science Q&A.
  2. Reinforcement Learning (GRPO): Training via Group Relative Policy Optimization, rewarding answer correctness, structured thinking traces, and numerical precision.

Reward System (Four-Component Design)

  • Yield Reward (50%): Continuous signal based on yield improvement.
  • Efficiency Reward (20%): Sparse reward for hitting targets within the 12-experiment budget.
  • Causal Attribution (15%): Reward for correctly identifying the primary bottleneck parameter in natural language.
  • Stability Reward (15%): Reward for submitted recipes that show low lot-to-lot variance.

Evaluation: Emergent Capabilities

Statistical Reasoning

Capability Before RL (SFT only) After RL (GRPO)
Yield estimation States a number Derives from defect density + critical area
Process capability Rarely mentions Calculates Cpk from spec limits + sigma
Confidence intervals Absent Appears spontaneously in reasoning traces

Materials Science Reasoning

Capability Before RL (SFT only) After RL (GRPO)
Film uniformity Generic description Linked to deposition mechanism physically
Etch selectivity Surface-level Reasoned from underlying chemistry
Defect root cause Names defect types Traces cause to process physics

Research Theme Alignment

This work addresses several frontier AI research themes for OpenEnv India 2026:

  • World Modeling (Theme 3.1): Modeling the partially observable professional world of a fab with physical and statistical constraints.
  • Long-Horizon Planning (Theme 2): Decomposing multi-step yield investigations with sparse, outcome-based rewards across a 12-step budget.
  • Self-Improvement (Theme 4): Emergent capability growth where the model discovers complex analytical tools to maximize its reward signal.

Glossary

Term Definition
Wafer A thin silicon disc on which chips are fabricated simultaneously
Yield Percentage of working chips per wafer
RSM Response Surface Methodology - statistical technique mapping inputs to outputs
DoE Design of Experiments - systematic approach to planning experiments
Lot A batch of typically 25 wafers processed together
CMP Chemical-Mechanical Planarization - polishing process to flatten surfaces

Citation

@misc{fab_yield_agent_qwen_q4,
  author    = {PranjalZetsu},
  title     = {Fab_Yield_Agent_Qwen-q4: RL-Trained Semiconductor Yield Reasoning Agent},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/PranjalZetsu/Fab_Yield_Agent_Qwen-q4}
}

Contact

For questions or feedback, open a discussion on the Hugging Face Community tab.

Downloads last month
65
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for PranjalZetsu/Fab_Yield_Agent_Qwen-q4

Base model

Qwen/Qwen2.5-7B
Adapter
(47)
this model

Space using PranjalZetsu/Fab_Yield_Agent_Qwen-q4 1