
# Note on Quantization

A quantized version of this model is not included because PyTorch's quantization backends have limited support on Apple Silicon (M-series) Macs.

To quantize this model on a compatible system:

```python
import torch

from model.transformer import TransformerLM, ModelConfig

# Load the checkpoint on CPU
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
config = ...  # load your ModelConfig here

# Create the model instance and restore the weights
model = TransformerLM(config)
model.load_state_dict(checkpoint)
model.eval()

# Apply dynamic quantization to all linear layers
quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},
    dtype=torch.qint8,
)

# Save the quantized weights
torch.save(quantized_model.state_dict(), "pytorch_model_quantized.bin")
```
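What `quantize_dynamic` does can be checked on a small stand-in network (a hypothetical two-layer model used here for illustration, not the actual `TransformerLM`): the `nn.Linear` modules are replaced by dynamically quantized equivalents, while the forward-pass interface stays the same.

```python
import torch
import torch.nn as nn

# Toy stand-in model (hypothetical; not the actual TransformerLM)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
model.eval()

quantized = torch.quantization.quantize_dynamic(
    model,
    {nn.Linear},
    dtype=torch.qint8,
)

# The Linear layers are now dynamically quantized modules,
# but inference still takes and returns float tensors.
x = torch.randn(4, 16)
with torch.no_grad():
    out = quantized(x)
print(out.shape)
```

Note that the saved quantized `state_dict` has a different module structure from the float model, so it should be loaded back into a model that has already been passed through `quantize_dynamic`, not into a plain `TransformerLM`.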