Translation
GGUF
Japanese
English
imatrix
conversational

Sugoi-32B-Ultra-GGUF (IQ3_M)

Overview

This is an IQ3_M quantized GGUF version of the Sugoi-32B-Ultra translation model.

This specific quantization was calibrated using a custom Importance Matrix (imatrix) generated from a high-quality Japanese-to-English translation dataset.

It has been strictly optimized with a targeted chunk size (-c 2048) to perfectly preserve the attention weights required for a 100-line rolling conversational buffer.

This makes it exceptionally stable for translating continuous media (Visual Novels, Light Novels, and Subtitles) where maintaining character voice, tone, and pronoun consistency over long scenes is critical.

Hardware Requirements

  • VRAM: ~14.5 GB peak usage. Fits comfortably on 16GB GPUs (e.g., RTX 4080, RX 7800 XT).
  • RAM: 16GB+ System RAM recommended for context offloading.
  • Context Window: 4096 (Up to 150 lines of history) or 8192 (Up to 300 lines).

Usage

If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including on how to concatenate multi-part files.

Downloads last month
117
GGUF
Model size
33B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

3-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for DharkNet3/Sugoi-32B-Ultra-IQ3_M-GGUF

Base model

Qwen/Qwen2.5-32B
Quantized
(1)
this model

Datasets used to train DharkNet3/Sugoi-32B-Ultra-IQ3_M-GGUF