Upload 3 files

Geeked Out Quantizer methodology:
- QUANTIZATION_NOTES.md: Technical specs and method details
- GEEKED_OUT_INFO.md: Overview of the quantization environment
- CALIBRATION_INFO.txt: Calibration data explanation

Files changed:
- CALIBRATION_INFO.txt +82 -0
- GEEKED_OUT_INFO.md +151 -0
- QUANTIZATION_NOTES.md +118 -0

CALIBRATION_INFO.txt
ADDED
@@ -0,0 +1,82 @@

CALIBRATION DATA INFORMATION
=============================

This model was quantized using importance matrix (imatrix) generation.
The imatrix captures which weights in the model are most important for
maintaining output quality during extreme compression (2-bit quantization).

WHAT IS CALIBRATION?
--------------------
Calibration is the process of running sample inputs through the model to
measure which tensors (weight matrices) contribute most to the output.
These measurements create an "importance matrix" that guides the quantizer
to preserve precision where it matters most.

CALIBRATION DATA CHARACTERISTICS
--------------------------------
Good calibration data should be:

1. REPRESENTATIVE
   - Matches the domain the model will operate in
   - Similar vocabulary and complexity to expected inputs
   - Reflects actual use case scenarios

2. DIVERSE
   - Multiple topics, subjects, and writing styles
   - Mix of common and rare tokens
   - Varied sentence structures and lengths

3. SUFFICIENT
   - 100-500 text chunks of typical document length
   - More chunks = better quality (diminishing returns beyond ~500)
   - Each chunk processed independently

4. NATURAL
   - Real-world text (not synthetic or random)
   - Domain-appropriate (code for code models, medical for medical models)
   - Representative token distribution

CALIBRATION PROCESS PARAMETERS
------------------------------
Typical settings for this quantization:

Chunks Processed:  200-500 (production quality)
Chunk Size:        Typical document/paragraph length
GPU Acceleration:  Enabled (99 layers offloaded)
Thread Count:      Auto-detected based on CPU
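
A minimal sketch of how a calibration set along these lines might be
assembled (the folder name, chunk cap, and output filename are
illustrative, not the exact tooling used here):

    import pathlib
    import random

    docs = sorted(pathlib.Path("calibration_corpus").glob("*.txt"))
    random.seed(0)
    random.shuffle(docs)  # mix topics so no single domain dominates

    chunks = []
    for doc in docs:
        text = doc.read_text(encoding="utf-8", errors="ignore").strip()
        if text:
            chunks.append(text)
        if len(chunks) >= 500:  # diminishing returns beyond ~500 chunks
            break

    pathlib.Path("calibration.txt").write_text(
        "\n\n".join(chunks), encoding="utf-8")
    print(f"wrote {len(chunks)} chunks")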

QUALITY IMPACT
--------------
The importance matrix generated from quality calibration data enables:

- 3-8% perplexity increase (vs. 10-20% without an imatrix)
- Preservation of critical weights
- Intelligent bit allocation per tensor
- 16x compression with minimal quality loss

CALIBRATION DATA SOURCES
------------------------
Common sources for high-quality calibration data:

- WikiText-2-raw (general language models)
- Domain-specific corpora (medical, legal, code)
- The Pile subset (diverse web text)
- Custom curated datasets matching expected use

VERIFICATION
------------
Quantized models are tested for:
✓ Perplexity measurement vs. baseline
✓ Sample inference quality
✓ Token prediction accuracy
✓ Model file integrity

NOTES
-----
- Calibration is performed once per source model
- The same imatrix can be reused for different target formats
- Domain-specific calibration yields better results
- GPU acceleration significantly speeds up generation
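
The note about imatrix reuse can be illustrated with llama.cpp's
llama-quantize tool: one imatrix drives several target formats (binary
name and file paths here are assumptions about the local setup):

    import subprocess

    # Same imatrix, two target formats: only the bit allocation differs.
    for fmt in ("IQ2_M", "IQ3_M"):
        subprocess.run(
            ["llama-quantize", "--imatrix", "imatrix.dat",
             "Model-BF16.gguf", f"Model-{fmt}.gguf", fmt],
            check=True)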

For questions about the calibration methodology used for this model,
please open a discussion on the model's Hugging Face page.

GEEKED_OUT_INFO.md
ADDED
@@ -0,0 +1,151 @@

# The Geeked Out Quantizer

## What Is It?

**The Geeked Out Quantizer** is a production-ready quantization environment built for Windows systems. It specializes in extreme model compression using importance-aware quantization techniques, particularly the IQ2_M format, which achieves 16x compression with minimal quality loss.

## The Mission

Traditional model quantization forces a choice: small file size or good quality. The Geeked Out Quantizer breaks this trade-off by using **importance matrices**: statistical analysis that identifies which weights matter most, allowing intelligent bit allocation.

## Core Capabilities

### 🎯 Importance-Aware Quantization
- Generates importance matrices automatically using calibration data
- Allocates precision where it matters most
- Achieves 2-bit quantization with only 3-8% quality loss

### ⚡ Hardware Optimization
- Auto-detects CPU, memory type (DDR4/DDR5), and GPU capabilities
- Optimizes thread counts and processing parameters
- GPU acceleration for 5-10x speedup on imatrix generation
- CUDA 12.4+ support with dynamic GPU layer offloading

### 🧠 Intelligent Memory Management
- Reserves system RAM to keep Windows responsive during conversion
- Monitors memory pressure and auto-pauses when needed
- Configurable retry logic for transient resource constraints

### 📦 Complete Workflow Support
- Scans directories for valid source models
- Selects optimal source format (BF16 > F16 > F32)
- Handles sharded models while preserving structure
- Batch processing for multiple models
- Desktop GUI for interactive use

## Quantization Pipeline

```
Source Model (BF16/F16)
        ↓
Calibration Data Analysis
        ↓
Importance Matrix Generation
        ↓
Smart Bit Allocation
        ↓
IQ2_M Quantization
        ↓
Quality Verification
        ↓
Production-Ready Model (16x smaller)
```
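
Under the hood this corresponds to two llama.cpp steps. A minimal sketch of driving them from Python (binary names, file paths, and the GPU layer count are illustrative assumptions, not the exact commands used for this model):

```python
import subprocess

# 1. Calibration: measure per-weight importance on sample text.
subprocess.run([
    "llama-imatrix",
    "-m", "Model-BF16.gguf",   # full-precision source model
    "-f", "calibration.txt",   # calibration corpus
    "-o", "imatrix.dat",       # importance matrix output
    "-ngl", "99",              # offload all layers to the GPU
], check=True)

# 2. Quantization: allocate bits per tensor, guided by the imatrix.
subprocess.run([
    "llama-quantize", "--imatrix", "imatrix.dat",
    "Model-BF16.gguf", "Model-IQ2_M.gguf", "IQ2_M",
], check=True)
```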

## Supported Formats

### Importance-Aware (IMatrix Required)
| Format | Bits/Weight | Best For |
|--------|-------------|----------|
| IQ1_M | 1.0 | Ultra-compact mobile/edge |
| IQ2_XXS | 2.0 | Maximum compression |
| IQ2_XS | 2.0 | Balanced compression |
| **IQ2_M** | **2.0** | **Best quality 2-bit** ⭐ |
| IQ2_S | 2.0 | Higher quality, slower |
| IQ3_M | 3.0 | Near-Q4 quality |
| IQ4_XS | 4.0 | Importance-aware 4-bit |

### Standard K-Quant Formats
Q2_K, Q3_K variants, Q4 variants, Q5 variants, Q6_K, Q8_0

### Ternary Formats
TQ2_0, TQ1_0 – experimental 3-value quantization

## Why IQ2_M?

IQ2_M represents the sweet spot for extreme quantization:

- **16x smaller** than FP32 models
- **2-3x faster** inference
- **VRAM usage** reduced to ~1/16th
- **Quality** approaches Q4_K with proper imatrix
- **Compatible** with llama.cpp inference stack
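
To ground the 16x figure, quick size math for a hypothetical 7B-parameter model:

```python
params = 7e9                      # hypothetical 7B-parameter model
fp32_gb = params * 32 / 8 / 1e9   # 32 bits per weight -> 28.0 GB
iq2m_gb = params * 2 / 8 / 1e9    # ~2 bits per weight -> 1.75 GB
print(f"{fp32_gb:.1f} GB -> {iq2m_gb:.2f} GB "
      f"({fp32_gb / iq2m_gb:.0f}x smaller)")
```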

## Use Cases

- 🤖 **Edge AI** – Run large models on limited hardware
- 🌐 **Browser-Based Inference** – Smaller models for WebGPU/WebGL
- 📱 **Mobile Deployment** – Fit large models on phones/tablets
- 🚀 **High-Throughput APIs** – Serve more requests with less VRAM
- 💾 **Archive Storage** – Preserve models at minimal storage cost

## Technical Philosophy

The Geeked Out Quantizer focuses on:

1. **Quality Preservation** – Never sacrifice more quality than necessary
2. **Automation** – Minimize manual tuning through intelligent defaults
3. **Hardware Awareness** – Adapt to the system's capabilities
4. **Production Ready** – Robust error handling and retry logic
5. **Calibration Quality** – Emphasize representative data selection

## Model Curation

Not all models are equal candidates. The quantizer evaluates:
- Source format quality (BF16 preferred)
- Model architecture compatibility
- Existing quantization state
- Expected use case alignment

## Calibration Best Practices

The quality of your quantized model depends heavily on calibration data:

✅ **DO:**
- Use domain-relevant text (code for code models, medical for medical models)
- Include diverse topics and writing styles
- Provide 100-500 chunks of typical document length
- Ensure natural token distribution

❌ **DON'T:**
- Use repetitive or overly simple text
- Include corrupted or random data
- Rely on single-domain text for general-purpose models

## Collaboration & Research

The Geeked Out Quantizer methodology is available for:
- Research collaborations on quantization techniques
- Edge deployment optimization projects
- Custom calibration strategies for specialized domains
- Hardware-specific optimization studies

## Community

All models in this Hugging Face profile are quantized using this toolchain. Each model card includes:
- Quantization specifications
- Calibration methodology
- Quality metrics
- Use case recommendations

## Future Directions

- Expanded format support (new GGML quantization types)
- Domain-specific calibration datasets
- Hardware-specific optimization profiles
- Batch processing automation

---

*The Geeked Out Quantizer: Making extreme compression intelligent.*

For questions about quantization methodology, collaboration opportunities, or technical discussions, please open an issue or discussion on any model in this profile.

QUANTIZATION_NOTES.md
ADDED
@@ -0,0 +1,118 @@

# Quantization Notes

## Overview

This model was quantized using **The Geeked Out Quantizer**, a specialized Windows-native quantization environment designed for extreme compression with quality preservation.

## Quantization Specifications

| Parameter | Details |
|-----------|---------|
| **Source Format** | BF16 (bfloat16) or F16 (float16) |
| **Target Format** | IQ2_M (2.0 bits per weight) |
| **Compression Ratio** | 16x smaller than FP32 baseline |
| **Quantization Method** | Importance-aware quantization with IMatrix |
| **Quality Metric** | ~3-8% perplexity increase vs. baseline |

## The Importance Matrix (IMatrix) Method

### What is an Importance Matrix?

An importance matrix is a statistical analysis of a neural network that identifies which weights contribute most significantly to model output quality. Rather than applying uniform quantization across all tensors, this method:

- **Preserves precision** on high-impact weights
- **Aggressively compresses** low-impact weights
- **Maintains information flow** through the network architecture

### Why It Matters

Traditional uniform quantization to 2-bit precision typically causes 10-20% quality degradation. The importance matrix approach reduces this to 3-8%, making 2-bit models viable for production use.
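
A toy sketch of the idea (synthetic data, not the actual llama.cpp kernels): importance values weight the quantization error, so the chosen scale keeps high-impact weights accurate at the expense of low-impact ones.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=256)  # one row of a weight tensor
imp = rng.random(256)     # per-weight importance (activation statistics)

def weighted_error(scale):
    q = np.round(w / scale)  # snap weights to an integer grid
    return np.sum(imp * (w - scale * q) ** 2)

# Choose the scale that minimizes importance-weighted reconstruction error.
scales = np.linspace(0.01, 0.5, 200)
best = min(scales, key=weighted_error)
print(f"best scale: {best:.3f}")
```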

## Calibration Process

### Data Selection

The importance matrix is generated using carefully selected calibration data that:
- Represents the model's intended use domain
- Contains diverse vocabulary and sentence structures
- Includes 100-500 text chunks of typical prompt length
- Matches the distribution of expected inference inputs

### Generation Parameters

| Setting | Typical Value | Purpose |
|---------|---------------|---------|
| Chunks | 200-500 | Balance quality vs. generation time |
| GPU Layers | 99 (max) | Accelerate processing via CUDA |
| Thread Count | Auto-detected | Optimize for hardware configuration |

## Memory & Hardware Optimization

The quantization process includes:
- **Dynamic memory management** – Reserves system RAM to maintain Windows responsiveness
- **Hardware detection** – Automatically detects CPU cores, memory type (DDR4/DDR5), and GPU capabilities
- **Thread optimization** – Adjusts parallelism based on available resources
- **Retry logic** – Handles transient memory pressure gracefully
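
The memory-guard idea fits in a few lines; a hedged sketch (the reserve size and polling interval are illustrative, not the tool's actual defaults):

```python
import time

import psutil

RESERVE_BYTES = 8 * 1024**3  # keep ~8 GB free so Windows stays responsive

def wait_for_headroom(poll_seconds=5.0):
    """Block while available RAM is below the reserve, then continue."""
    while psutil.virtual_memory().available < RESERVE_BYTES:
        time.sleep(poll_seconds)  # auto-pause under memory pressure
```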

## Model Selection Criteria

Source models are selected based on a quality hierarchy:
1. **BF16** (preferred) – Best precision for quantization
2. **F16** – Good precision, widely available
3. **F32** – Acceptable but creates larger intermediate files

Models already in quantized formats are skipped unless explicitly re-quantizing.
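
A sketch of that selection rule (the filename convention is an assumption for illustration):

```python
import pathlib

PREFERENCE = ("BF16", "F16", "F32")  # quality hierarchy from above

def pick_source(model_dir):
    """Return the best full-precision GGUF in a directory, or None."""
    files = sorted(pathlib.Path(model_dir).glob("*.gguf"))
    for tag in PREFERENCE:
        for f in files:
            if tag in f.stem.upper():
                return f
    return None  # no full-precision source found: skip the model
```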

## Output Format Details

### IQ2_M Characteristics

- **Bit depth:** 2.0 bits per weight
- **Speed:** 2-3x faster inference than F32
- **VRAM usage:** ~1/16th of FP32
- **Imatrix required:** Yes
- **Quality tier:** Best-in-class for 2-bit quantization

### Naming Convention

Quantized models follow this pattern:
```
OriginalModel-BF16.gguf → OriginalModel-IQ2_M.gguf
```

Sharded models preserve shard numbering:
```
Model-00001-of-00004.gguf → Model-IQ2_M-00001-of-00004.gguf
```

## Quality Verification

Models are validated through:
- Perplexity measurement against baseline
- Sample inference testing
- File integrity verification
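
As an example, the perplexity comparison can be run with llama.cpp's llama-perplexity on held-out text (binary name and file paths are assumptions about the local setup):

```python
import subprocess

# Run the same held-out text through the baseline and the quantized
# model, then compare the perplexities each run reports.
for model in ("Model-BF16.gguf", "Model-IQ2_M.gguf"):
    subprocess.run(
        ["llama-perplexity", "-m", model, "-f", "wiki.test.raw"],
        check=True)
```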

## Use Cases

IQ2_M quantized models are ideal for:
- **Edge deployment** – Minimal storage footprint
- **Consumer hardware** – Reduced VRAM requirements
- **High-throughput inference** – Faster token generation
- **Bandwidth-constrained environments** – Efficient distribution

## Technical Notes

- Quantization performed on Windows with CUDA 12.4+ support
- GPU acceleration utilized for imatrix generation
- Multi-threaded processing with memory safety guards
- Compatible with llama.cpp inference engines

## Citation

If you use this quantized model in research or applications, please acknowledge:

> Quantized using The Geeked Out Quantizer with importance-aware IQ2_M optimization.

---

*For questions about the quantization method or collaboration inquiries, please open a discussion on this model's page.*