maia3-79m-fp16

This repository contains an fp16 reduced-precision release derived from the original UofTCSSLab/Maia3-79M checkpoint.

Base model relationship

The repository metadata declares a derived-model relationship on the Hugging Face Hub:

  • base_model: UofTCSSLab/Maia3-79M
  • base_model_relation: quantized

Although Hugging Face uses quantized as the repository relation type, this release performs precision reduction rather than integer quantization.

Quantization details

The original checkpoint weights were converted from float32 (fp32) to float16 (fp16).

Conversion:

float32 → float16
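The conversion script itself is not included in this card, so the following is only a minimal sketch of how such a cast is commonly done in PyTorch. It assumes the checkpoint is a flat mapping from parameter names to tensors; the helper name `to_fp16` and the demo tensor names are hypothetical.

```python
import torch


def to_fp16(state_dict):
    """Return a copy of state_dict with floating-point tensors cast to float16.

    Non-floating entries (integer buffers, plain Python values) are kept as-is.
    """
    converted = {}
    for name, value in state_dict.items():
        if torch.is_tensor(value) and value.is_floating_point():
            converted[name] = value.to(torch.float16)
        else:
            converted[name] = value
    return converted


# Tiny stand-in for a real checkpoint; the names are made up for illustration.
fp32_state = {
    "linear.weight": torch.randn(4, 4, dtype=torch.float32),
    "linear.bias": torch.zeros(4, dtype=torch.float32),
    "steps": torch.tensor(10, dtype=torch.int64),
}
fp16_state = to_fp16(fp32_state)

# Compare the raw tensor storage before and after the cast.
fp32_bytes = sum(t.numel() * t.element_size() for t in fp32_state.values())
fp16_bytes = sum(t.numel() * t.element_size() for t in fp16_state.values())
print(fp32_bytes, fp16_bytes)
```

Only floating-point tensors are cast, so integer buffers such as counters keep their original dtype and value.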

Characteristics:

  • Reduced model size (~50% relative to fp32)
  • Lower memory usage during loading and inference
  • Increased throughput on hardware with efficient half-precision support
  • Minimal expected quality degradation compared to the original checkpoint
  • Weights remain floating-point values rather than integer-quantized representations
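To make the size and precision claims above concrete, here is a rough back-of-the-envelope sketch using only the Python standard library. The parameter count of about 79 million is an assumption read off the model name, and the sample weight values are arbitrary; actual file sizes also depend on serialization overhead and any non-float tensors.

```python
import struct


def round_to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Assumption: roughly 79 million parameters, read off the model name.
num_params = 79_000_000
fp32_mb = num_params * 4 / 1e6  # 4 bytes per float32 value
fp16_mb = num_params * 2 / 1e6  # 2 bytes per float16 value
print(f"fp32: ~{fp32_mb:.0f} MB, fp16: ~{fp16_mb:.0f} MB")

# Per-weight rounding error for a few arbitrary values in fp16's normal range.
samples = [0.1, -0.0372, 1.2345, 3.14159]
rel_errors = [abs(round_to_fp16(v) - v) / abs(v) for v in samples]
print(max(rel_errors))
```

Because float16 keeps 10 explicit mantissa bits, the relative rounding error for values in its normal range is bounded by 2^-11 (about 0.05%), which is the usual reason a plain cast is expected to change model outputs only slightly.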

This release does not use:

  • int8
  • int4
  • GPTQ
  • AWQ
  • GGUF quantization schemes
  • other integer weight compression methods

Repository contents

  • maia3-79m-fp16.pt: converted fp16 checkpoint
  • model definition or loading utilities required for inference

Loading example

import torch

# Load the fp16 checkpoint onto the CPU; move tensors to another device afterwards if needed.
state_dict = torch.load(
    "maia3-79m-fp16.pt",
    map_location="cpu",
)
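After loading, it can be useful to confirm that the tensors really are float16 and, if half precision is slow or unsupported on the target device, to cast them back to float32. The sketch below assumes the checkpoint is a flat mapping from names to tensors; `dtype_summary`, `upcast_to_fp32`, and the demo dictionary are hypothetical helpers for illustration, not part of this repository.

```python
from collections import Counter

import torch


def dtype_summary(state_dict):
    """Count tensors per dtype, ignoring non-tensor entries."""
    return Counter(str(v.dtype) for v in state_dict.values() if torch.is_tensor(v))


def upcast_to_fp32(state_dict):
    """Cast float16 tensors back to float32; other entries are left unchanged."""
    return {
        k: v.float() if torch.is_tensor(v) and v.dtype == torch.float16 else v
        for k, v in state_dict.items()
    }


# Stand-in for the loaded checkpoint; a real one would come from torch.load.
demo_state = {
    "w": torch.ones(2, 2, dtype=torch.float16),
    "b": torch.zeros(2, dtype=torch.float16),
}
print(dtype_summary(demo_state))

fp32_state = upcast_to_fp32(demo_state)
print(dtype_summary(fp32_state))
```

Upcasting does not restore the precision lost in the fp32 to fp16 conversion; it only changes the storage and compute dtype.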

Provenance

  • Original checkpoint: UofTCSSLab/Maia3-79M
  • Converted by: bqrio
  • Conversion method: float32 → float16