Llama 3 8B: RL-MPQ Quantized (Thesis Release)

Quantized variant of meta-llama/Meta-Llama-3-8B using RL-MPQ (Reinforcement Learning Mixed-Precision Quantization): per-layer bit-width policies trained with PPO, validated on WikiText-2 perplexity.

This repo ships five compression scenarios as subfolders, ranging from near-FP16 fidelity to an aggressive survival mode, so you can pick the bits-vs-quality trade-off for your thesis experiments.

Base model: meta-llama/Meta-Llama-3-8B
Method: RL-MPQ (PPO per-layer bit policy)
Format: Fake-quant FP16 weights + rlmpq_policy.json
Recommended start: subfolder="Balanced"
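
For readers unfamiliar with the term, "fake-quant" usually means that weights are rounded to a low-bit grid and then immediately converted back to floating point, so the stored tensor is still FP16 but only takes values a low-bit format could represent. The sketch below illustrates the idea with a symmetric, per-tensor scheme in plain Python. This scheme and the function name `fake_quantize` are assumptions made for illustration; the card does not state which quantizer RL-MPQ actually uses.

```python
def fake_quantize(weights, bits):
    """Symmetric per-tensor quantize-then-dequantize (illustrative assumption)."""
    if bits < 2:
        raise ValueError("bits must be >= 2 for a signed symmetric grid")
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for a 4-bit signed grid
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)              # all-zero tensor: nothing to round
    scale = max_abs / qmax                # step size between adjacent levels
    out = []
    for w in weights:
        q = round(w / scale)              # snap to the nearest integer level
        q = max(-qmax, min(qmax, q))      # clamp to the representable range
        out.append(q * scale)             # dequantize back to a float
    return out


weights = [0.70, -0.33, 0.12, 0.0]
deq = fake_quantize(weights, bits=4)
for original, restored in zip(weights, deq):
    print(f"{original:+.2f} -> {restored:+.4f}")
```

With this scheme the rounding error per weight is at most half a step (`scale / 2`), which is why lower bit widths degrade quality faster.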

Scenarios

Results (WikiText-2)

| Scenario | Avg bits | Compression vs FP16 | Perplexity |
|---|---|---|---|
| High_Fidelity | 6.875 | 2.3273x | 5.8133 |
| Conservative | 5.25 | 3.0476x | 5.9652 |
| Balanced | 4.5 | 3.5556x | 6.0199 |
| Aggressive | 3.7188 | 4.3025x | 6.6793 |
| Extreme_Survival | 2.875 | 5.5652x | 32.393 |
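
The compression column appears to be the FP16 width divided by the average bit width (16 / avg bits). The short check below recomputes it from the numbers in the table; the formula is inferred from those numbers rather than stated by the author. Because the checkpoints are described as fake-quant FP16, the ratio presumably describes the bit budget of the policy, not the size of the files on disk.

```python
FP16_BITS = 16

# (average bits, reported compression) copied from the results table
rows = {
    "High_Fidelity": (6.875, 2.3273),
    "Conservative": (5.25, 3.0476),
    "Balanced": (4.5, 3.5556),
    "Aggressive": (3.7188, 4.3025),
    "Extreme_Survival": (2.875, 5.5652),
}


def compression_vs_fp16(avg_bits):
    """Nominal compression ratio relative to 16-bit weights."""
    return FP16_BITS / avg_bits


computed = {name: compression_vs_fp16(bits) for name, (bits, _) in rows.items()}
for name, (avg_bits, reported) in rows.items():
    print(f"{name:17s} {avg_bits:7.4f} bits -> {computed[name]:.4f}x (table: {reported}x)")
```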

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "AvoCahDoe/llama-3-8b-rlmpq"
scenario = "Balanced"  # High_Fidelity | Conservative | Aggressive | Extreme_Survival

model = AutoModelForCausalLM.from_pretrained(repo, subfolder=scenario, torch_dtype="float16")
tokenizer = AutoTokenizer.from_pretrained(repo, subfolder=scenario)

Important: Always pass subfolder=<scenario>. The root config.json describes the collection; the weights and tokenizer live inside each scenario folder.
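
Since a misspelled scenario name only fails once the download starts, it can help to validate the name first. The helper below is a small sketch of that idea; `SCENARIOS` and `scenario_kwargs` are hypothetical names introduced here, not part of the repo or of Transformers.

```python
# Scenario folder names as listed in the results table
SCENARIOS = (
    "High_Fidelity",
    "Conservative",
    "Balanced",
    "Aggressive",
    "Extreme_Survival",
)


def scenario_kwargs(scenario="Balanced"):
    """Return keyword arguments for from_pretrained, rejecting unknown names."""
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario {scenario!r}; choose one of {SCENARIOS}")
    return {"subfolder": scenario}


# Intended use (not executed here, since it downloads the weights):
#   model = AutoModelForCausalLM.from_pretrained(
#       repo, torch_dtype="float16", **scenario_kwargs("Aggressive"))
print(scenario_kwargs())
```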

Method (thesis summary)

  1. Phase 3: A PPO agent selects per-layer bit widths under scenario-specific reward targets.
  2. Phase 4: Policies are replayed on the real weights; WikiText-2 perplexity (PPL) measures quality retention.
  3. Export: Fake-quantized FP16 checkpoints (compatible with Hugging Face Transformers).
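
To make the notion of a per-layer bit policy concrete, the sketch below builds a made-up policy for the 32 decoder layers of Llama 3 8B and computes its average bit width. Everything here is an assumption for illustration: the layer names, the choice of bit widths, the unweighted mean, and the budget check are hypothetical, and the actual schema of rlmpq_policy.json is not documented in this card.

```python
def average_bits(policy):
    """Unweighted mean of per-layer bit widths (assumed averaging rule)."""
    if not policy:
        raise ValueError("policy must contain at least one layer")
    return sum(policy.values()) / len(policy)


def meets_budget(policy, target_avg_bits):
    """True if the policy stays within a hypothetical average-bit budget."""
    return average_bits(policy) <= target_avg_bits


# Hypothetical policy: first and last two layers at 8 bits, the rest at 4 bits
policy = {f"layer_{i}": (8 if i < 2 or i >= 30 else 4) for i in range(32)}

avg = average_bits(policy)          # (4 * 8 + 28 * 4) / 32
print(f"average bits: {avg}")
print(f"within a 4.5-bit budget: {meets_budget(policy, 4.5)}")
```

Keeping the outermost layers at higher precision is a common heuristic in mixed-precision work, used here only to produce a plausible-looking example; the policies shipped in this repo were chosen by the PPO agent and may look quite different.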

Citation

@misc{rlmpq2026,
  title  = {RL-MPQ: Reinforcement Learning Mixed-Precision Quantization},
  author = {AvoCahDoe},
  year   = {2026},
  url    = {https://huggingface.co/AvoCahDoe/llama-3-8b-rlmpq}
}

Part of the RL-NMP-Model-Quantasation thesis framework.
