Llama 2 7B: RL-MPQ Quantized (Thesis Release)

Quantized variant of meta-llama/Llama-2-7b-hf using RL-MPQ (Reinforcement Learning Mixed-Precision Quantization): per-layer bit-width policies are trained with PPO and validated by WikiText-2 perplexity.

This repo ships five compression scenarios as subfolders, from near-FP16 fidelity to aggressive survival mode, so you can pick the bits-vs-quality trade-off for your thesis experiments.

Base model: meta-llama/Llama-2-7b-hf
Method: RL-MPQ (PPO per-layer bit policy)
Format: Fake-quant FP16 weights + rlmpq_policy.json
Recommended start: subfolder="Balanced"

Scenarios

Results (WikiText-2)

Scenario          Avg bits  Compression vs FP16  Perplexity
High_Fidelity     6.5       2.4615x              4.9808
Conservative      5.125     3.122x               5.0276
Balanced          4.375     3.6571x              5.0437
Aggressive        3.5938    4.4522x              5.2614
Extreme_Survival  2.9688    5.3895x              10.9577
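
The compression column appears to be the FP16 width (16 bits) divided by the average bit width. The sketch below recomputes it under that assumption; the function name `compression_vs_fp16` is illustrative and not part of this repo.

```python
FP16_BITS = 16

# Average bit widths copied from the results table.
AVG_BITS = {
    "High_Fidelity": 6.5,
    "Conservative": 5.125,
    "Balanced": 4.375,
    "Aggressive": 3.5938,
    "Extreme_Survival": 2.9688,
}


def compression_vs_fp16(avg_bits: float) -> float:
    """Assumed definition: FP16 bits divided by the average quantized bits."""
    return FP16_BITS / avg_bits


for name, bits in AVG_BITS.items():
    print(f"{name:<17} {bits:>7} bits  {compression_vs_fp16(bits):.3f}x")
```

The last two rows agree with the table only to about three decimals, presumably because the listed average bit widths are themselves rounded. Since the checkpoints are stored as fake-quantized FP16, this ratio most likely describes the nominal bit budget rather than the file size on disk.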

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "AvoCahDoe/llama-2-7b-rlmpq"
scenario = "Balanced"  # High_Fidelity | Conservative | Aggressive | Extreme_Survival

model = AutoModelForCausalLM.from_pretrained(repo, subfolder=scenario, torch_dtype="float16")
tokenizer = AutoTokenizer.from_pretrained(repo, subfolder=scenario)

Important: Always pass subfolder=<scenario>. The root config.json only describes the collection; the weights and tokenizer live inside each scenario folder.
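
One way to avoid forgetting the subfolder is to route every load through a small wrapper. This is a minimal sketch: `SCENARIOS`, `scenario_kwargs`, and `load_scenario` are hypothetical helpers, not shipped with this repo. They only wrap the `from_pretrained` calls shown under Usage.

```python
SCENARIOS = (
    "High_Fidelity",
    "Conservative",
    "Balanced",
    "Aggressive",
    "Extreme_Survival",
)


def scenario_kwargs(scenario: str = "Balanced") -> dict:
    """Build the from_pretrained keyword arguments for one scenario subfolder."""
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario {scenario!r}; expected one of {SCENARIOS}")
    return {"subfolder": scenario}


def load_scenario(repo: str, scenario: str = "Balanced"):
    """Load the model and tokenizer from the same scenario subfolder."""
    # Imported lazily so the name check above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = scenario_kwargs(scenario)
    model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="float16", **kwargs)
    tokenizer = AutoTokenizer.from_pretrained(repo, **kwargs)
    return model, tokenizer
```

For example, `model, tokenizer = load_scenario("AvoCahDoe/llama-2-7b-rlmpq", "Aggressive")` loads both pieces from the Aggressive folder, and a misspelled scenario name fails before any download starts.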

Method (thesis summary)

  1. Phase 3: PPO agent selects per-layer bit widths under scenario-specific reward targets.
  2. Phase 4: Policies replayed on real weights; WikiText-2 PPL measures quality retention.
  3. Export: Fake-quantized FP16 checkpoints (compatible with Hugging Face Transformers).
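
To illustrate what "fake-quantized" means in the export step, the sketch below rounds weights to a low-bit integer grid and immediately maps them back to floats, so the result still loads as an ordinary floating-point tensor. It assumes a symmetric, per-tensor uniform quantizer. The source does not specify the quantizer RL-MPQ actually uses, so treat this as a generic illustration; `fake_quantize` is a hypothetical name.

```python
def fake_quantize(weights: list[float], bits: int) -> list[float]:
    """Quantize to a signed `bits`-wide integer grid, then map back to floats."""
    if bits < 2:
        raise ValueError("this symmetric sketch needs at least 2 bits")
    if not weights:
        return []
    qmax = 2 ** (bits - 1) - 1          # largest integer level, e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)            # all-zero input: nothing to quantize
    scale = max_abs / qmax              # per-tensor scale (an assumption)
    restored = []
    for w in weights:
        q = round(w / scale)            # nearest integer level
        q = max(-qmax, min(qmax, q))    # clamp to the representable range
        restored.append(q * scale)      # dequantize: the value is a float again
    return restored


weights = [0.5, -1.0, 0.25, 0.0]
for bits in (8, 4, 2):
    restored = fake_quantize(weights, bits)
    err = max(abs(a - b) for a, b in zip(weights, restored))
    print(bits, restored, f"max error {err:.4f}")
```

Lower bit widths leave fewer distinct levels and a larger rounding error. That is the trade-off a per-layer bit policy has to balance against the perplexity target.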

Citation

@misc{rlmpq2026,
  title  = {RL-MPQ: Reinforcement Learning Mixed-Precision Quantization},
  author = {AvoCahDoe},
  year   = {2026},
  url    = {https://huggingface.co/AvoCahDoe/llama-2-7b-rlmpq}
}

Part of the RL-NMP-Model-Quantasation thesis framework.
