Llama-3.1-8B-Viveka

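LoRA adapter trained on the Viveka OpenEnv with TRL GRPO and Unsloth 4-bit QLoRA. The reward is deterministic, has six components, and is computed over mocked Indian DPI (digital public infrastructure) services: UPI, DigiLocker, IRCTC, Banking, and Telecom. Training ran for 200 episodes with a tier mix of 0.4 for tier 1, 0.4 for tier 2, and 0.2 for tier 4.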
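As a rough illustration of the tier mix, the sketch below draws a 200-episode schedule from the stated weights. Only the weights and the episode count come from this card; the function name `sample_tiers`, the fixed seed, and the use of independent weighted draws are assumptions, and the actual environment may schedule tiers differently.

```python
import random
from collections import Counter

# Tier weights and episode count are copied from the card.
# Everything else in this sketch is illustrative (hypothetical names).
TIER_MIX = {1: 0.4, 2: 0.4, 4: 0.2}
NUM_EPISODES = 200


def sample_tiers(num_episodes=NUM_EPISODES, tier_mix=TIER_MIX, seed=0):
    """Draw one tier per episode, independently, using the mix as weights."""
    rng = random.Random(seed)  # local RNG so the schedule is reproducible
    tiers = list(tier_mix)
    weights = [tier_mix[t] for t in tiers]
    return rng.choices(tiers, weights=weights, k=num_episodes)


schedule = sample_tiers()
counts = Counter(schedule)  # empirical tier counts for this seed
print({tier: counts[tier] for tier in TIER_MIX})
```

With independent draws the per-tier counts only approximate 80 / 80 / 40; a scheduler that needs exact proportions would have to build and shuffle a fixed list instead.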

Base model: meta-llama/Llama-3.1-8B-Instruct

Notes: Cross-family scale test. Evaluation loads Unsloth's open 4-bit mirror of the base model because the meta-llama repository is gated; the LoRA was trained against the canonical weights (via Unsloth's auto-redirect).
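A minimal sketch of attaching the adapter to a base checkpoint with `transformers` and `peft` follows. The adapter repository ID is taken from this model's page. The mirror name `unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit` is an assumption about which open 4-bit mirror is meant, and the helper names `pick_base_id` and `load_adapter` are hypothetical. Loading the pre-quantized mirror also assumes `bitsandbytes` and a CUDA GPU are available.

```python
ADAPTER_ID = "ddevMhrn/Llama-3.1-8B-Viveka"
GATED_BASE_ID = "meta-llama/Llama-3.1-8B-Instruct"
# Assumed name of the open 4-bit mirror; check it before relying on it.
OPEN_BASE_ID = "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"


def pick_base_id(have_gated_access: bool) -> str:
    """Use the canonical weights when access was granted, else the open mirror."""
    return GATED_BASE_ID if have_gated_access else OPEN_BASE_ID


def load_adapter(have_gated_access: bool = False):
    """Load the base model, then apply the LoRA adapter on top of it."""
    # Heavy imports are deferred so the constants above can be inspected
    # without transformers or peft installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base_id = pick_base_id(have_gated_access)
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    model.eval()  # inference mode: disables dropout in the adapter layers
    return model, tokenizer
```

Calling `load_adapter()` downloads the base checkpoint and the adapter, so it is left uncalled here.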

See github.com/DevMhrn/viveka-env for the env, reward design, and eval harness.

