How to use ddevMhrn/Llama-3.1-8B-Viveka with PEFT:
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    # Load the open 4-bit base model, then attach the LoRA adapter on top of it.
    base_model = AutoModelForCausalLM.from_pretrained("unsloth/llama-3.1-8b-instruct-unsloth-bnb-4bit")
    model = PeftModel.from_pretrained(base_model, "ddevMhrn/Llama-3.1-8B-Viveka")
Llama-3.1-8B-Viveka
LoRA adapter trained on the Viveka OpenEnv with TRL GRPO + Unsloth 4-bit QLoRA. The reward is deterministic and has six components, computed over mocked Indian digital public infrastructure (DPI) services (UPI, DigiLocker, IRCTC, Banking, Telecom). 200 episodes, with tier mix: tier 1 at 0.4, tier 2 at 0.4, tier 4 at 0.2.
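The tier mix can be read as sampling weights over task tiers. The following is a minimal sketch, assuming each episode's tier is drawn independently with those probabilities; the actual scheduler lives in the viveka-env repository and may work differently. The names `TIER_MIX`, `expected_counts`, and `sample_tiers` are hypothetical and used only for illustration.

```python
import random

# Tier weights as stated on this card: tier 1 -> 0.4, tier 2 -> 0.4, tier 4 -> 0.2.
TIER_MIX = {1: 0.4, 2: 0.4, 4: 0.2}
NUM_EPISODES = 200


def expected_counts(mix, episodes):
    """Expected number of episodes per tier under the given mix."""
    return {tier: round(weight * episodes) for tier, weight in mix.items()}


def sample_tiers(mix, episodes, seed=0):
    """Draw one tier per episode (assumption: independent weighted sampling)."""
    rng = random.Random(seed)
    tiers = list(mix)
    weights = [mix[tier] for tier in tiers]
    return rng.choices(tiers, weights=weights, k=episodes)


if __name__ == "__main__":
    print(expected_counts(TIER_MIX, NUM_EPISODES))  # {1: 80, 2: 80, 4: 40}
```

Under this reading, 200 episodes split into roughly 80 tier-1, 80 tier-2, and 40 tier-4 episodes; a sampled schedule will fluctuate around those counts.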
Base model: meta-llama/Llama-3.1-8B-Instruct
Notes: Cross-family scale test. Eval uses Unsloth's open 4-bit mirror because the meta-llama repository is gated; the LoRA was trained against the canonical weights (via Unsloth's auto-redirect).
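Since the base weights can come from either the gated repository or the open mirror, loading code has to choose between them. The sketch below assumes the mirror is the repository named in the PEFT snippet on this card, and that gated access is signalled by an `HF_TOKEN` environment variable (a token alone does not guarantee that access was granted). `pick_base_model` and `load_adapter` are hypothetical helpers, not part of the Viveka eval harness.

```python
import os

ADAPTER_ID = "ddevMhrn/Llama-3.1-8B-Viveka"
GATED_BASE = "meta-llama/Llama-3.1-8B-Instruct"
OPEN_MIRROR = "unsloth/llama-3.1-8b-instruct-unsloth-bnb-4bit"


def pick_base_model(env=None):
    """Return the gated repo id if a token is configured, else the open mirror."""
    env = os.environ if env is None else env
    return GATED_BASE if env.get("HF_TOKEN") else OPEN_MIRROR


def load_adapter(base_id):
    """Load the base model and attach the LoRA adapter.

    Imports are lazy so that choosing a repo id does not require the heavy
    dependencies. The 4-bit mirror additionally needs bitsandbytes, and
    device_map="auto" needs accelerate.
    """
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    return model, tokenizer
```

Calling `load_adapter(pick_base_model())` would then download the chosen base model and apply the adapter; without a token it falls back to the open mirror, matching the eval setup described in the notes.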
See github.com/DevMhrn/viveka-env for the env, reward design, and eval harness.
Model tree for ddevMhrn/Llama-3.1-8B-Viveka
- Base model: meta-llama/Llama-3.1-8B
- Finetuned: meta-llama/Llama-3.1-8B-Instruct