payelb/HHRLHF_Llama-3.2-1B_aligned_with_semantic_MARS_deberta_RM
Base model: meta-llama/Llama-3.2-1B-Instruct
Alignment dataset: Anthropic/hh-rlhf
Reward model: payelb/HHRLHF_reward-model-deberta-v3-base_1k_fixed_MARS_semantic_distance_synth
Method: PPO alignment with LoRA adapters.
Reward model type: semantic-MARS DeBERTa-v3-base reward model.
Matched PPO setup:
- TOTAL_PPO_STEPS: 250
- PPO_EPOCHS: 2
- LR: 1e-05
- Batch size: 16
- Mini-batch size: 4
- Gradient accumulation: 4
- MIN_NEW_TOKENS: 32
- MAX_NEW_TOKENS: 96
- USE_REWARD_NORMALIZATION: False
- USE_EXPLICIT_KL_CONFIG: False
- Generation: do_sample=True, top_p=0.9, temperature=0.8, eos_token_id=eos_token
- LoRA enabled
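The settings above can be collected into a single object for reference. This is a minimal sketch, not the author's training script: the card does not say which training library was used, so the class name `PPOSetup`, its field names, and the helper functions are hypothetical. The consistency check assumes the common convention that one PPO batch is split into mini-batches and accumulated over several backward passes, and the rollout count assumes each PPO step consumes exactly one batch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PPOSetup:
    """Hyperparameters copied from the card; field names are hypothetical."""
    total_ppo_steps: int = 250
    ppo_epochs: int = 2
    learning_rate: float = 1e-5
    batch_size: int = 16
    mini_batch_size: int = 4
    gradient_accumulation_steps: int = 4
    min_new_tokens: int = 32
    max_new_tokens: int = 96
    use_reward_normalization: bool = False
    use_explicit_kl_config: bool = False
    use_lora: bool = True


def generation_kwargs(setup: PPOSetup, eos_token_id: int) -> dict:
    """Build sampling arguments matching the 'Generation' line of the card.

    eos_token_id must come from the tokenizer of the base model; the card
    only says it is set to the EOS token and gives no numeric value.
    """
    return {
        "do_sample": True,
        "top_p": 0.9,
        "temperature": 0.8,
        "min_new_tokens": setup.min_new_tokens,
        "max_new_tokens": setup.max_new_tokens,
        "eos_token_id": eos_token_id,
    }


def batching_is_consistent(setup: PPOSetup) -> bool:
    """Assumed convention: batch = mini-batch size * accumulation steps."""
    return setup.batch_size == (
        setup.mini_batch_size * setup.gradient_accumulation_steps
    )


def total_rollouts(setup: PPOSetup) -> int:
    """Responses generated over the run, assuming one batch per PPO step."""
    return setup.total_ppo_steps * setup.batch_size


setup = PPOSetup()
```

Under these assumptions, `batching_is_consistent(setup)` holds because 4 × 4 = 16, and `total_rollouts(setup)` gives 250 × 16 = 4000 sampled responses. The dictionary returned by `generation_kwargs` uses argument names accepted by the `generate` method in Hugging Face `transformers`.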