amang1802/Llama3.2-1B-summary-length-exp7
Tags: Text Generation, Transformers, Safetensors, llama, conversational, text-generation-inference, Inference Endpoints
Model Card for Llama3.2-1B-summary-length-exp7
Summary-length PPO experiment #7
No KL divergence term in the loss
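The card does not show the loss itself, so the following is only a minimal sketch of what a PPO objective without a KL term could look like: the standard clipped surrogate loss with no KL penalty added. The function name `ppo_clipped_loss`, the clip range `clip_eps=0.2`, and the toy inputs are assumptions for illustration, not values taken from this experiment.

```python
import math

def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss over a batch.

    No KL divergence penalty is added, matching the note on this card.
    clip_eps=0.2 is an assumed value, not one reported by the author.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the updated and the sampling policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # PPO maximizes the smaller surrogate; negate it to get a loss.
        total += -min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Toy batch: identical old and new log-probs, so every ratio is 1.
loss = ppo_clipped_loss([-1.0, -2.0], [-1.0, -2.0], [0.5, -0.5])
```

Without a KL term, clipping is the only mechanism in this sketch that limits how far the policy moves from the one that generated the samples.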
Model Details
Dataset size: 16,384
Epochs: 1
Batch size: 16 * 4 = 64 (with 4 GPUs)
Optimizer args: Torch AdamW defaults, except LR = 0.00001 (1e-5)
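As a rough illustration of what these settings imply, the sketch below collects them in a small config object and derives the effective batch size and the number of optimizer steps. It assumes that 16 is the per-GPU batch size and that no gradient accumulation is used; neither is stated on the card. `TrainConfig` and `ADAMW_ARGS` are hypothetical names, and the non-LR values are the `torch.optim.AdamW` defaults in recent PyTorch releases.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    dataset_size: int = 16384
    epochs: int = 1
    per_gpu_batch_size: int = 16  # assumption: 16 is the per-GPU batch
    num_gpus: int = 4
    learning_rate: float = 1e-5   # LR = 0.00001 from the card

    @property
    def effective_batch_size(self) -> int:
        # Samples consumed per optimizer step across all GPUs.
        return self.per_gpu_batch_size * self.num_gpus

    @property
    def optimizer_steps(self) -> int:
        # Assumes no gradient accumulation and no dropped partial batch.
        return self.epochs * self.dataset_size // self.effective_batch_size

cfg = TrainConfig()

# torch.optim.AdamW defaults with only the learning rate overridden.
ADAMW_ARGS = {
    "lr": cfg.learning_rate,
    "betas": (0.9, 0.999),
    "eps": 1e-8,
    "weight_decay": 1e-2,
}

print(cfg.effective_batch_size, cfg.optimizer_steps)
```

Under these assumptions, one epoch over 16,384 examples at an effective batch size of 64 corresponds to 256 optimizer steps.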
Safetensors
Model size: 1.24B params
Tensor type: BF16
Collection including amang1802/Llama3.2-1B-summary-length-exp7: PPO experiments (Using PPO with simpler reward functions; 8 items; updated Jan 23)