---
library_name: transformers
tags: []
---

# Model Card for Model ID

- Summary Length PPO experiment #5
- No KL divergence term in the loss

## Model Details

- Dataset size: 1024 examples
- Epochs: 1
- Effective batch size: 128 = 4 (per device) × 4 (GPUs) × 8 (gradient-accumulation steps)
- Optimizer: PyTorch AdamW with default arguments, except LR = 0.00001
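
The card does not name its training code, so the following is only a minimal sketch of how the hyperparameters above could be wired up, assuming the TRL `PPOTrainer` API; the `"base-model"` checkpoint name and the exact mapping of options are illustrative assumptions, not the author's code. Setting `init_kl_coef=0.0` with adaptive KL control disabled approximates "no KL divergence term in the loss".

```python
# Sketch only: assumes TRL's PPOTrainer API; "base-model" is a hypothetical
# placeholder checkpoint, not the actual model behind this card.
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

config = PPOConfig(
    learning_rate=1e-5,             # LR = 0.00001, per the card
    batch_size=128,                 # 4 per device * 4 GPUs * 8 accumulation steps
    mini_batch_size=4,              # per-device batch size
    gradient_accumulation_steps=8,
    init_kl_coef=0.0,               # "No KL divergence term in the loss"
    adap_kl_ctrl=False,             # also disable adaptive KL control
)

tokenizer = AutoTokenizer.from_pretrained("base-model")
model = AutoModelForCausalLMWithValueHead.from_pretrained("base-model")

# The card specifies PyTorch AdamW with default arguments except the LR.
optimizer = torch.optim.AdamW(model.parameters(), lr=config.learning_rate)

ppo_trainer = PPOTrainer(
    config=config,
    model=model,
    tokenizer=tokenizer,
    optimizer=optimizer,
)
# One epoch over the 1024-example dataset would then be a single pass of the
# usual generate -> score -> ppo_trainer.step() loop (omitted here).
```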