Progress Report
Task: PlainMLP vs ResMLP Comparison on Distant Identity Task
- Step 1: Set up project directory - DONE
- Step 2: Implement PlainMLP architecture - DONE
- Step 3: Implement ResMLP architecture - DONE
- Step 4: Generate synthetic identity data - DONE
- Step 5: Train both models for 500 steps - DONE
- Step 6: Capture activation/gradient statistics - DONE
- Step 7: Generate all 4 plots - DONE
- Step 8: Create summary report - IN PROGRESS
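The difference between the two architectures in Steps 2 and 3, and the activation statistics from Step 6, can be illustrated with a minimal standard-library sketch. This is not the code in experiment_final.py: the width, depth, ReLU activation, He-style initialization, and every function name here are assumptions chosen for illustration. Only the forward pass and per-layer activation std are shown; gradient statistics would need an autodiff framework and are left out.

```python
import math
import random
import statistics

def linear(x, w):
    """Multiply vector x by a square weight matrix w (list of rows), no bias."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def relu(v):
    return [max(0.0, a) for a in v]

def init_weights(dim, rng):
    # He-style init (an assumption): std = sqrt(2 / fan_in)
    std = math.sqrt(2.0 / dim)
    return [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(dim)]

def forward(x, weights, residual):
    """Run x through the layer stack.

    Returns (output, stds), where stds[i] is the population standard
    deviation of the activations after layer i.
    """
    stds = []
    h = x
    for w in weights:
        f = relu(linear(h, w))
        # Plain block: h <- f(h). Residual block: h <- h + f(h).
        h = [a + b for a, b in zip(h, f)] if residual else f
        stds.append(statistics.pstdev(h))
    return h, stds

rng = random.Random(0)
DIM, DEPTH = 16, 8  # hypothetical sizes, not taken from the experiment
weights = [init_weights(DIM, rng) for _ in range(DEPTH)]
x = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

_, plain_stds = forward(x, weights, residual=False)
_, res_stds = forward(x, weights, residual=True)

# With all-zero weights, the residual stack is exactly the identity map,
# while the plain stack collapses the input to zeros.
zero_w = [[0.0] * DIM for _ in range(DIM)]
res_out, _ = forward(x, [zero_w] * 3, residual=True)
plain_out, _ = forward(x, [zero_w] * 3, residual=False)
```

In this sketch, a residual stack whose branches output zero passes its input through unchanged, so it starts close to the identity target. A plain stack has to learn that mapping through every layer.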
Key Results
| Metric | PlainMLP | ResMLP |
|---|---|---|
| Final Loss | 0.3123 | 0.0630 |
| Loss Reduction vs PlainMLP | - | 5.0x |
| Gradient Norm Range (per layer) | [7.6e-3, 1.0e-2] | [1.9e-3, 3.8e-3] |
| Activation Std Range (per layer) | [0.36, 0.95] | [0.13, 0.18] |
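The 5.0x figure in the table is consistent with the ratio of the two final losses. A quick check, using only the values reported above (the variable names are illustrative):

```python
plain_loss = 0.3123  # PlainMLP final loss from the table
res_loss = 0.0630    # ResMLP final loss from the table

# Ratio of final losses: how many times lower the ResMLP loss is.
ratio = plain_loss / res_loss
print(f"{ratio:.1f}x")  # prints "5.0x"
```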
Files Generated
- experiment_final.py - Main experiment code
- results.json - Numerical results
- plots/training_loss.png - Training loss comparison
- plots/gradient_magnitude.png - Per-layer gradient norms
- plots/activation_mean.png - Per-layer activation means
- plots/activation_std.png - Per-layer activation stds