electricsheepasia/asia-owid-aquaculture-farmed-fish-production Viewer • Updated about 8 hours ago • 3.07k • 1
Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases Paper • 2605.27355 • Published 8 days ago • 5
"I didn't Make the Micro Decisions": Measuring, Inducing, and Exposing Goal-Level AI Contributions in Collaboration Paper • 2605.21363 • Published 14 days ago • 6
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 14 days ago • 204
stefanocarrera/autophagycode_D_he_train-mercury_Qwen3-4B_strategy_trust_t0.75_g1_run2 Viewer • Updated 15 days ago • 164 • 73 • 1
Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 27 days ago • 232
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding Paper • 2603.22458 • Published Mar 23 • 136
InCoder-32B: Code Foundation Model for Industrial Scenarios Paper • 2603.16790 • Published Mar 17 • 311
HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions Paper • 2603.15612 • Published Mar 16 • 153
Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning Paper • 2603.04597 • Published Mar 4 • 211