WARP: Weight-Space Analysis for Recovering Training Data Portfolios
Abstract
WARP is a framework that infers training data compositions from released model weights by analyzing geometric footprints in weight space through model merging and feature extraction.
Foundation models are routinely released to the public, yet the data recipes used to train them -- such as domain mixture weights that determine how different sources are sampled -- are rarely disclosed. This creates an access asymmetry: researchers can study the resulting models but lack visibility into the training distribution that produced them. Prior approaches to inferring training data, such as membership inference, operate at the level of individual samples and thus cannot characterize the global composition of the training corpus. We introduce WARP, a framework that recovers a fine-tuned model's training mixture directly from its released weights. WARP interpolates between the base and fine-tuned models using model merging, generating pseudo-checkpoints that approximate the missing training trajectory and expose a geometric footprint of the training data in weight space. From these simulated footprints, WARP extracts geometric features and maps them to domain proportions using either a parameter-free softmax readout or an MLP projector trained on synthetic mixtures. In controlled experiments with BERT and GPT-2, WARP recovers domain mixtures with an average mean absolute error (MAE) as low as 0.046 and 0.104, respectively, outperforming both membership inference and a variant with access to the true training trajectory.
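The pipeline described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes plain linear interpolation as the merging rule (the abstract does not say which merging method WARP uses), represents each model as a dictionary of flat weight lists, and feeds hand-picked, hypothetical per-domain scores into the softmax readout, since the actual geometric features are not specified here. The names `interpolate_checkpoints`, `softmax_readout`, and `mixture_mae` are illustrative.

```python
import math


def interpolate_checkpoints(base, tuned, num_steps=5):
    """Build pseudo-checkpoints between two weight dicts.

    Assumption: linear interpolation, theta(a) = (1 - a) * base + a * tuned.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    checkpoints = []
    for step in range(num_steps):
        alpha = step / (num_steps - 1)  # runs from 0.0 (base) to 1.0 (tuned)
        checkpoints.append({
            name: [(1.0 - alpha) * b + alpha * t
                   for b, t in zip(base[name], tuned[name])]
            for name in base
        })
    return checkpoints


def softmax_readout(scores):
    """Map per-domain scores to proportions that sum to 1 (no learned parameters)."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_mae(predicted, true):
    """Mean absolute error between two domain-mixture vectors."""
    return sum(abs(p - t) for p, t in zip(predicted, true)) / len(true)


# Toy weights standing in for a base model and its fine-tuned version.
base = {"layer.weight": [0.10, -0.20, 0.30], "layer.bias": [0.0, 0.0]}
tuned = {"layer.weight": [0.15, -0.10, 0.25], "layer.bias": [0.02, -0.01]}

checkpoints = interpolate_checkpoints(base, tuned, num_steps=5)

# Hypothetical per-domain scores; in WARP these would come from geometric
# features extracted from the pseudo-checkpoints.
scores = [0.2, 1.0, -0.5]
predicted = softmax_readout(scores)
true_mixture = [0.3, 0.5, 0.2]

print(predicted, mixture_mae(predicted, true_mixture))
```

The first and last pseudo-checkpoints coincide with the base and fine-tuned weights, so the interpolated sequence stands in for a training trajectory that was never released. The MAE helper mirrors the metric reported in the abstract, averaged here over the domains of a single mixture.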
Community
Weight-space geometry encodes traces of training data. Can we use it to reverse-engineer data recipes? Introducing WARP: a new strategy to estimate domain mixtures from model weights alone!