# Reference logprobs cache (DPO)

This repo stores precomputed **reference-model** log-probability scalars for DPO training with [open-instruct](https://github.com/allenai/open-instruct) `dpo_tune_cache.py` / `dpo_utils.build_reference_logprobs_cache`.

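For background, here is a minimal sketch of where these cached scalars enter the DPO objective. It shows the plain (unnormalized) DPO loss; open-instruct's `dpo_norm` variant differs in how sequence logprobs are aggregated, so treat this as illustrative only. The cached reference logprobs stand in for a second forward pass through the frozen reference model:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Standard DPO: reward margin is the policy/reference logprob gap on
    # chosen minus the gap on rejected; cached ref_* scalars avoid
    # re-running the reference model.
    logits = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -F.logsigmoid(beta * logits).mean()

# Toy per-example scalars (in practice, sequence-level logprob sums).
pc = torch.tensor([-10.0, -12.0])
pr = torch.tensor([-15.0, -14.0])
rc = torch.tensor([-11.0, -12.5])
rr = torch.tensor([-14.0, -13.5])
loss = dpo_loss(pc, pr, rc, rr)
```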
## Files

- `62b8d956d9260cf9.pt` — a `TensorCache` on disk: a dict-like payload with `chosen_logps` and `rejected_logps`, each a `float32` tensor of shape `(N,)` for `N = 259922` examples.

The stem (`62b8d956d9260cf9`) is the first 16 hex characters of `SHA256(config_json)`, where `config_json` is built in `dpo_utils.compute_reference_cache_hash` from the dataset hash, base model id, `max_seq_length` / transforms (via `dataset_config_hash`), `loss_type`, `concatenated_forward`, `use_lora`, etc.

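The stem derivation can be sketched as follows. The field names below are placeholders; the actual key set and serialization are defined in `dpo_utils.compute_reference_cache_hash`, so this will not reproduce the real stem:

```python
import hashlib
import json

# Hypothetical config fields -- illustrative only; the real set is
# assembled inside dpo_utils.compute_reference_cache_hash.
config = {
    "dataset_hash": "example-dataset-hash",
    "model_name_or_path": "allenai/Olmo-3-7B-Instruct-SFT",
    "max_seq_length": 16384,
    "loss_type": "dpo_norm",
    "concatenated_forward": False,
    "use_lora": True,
}
config_json = json.dumps(config, sort_keys=True)
stem = hashlib.sha256(config_json.encode("utf-8")).hexdigest()[:16]
cache_filename = f"{stem}.pt"
```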
## Usage

1. Download the `.pt` file into your reference cache directory (keeping the same basename).
2. Point the trainer at that directory:

```bash
export REFERENCE_LOGPROBS_CACHE_PATH=/path/to/dir/containing/cache
# Ensure /path/to/dir/containing/cache/62b8d956d9260cf9.pt exists
```

The training run must use the **same** tokenizer, dataset, `max_seq_length`, model name/revision, loss type, and LoRA flags as when the cache was built; otherwise the hash will not match and the cache will be ignored.

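To sanity-check a downloaded cache before training, you can load it with `torch.load` and verify the tensor shapes and dtypes. The dict layout below is assumed from the description under "Files"; the real `TensorCache` may carry additional metadata. For a self-contained sketch, this writes a tiny stand-in payload first:

```python
import torch

# Stand-in payload with the assumed layout: two float32 (N,) tensors.
payload = {
    "chosen_logps": torch.zeros(4, dtype=torch.float32),
    "rejected_logps": torch.zeros(4, dtype=torch.float32),
}
torch.save(payload, "62b8d956d9260cf9.pt")

# Reload and check, as a consumer of the cache would.
loaded = torch.load("62b8d956d9260cf9.pt", map_location="cpu")
assert loaded["chosen_logps"].shape == loaded["rejected_logps"].shape
assert loaded["chosen_logps"].dtype == torch.float32
```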
## Source

Built for the **allenai/Olmo-3-7B-Instruct-SFT** reference model on the Dolci pretraining-continuation DPO JSONL (`259922` examples), with `dpo_norm` loss, `max_seq_length=16384`, `concatenated_forward=false`, and LoRA enabled in the training config (the reference cache hash includes `use_lora`).