camilablank committed · Commit 5750ae4 · verified · Parent(s): 0352452

Add README for logprob cache usage

Files changed (1): README.md ADDED (+25 −0)
# Reference logprobs cache (DPO)

This repo stores precomputed **reference-model** log-probability scalars for DPO training with [open-instruct](https://github.com/allenai/open-instruct) `dpo_tune_cache.py` / `dpo_utils.build_reference_logprobs_cache`.

## Files

- `62b8d956d9260cf9.pt` — a `TensorCache` serialized to disk: a dict-like payload with `chosen_logps` and `rejected_logps`, each a `float32` tensor of shape `(N,)` for `N = 259922` examples.

The stem (`62b8d956d9260cf9`) is the first 16 hex characters of `SHA256(config_json)`, where `config_json` is built in `dpo_utils.compute_reference_cache_hash` from the dataset hash, base model id, `max_seq_length` / transforms (via `dataset_config_hash`), `loss_type`, `concatenated_forward`, `use_lora`, etc.
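The stem derivation can be sketched with the standard library. This is an illustration only: the field names in the config dict below are assumed placeholders, not the actual keys assembled by `dpo_utils.compute_reference_cache_hash`.

```python
import hashlib
import json

def cache_stem(config: dict) -> str:
    """Sketch: first 16 hex chars of SHA256 over a canonical config JSON.

    The keys in the config dict are illustrative, not the real fields
    used by dpo_utils.compute_reference_cache_hash.
    """
    config_json = json.dumps(config, sort_keys=True)  # canonical key ordering
    return hashlib.sha256(config_json.encode("utf-8")).hexdigest()[:16]

# Hypothetical config; changing any field changes the stem.
stem = cache_stem({
    "dataset_hash": "abc123",
    "model": "allenai/Olmo-3-7B-Instruct-SFT",
    "max_seq_length": 16384,
    "loss_type": "dpo_norm",
    "concatenated_forward": False,
    "use_lora": True,
})
print(stem)  # 16 lowercase hex characters
```

Sorting the keys before hashing keeps the digest stable regardless of dict insertion order, which is why any single changed field produces a different stem.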
10
+
11
+ ## Usage
12
+
13
+ 1. Download the `.pt` file into your reference cache directory (same basename).
14
+ 2. Point the trainer at that directory:
15
+
16
+ ```bash
17
+ export REFERENCE_LOGPROBS_CACHE_PATH=/path/to/dir/containing/cache
18
+ # Ensure /path/to/dir/containing/cache/62b8d956d9260cf9.pt exists
19
+ ```

The training run must use the **same** tokenizer, dataset, `max_seq_length`, model name/revision, loss type, and LoRA flags as the run that built the cache; otherwise the hash will not match and the cache will be ignored.

## Source

Built for the **allenai/Olmo-3-7B-Instruct-SFT** reference model on the Dolci pretraining-continuation DPO JSONL (`259922` examples), with `dpo_norm` loss, `max_seq_length=16384`, `concatenated_forward=false`, and LoRA enabled in the training config (the reference cache hash includes `use_lora`).
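To sanity-check a downloaded cache against the counts above, a minimal sketch, assuming the payload deserializes to a plain dict of tensors (the actual `TensorCache` on-disk format may wrap this differently):

```python
import torch

def check_cache(path: str, expected_n: int = 259922) -> None:
    """Load the cache file and verify both logprob tensors have shape (N,).

    Assumes torch.load yields a dict with 'chosen_logps' and
    'rejected_logps' keys, per the file description above.
    """
    payload = torch.load(path, weights_only=True)
    for key in ("chosen_logps", "rejected_logps"):
        t = payload[key]
        assert t.dtype == torch.float32, f"{key}: expected float32, got {t.dtype}"
        assert t.shape == (expected_n,), (
            f"{key}: expected ({expected_n},), got {tuple(t.shape)}"
        )
```

`weights_only=True` restricts deserialization to tensors and basic containers, which is a safer default when loading files fetched from a remote repo.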