# Reference logprobs cache (DPO)

This repo stores precomputed **reference-model** log-probability scalars for DPO training with [open-instruct](https://github.com/allenai/open-instruct) `dpo_tune_cache.py` / `dpo_utils.build_reference_logprobs_cache`.

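For background, here is a minimal sketch of where these cached scalars enter the DPO objective. It shows the plain (unnormalized) DPO loss; open-instruct's `dpo_norm` variant differs in how sequence logprobs are aggregated, so treat this as illustrative only. The cached reference logprobs stand in for a second forward pass through the frozen reference model:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Standard DPO: reward margin is the policy/reference logprob gap on
    # chosen minus the gap on rejected; cached ref_* scalars avoid
    # re-running the reference model.
    logits = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -F.logsigmoid(beta * logits).mean()

# Toy per-example scalars (in practice, sequence-level logprob sums).
pc = torch.tensor([-10.0, -12.0])
pr = torch.tensor([-15.0, -14.0])
rc = torch.tensor([-11.0, -12.5])
rr = torch.tensor([-14.0, -13.5])
loss = dpo_loss(pc, pr, rc, rr)
```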
## Files

- `62b8d956d9260cf9.pt` — a `TensorCache` on disk: a dict-like payload with `chosen_logps` and `rejected_logps`, each a `float32` tensor of shape `(N,)` for `N = 259922` examples.

The stem (`62b8d956d9260cf9`) is the first 16 hex characters of `SHA256(config_json)`, where `config_json` is built in `dpo_utils.compute_reference_cache_hash` from the dataset hash, base model id, `max_seq_length` / transforms (via `dataset_config_hash`), `loss_type`, `concatenated_forward`, `use_lora`, etc.

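The stem derivation can be sketched as follows. The field names below are placeholders; the actual key set and serialization are defined in `dpo_utils.compute_reference_cache_hash`, so this will not reproduce the real stem:

```python
import hashlib
import json

# Hypothetical config fields -- illustrative only; the real set is
# assembled inside dpo_utils.compute_reference_cache_hash.
config = {
    "dataset_hash": "example-dataset-hash",
    "model_name_or_path": "allenai/Olmo-3-7B-Instruct-SFT",
    "max_seq_length": 16384,
    "loss_type": "dpo_norm",
    "concatenated_forward": False,
    "use_lora": True,
}
config_json = json.dumps(config, sort_keys=True)
stem = hashlib.sha256(config_json.encode("utf-8")).hexdigest()[:16]
cache_filename = f"{stem}.pt"
```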
## Usage

1. Download the `.pt` file into your reference cache directory (keeping the same basename).
2. Point the trainer at that directory:

```bash
export REFERENCE_LOGPROBS_CACHE_PATH=/path/to/dir/containing/cache
# Ensure /path/to/dir/containing/cache/62b8d956d9260cf9.pt exists
```

The training run must use the **same** tokenizer, dataset, `max_seq_length`, model name/revision, loss type, and LoRA flags as when the cache was built; otherwise the hash will not match and the cache will be ignored.

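To sanity-check a downloaded cache before training, you can load it with `torch.load` and verify the tensor shapes and dtypes. The dict layout below is assumed from the description under "Files"; the real `TensorCache` may carry additional metadata. For a self-contained sketch, this writes a tiny stand-in payload first:

```python
import torch

# Stand-in payload with the assumed layout: two float32 (N,) tensors.
payload = {
    "chosen_logps": torch.zeros(4, dtype=torch.float32),
    "rejected_logps": torch.zeros(4, dtype=torch.float32),
}
torch.save(payload, "62b8d956d9260cf9.pt")

# Reload and check, as a consumer of the cache would.
loaded = torch.load("62b8d956d9260cf9.pt", map_location="cpu")
assert loaded["chosen_logps"].shape == loaded["rejected_logps"].shape
assert loaded["chosen_logps"].dtype == torch.float32
```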
## Source

Built for the **allenai/Olmo-3-7B-Instruct-SFT** reference model on the Dolci pretraining-continuation DPO JSONL (`259922` examples), with `dpo_norm` loss, `max_seq_length=16384`, `concatenated_forward=false`, and LoRA enabled in the training config (the reference cache hash includes `use_lora`).