docs(readme): replace em dashes with cleaner punctuation

README.md

@@ -7,7 +7,7 @@
 [](https://github.com/astral-sh/ruff)
 [](CONTRIBUTING.md)
 
-**Biological Reinforcement Learning from Human Feedback** — A framework for fine-tuning LLMs on biological reasoning tasks using SFT, DPO, and GRPO with verifier-based reward models for factual accuracy, calibrated uncertainty, and chain-of-thought reasoning.
+**Biological Reinforcement Learning from Human Feedback**: A framework for fine-tuning LLMs on biological reasoning tasks using SFT, DPO, and GRPO with verifier-based reward models for factual accuracy, calibrated uncertainty, and chain-of-thought reasoning.
 
 ## Highlights
 
@@ -16,7 +16,7 @@
 - **+19% reward improvement** over SFT baseline using GRPO (0.650 vs 0.547)
 - **-70% calibration error**: ECE reduced from 0.258 to 0.078 after GRPO
 - **90% accuracy** on domain-specific biological reasoning tasks (SFT stage)
-- **Learns from 363 examples** — efficient domain adaptation from spaceflight transcriptomics data
+- **Learns from 363 examples**: efficient domain adaptation from spaceflight transcriptomics data
 
 ## Key Results
 
@@ -197,8 +197,8 @@ reward = composer.score(question, response, ground_truth)
 Training data is derived from a 2x2x2 factorial transcriptomic study:
 
 - **Drug**: Kaempferol (KMP) vs Control
-- **Stressor 1**: Hindlimb Unloading (HU) — simulates microgravity
-- **Stressor 2**: Ionizing Radiation (IR) — simulates space radiation
+- **Stressor 1**: Hindlimb Unloading (HU): simulates microgravity
+- **Stressor 2**: Ionizing Radiation (IR): simulates space radiation
 - **Tissues**: Heart, Hippocampus, Liver, Soleus (+ Eye, Thymus for GRPO hold-out)
 
 ### Training Example Types
 
@@ -289,15 +289,15 @@ BioRLHF/
 
 ## Key Learnings for AI Safety
 
-1. **Honesty is trainable** — Models can learn appropriate epistemic humility
-2. **Domain grounding matters** — Anchoring to experimental truth prevents hallucination
-3. **Multi-reward > single reward** — Decomposing correctness into verifiable dimensions improves learning signal
-4. **Preference learning is fragile** — DPO can catastrophically forget domain knowledge
-5. **Evaluation drives improvement** — Systematic testing reveals specific failure modes
+1. **Honesty is trainable**: Models can learn appropriate epistemic humility
+2. **Domain grounding matters**: Anchoring to experimental truth prevents hallucination
+3. **Multi-reward > single reward**: Decomposing correctness into verifiable dimensions improves learning signal
+4. **Preference learning is fragile**: DPO can catastrophically forget domain knowledge
+5. **Evaluation drives improvement**: Systematic testing reveals specific failure modes
 
 ## Related Projects
 
-- **[SpaceOmicsBench](https://github.com/jang1563/SpaceOmicsBench)** — 115-question benchmark for LLMs on spaceflight biomedical data
+- **[SpaceOmicsBench](https://github.com/jang1563/SpaceOmicsBench)**: 115-question benchmark for LLMs on spaceflight biomedical data
 
 ## Citation
 
@@ -319,7 +319,7 @@ Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for gui
 
 ## License
 
-This project is licensed under the MIT License — see the [LICENSE](LICENSE) file for details.
+This project is licensed under the MIT License: see the [LICENSE](LICENSE) file for details.
 
 ---
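One of the headline numbers touched by this README (ECE reduced from 0.258 to 0.078 after GRPO) refers to Expected Calibration Error. For readers unfamiliar with the metric, here is a minimal sketch of how ECE is commonly computed; the function name, binning scheme, and toy data are illustrative and not taken from the BioRLHF codebase:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard equal-width-bin ECE: bucket predictions by confidence,
    then take the sample-weighted average of |accuracy - mean confidence|
    within each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue  # empty bin contributes nothing
        weight = mask.mean()  # fraction of all samples in this bin
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += weight * gap
    return ece

# Toy case: 80% stated confidence with 80% observed accuracy is
# (near-)perfectly calibrated, so ECE should be ~0.
conf = [0.8] * 10
hits = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
print(round(expected_calibration_error(conf, hits), 3))
```

A lower ECE means the model's stated confidence tracks its actual accuracy, which is the sense in which the 0.258 to 0.078 drop indicates better-calibrated uncertainty.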