CellHermes-CoT-RL
This repository contains the reinforcement-learning LoRA adapter used for the CellHermes-CoT-RL row in the TCR reactivity benchmark.
Important: Adapter-Only Repository
This repository is not a standalone merged model checkpoint. It contains only a PEFT LoRA adapter:
- adapter_config.json
- adapter_model.safetensors
These two files are sufficient for distributing and loading the RL LoRA adapter with PEFT-compatible tooling. They are not sufficient for running inference on their own, because they contain no base-model weights, tokenizer, or chat template.
To reproduce the benchmark setup, load this adapter on top of the merged CellHermes-CoT-SFT checkpoint. The merged SFT checkpoint provides the base model weights, tokenizer, chat template, and SFT reasoning format; this repository provides only the post-SFT RL policy update.
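The relationship between this adapter and the merged SFT base can be illustrated with plain LoRA arithmetic. The sketch below assumes the standard LoRA formulation with PEFT's default scaling `lora_alpha / r` (that is, `use_rslora=False`) and uses the rank (32) and alpha (64) listed under Adapter. The weight matrices are toy values, not real CellHermes weights. Applying the adapter at runtime and folding it into the weights give the same output. In either case the low-rank delta is simply added to whatever base weights it is loaded on, so it reproduces the trained RL policy only when those weights are the merged SFT weights it was trained against.

```python
# Toy illustration of LoRA composition; assumes PEFT's default scaling alpha / r.

def matvec(M, v):
    """Matrix (list of rows) times vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(X, Y):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_forward(W_base, A, B, x, alpha, r):
    """Adapter applied at runtime: y = W_base x + (alpha / r) * B (A x)."""
    scale = alpha / r
    base = matvec(W_base, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]

def merge_lora(W_base, A, B, alpha, r):
    """Adapter folded into the weights: W_merged = W_base + (alpha / r) * B A."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W_base, BA)]

R, ALPHA = 32, 64  # rank and alpha listed for this adapter
W_sft = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]                 # toy 2x3 "merged SFT" weight
A = [[0.01 * (i + j) for j in range(3)] for i in range(R)]  # toy 32x3 down-projection
B = [[0.02 * (i - k) for k in range(R)] for i in range(2)]  # toy 2x32 up-projection
x = [1.0, 2.0, 3.0]

y_runtime = lora_forward(W_sft, A, B, x, ALPHA, R)
y_merged = matvec(merge_lora(W_sft, A, B, ALPHA, R), x)
print(ALPHA / R)  # 2.0
print(all(abs(a - b) < 1e-9 for a, b in zip(y_runtime, y_merged)))  # True
```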
Base Model for This Adapter
This adapter must be loaded on the merged SFT checkpoint generated from EthanGao123/CellHermes-CoT-SFT and the same CellHermes-v1.0 base model.
It should not be loaded directly on the original CellHermes-v1.0 base model.
Adapter
- Adapter files in this repository: adapter_config.json, adapter_model.safetensors
- Adapter type: LoRA
- LoRA rank: 32
- LoRA alpha: 64
- LoRA dropout: 0.0
- LoRA target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Source-data variant: TCR reactivity benchmark, clone split with 25% of clone groups held out, seed 2025
- RL method: GSPO-style policy optimization with structured TCR-reactivity rewards
- Selected RL checkpoint: post-SFT RL checkpoint used for the benchmark
- Reward function family: structured TCR reactivity reward
- Benchmark row: CellHermes-CoT-RL
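As a worked example of what rank 32 over the seven listed target modules amounts to, the sketch below counts LoRA parameters as r × (in_features + out_features) per adapted linear layer. It assumes the CellHermes-v1.0 backbone keeps the Llama-3.1-8B shapes of the base model listed under Model tree (hidden size 4096, MLP size 14336, 32 decoder layers, 8 key/value heads of dimension 128). These dimensions are assumptions taken from the public Llama-3.1-8B configuration, not values read from this repository.

```python
# Assumed Llama-3.1-8B dimensions (not read from adapter_config.json).
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_LAYERS = 32
KV_DIM = 8 * 128  # num_key_value_heads * head_dim (grouped-query attention)
RANK = 32         # LoRA rank listed for this adapter

# (in_features, out_features) of each LoRA target module in one decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features, out_features, r):
    """LoRA adds A (r x in_features) and B (out_features x r)."""
    return r * (in_features + out_features)

per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
print(per_layer)  # 2621440
print(total)      # 83886080
```

Under these assumptions the adapter holds about 83.9M parameters, roughly 1% of the 8B backbone.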
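The exact reward and RL hyperparameters are not listed here, but the core of GSPO (Group Sequence Policy Optimization) can be sketched. For each prompt a group of responses is sampled and scored, advantages are normalized within the group, and clipping is applied to a length-normalized sequence-level importance ratio rather than to per-token ratios. The sketch below assumes that generic formulation; it is not the training code used for this adapter. The reward values and the clip range `clip_eps` are placeholders, and the structured TCR-reactivity reward itself is not reproduced.

```python
import math
import statistics

def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def sequence_ratio(logp_new, logp_old):
    """Length-normalized sequence importance ratio: exp(mean token log-ratio)."""
    diffs = [n - o for n, o in zip(logp_new, logp_old)]
    return math.exp(sum(diffs) / len(diffs))

def gspo_objective(group, rewards, clip_eps=0.2):
    """Clipped surrogate averaged over the group (to be maximized).

    group: list of (logp_new, logp_old) pairs of per-token log-prob lists,
    one pair per sampled response. clip_eps is a placeholder value.
    """
    advantages = group_advantages(rewards)
    terms = []
    for (logp_new, logp_old), adv in zip(group, advantages):
        s = sequence_ratio(logp_new, logp_old)
        s_clipped = min(max(s, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(min(s * adv, s_clipped * adv))
    return sum(terms) / len(terms)

# Toy group of two responses; when the policy has not moved, every ratio is 1.
lp_old = [[-0.5, -1.0, -0.2], [-0.3, -0.7]]
group = [(lp, lp) for lp in lp_old]
print(gspo_objective(group, rewards=[1.0, 0.0]))  # 0.0
```

Normalizing the ratio by response length keeps long and short responses on a comparable scale, so one clip range can apply to whole sequences.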
Inference Alignment
For the plotted benchmark, vLLM inference used:
- model: the merged CellHermes-CoT-SFT checkpoint
- lora: this repository
The SFT merged base provides the tokenizer, chat template, and SFT reasoning format. This RL LoRA adapter provides the post-SFT policy update.
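The same model/lora pairing can be expressed for vLLM's OpenAI-compatible server. The helper below only assembles a `vllm serve` command line and is not necessarily how the benchmark was invoked. It assumes local copies of the merged SFT checkpoint and of this adapter; both paths and the served adapter name are hypothetical placeholders. It uses vLLM's `--enable-lora`, `--lora-modules name=path`, and `--max-lora-rank` options. vLLM caps the LoRA rank it accepts (the default cap has been 16), so the cap is raised to match this adapter's rank of 32.

```python
import shlex

def build_vllm_serve_cmd(sft_merged_base, rl_adapter_dir,
                         lora_name="CellHermes-CoT-RL", max_lora_rank=32):
    """Assemble a `vllm serve` argv: merged SFT base as model, this repo as LoRA."""
    return [
        "vllm", "serve", sft_merged_base,  # tokenizer is taken from this path too
        "--enable-lora",
        "--lora-modules", f"{lora_name}={rl_adapter_dir}",
        "--max-lora-rank", str(max_lora_rank),
    ]

# Hypothetical local paths; replace with your own downloads.
cmd = build_vllm_serve_cmd("/models/CellHermes-CoT-SFT-merged",
                           "/models/CellHermes-CoT-RL")
print(shlex.join(cmd))
```

With the server running, requests select the adapter by passing the served LoRA name as the `model` field.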
Example PEFT-style loading pattern:
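```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholder: local path or Hub ID of the merged CellHermes-CoT-SFT checkpoint
sft_merged_base = "path_or_repo_to_merged_CellHermes-CoT-SFT"
rl_adapter = "EthanGao123/CellHermes-CoT-RL"

# Tokenizer and chat template come from the SFT merged base, not from this repository
tokenizer = AutoTokenizer.from_pretrained(sft_merged_base)
base_model = AutoModelForCausalLM.from_pretrained(sft_merged_base)
# Apply the RL LoRA adapter on top of the SFT merged base
model = PeftModel.from_pretrained(base_model, rl_adapter)
```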
Alignment Notes
- Load this adapter only with the SFT merged base checkpoint listed above.
- Keep the tokenizer files and chat_template.jinja from the SFT merged base checkpoint.
- Do not merge or serve this adapter against raw CellHermes-v1.0, raw Meta-Llama, or another SFT checkpoint unless you are intentionally running a different experiment.
- The benchmark predictions for CellHermes-CoT-RL were generated from the SFT merged base plus this RL LoRA adapter.
- Optimizer states, scheduler states, RNG states, and trainer logs are intentionally not included because they are not required for adapter inference.
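These notes can be turned into a pre-flight check before loading or serving. The sketch below is a hypothetical helper, not part of PEFT. It reads this repository's `adapter_config.json`, compares the standard PEFT LoRA fields (`r`, `lora_alpha`, `lora_dropout`, `target_modules`) with the values listed under Adapter, and confirms that the intended base directory supplies `chat_template.jinja`. The directory paths are placeholders, and the check cannot prove the base weights are the right ones; it only catches obvious mismatches.

```python
import json
from pathlib import Path

# Values listed on this model card for the RL LoRA adapter.
EXPECTED = {
    "r": 32,
    "lora_alpha": 64,
    "lora_dropout": 0.0,
    "target_modules": {"q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"},
}

def preflight(adapter_dir, sft_merged_base_dir):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    cfg = json.loads((Path(adapter_dir) / "adapter_config.json").read_text())
    for key, want in EXPECTED.items():
        got = cfg.get(key)
        if key == "target_modules" and isinstance(got, list):
            got = set(got)  # module order in the saved config does not matter
        if got != want:
            problems.append(f"{key}: expected {want!r}, found {got!r}")
    # The chat template must come from the merged SFT base, not from this repo.
    if not (Path(sft_merged_base_dir) / "chat_template.jinja").is_file():
        problems.append("chat_template.jinja missing from the SFT merged base")
    return problems
```

Usage: `preflight("/models/CellHermes-CoT-RL", "/models/CellHermes-CoT-SFT-merged")` with your own local paths.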
Model tree for EthanGao123/CellHermes-CoT-RL
- Base model: meta-llama/Llama-3.1-8B