mnoukhov/pythia410m-dpo-tldr
PEFT
TensorBoard
Safetensors
Generated from Trainer
License: apache-2.0
pythia410m-dpo-tldr/code (revision 01300dd)
1 contributor
History: 1 commit
This model has 1 file scanned as unsafe.
Latest commit: 01300dd by mnoukhov (verified), 12 months ago, message "mnoukhov/pythia410m-dpo-tldr"
Last commit for every entry below: mnoukhov/pythia410m-dpo-tldr, 12 months ago.

Name                          Size        Scan
__pycache__                   -
configs                       -
results                       -
scripts                       -
wandb                         -
README.md                     171 Bytes   Safe
Untitled.ipynb                46.2 kB     Safe
callbacks.py                  17.8 kB     Safe
create_rlhf_dataset.py        6.29 kB
dpo.py                        5.77 kB
dpo_costa.py                  24.1 kB
dpo_training.py               24.4 kB
evaluate_rouge.py             7.73 kB
generate_and_eval.py          12.6 kB
generate_and_llm_judge.py     13.3 kB
generate_vllm.py              10.2 kB
gpt_reward_modeling.py        21.5 kB
inference_pseudolabel.py      6.84 kB
merge_peft_adapter.py         2.16 kB
preproc_hh_rlhf.py            2.46 kB
push_to_hub.py                457 Bytes
reward_modeling.py            8.52 kB
rl_training.py                26.6 kB
rl_training_value_model.py    13.1 kB
run_on_mila.py                7.3 kB
run_on_snow.py                10.4 kB
scalar_rm_model.py            10.3 kB
sft.py                        4.43 kB
supervised_finetuning.py      12 kB
tldr_dataset.py               22.3 kB