Active filters: trl
intone/unaligned-llama3-8b-v0.1-16bit • Text Generation • Updated • 18 • 1
lakshyasoni/phi3_telecom • Text Generation • Updated • 1
frncscp/HAL-9000 • Text Generation • Updated • 2 • 1
SavantofIllusions/mad_sci_mistral_instruct
dad1909/CyberSentinel • Text Generation • Updated • 140 • 1
Holarissun/REPROD_dpo_helpfulhelpful_gpt3_subset-1_modelgemma2b_maxsteps10000_bz8_lr1e-05 • Updated • 1 • 1
HaitameLaf/Phi3-Game16bit • Text Generation • Updated • 86 • 1
wop/kosmox • Text Generation • Updated • 74 • 1
anwesh/llama-3-8b-Instruct-bnb-4bit-yahma-alpaca-cleaned-4bit • Text Generation • Updated • 54 • 1
hooking-dev/Jennifer-v1.0 • Text Generation • Updated • 219 • 1
wop/kosmox-small • Text Generation • Updated • 13 • 1
Klevin/DECYPHERS-2b-v2 • Text Generation • Updated • 66 • 1
antony-pk/Phi-3-mini-4k-instruct-erpnext • Text Generation • Updated • 18 • 1
tqiqbal/lora_model
AnishJoshi/codellama2-finetuned-nl2bash • Updated • 21 • 1
wop/kosmox-tiny • Text Generation • Updated • 6 • 1
lewtun/dummy-trl-model • Reinforcement Learning • Updated • 2 • 1
ybelkada/gpt-neo-125m-detox • Reinforcement Learning • Updated • 36
ybelkada/gpt-neo-125m-detoxified-long-context • Reinforcement Learning • Updated • 1
dshin/flan-t5-ppo • Reinforcement Learning • Updated • 1
SummerSigh/T5-Base-Rule-Of-Thumb-RM • Reinforcement Learning • Updated • 3
dshin/flan-t5-ppo-testing • Reinforcement Learning • Updated • 1
SummerSigh/T5-Base-EvilPrompterRM • Reinforcement Learning • Updated • 3
dshin/flan-t5-ppo-testing-violation • Reinforcement Learning • Updated
dshin/flan-t5-ppo-user-b • Reinforcement Learning • Updated
dshin/flan-t5-ppo-user-h-use-violation • Reinforcement Learning • Updated
dshin/flan-t5-ppo-user-f-use-violation • Reinforcement Learning • Updated
dshin/flan-t5-ppo-user-e-use-violation • Reinforcement Learning • Updated
dshin/flan-t5-ppo-user-a-use-violation • Reinforcement Learning • Updated
dshin/flan-t5-ppo-user-h-batch-size-8-epoch-0 • Reinforcement Learning • Updated • 1