arxiv:2310.08164
Abdullah
amirabdullah19852020
AI & ML interests
Mechanistic interpretability, high dimensional geometry, persona role playing.
Papers
1
Models
17
amirabdullah19852020/base_llama_1b_sae
amirabdullah19852020/interpreting_reward_models
amirabdullah19852020/test • Text Generation • 19
amirabdullah19852020/gpt-neo-125m_hh_reward • Text Generation • 30
amirabdullah19852020/gpt-neo-125m_utility_reward • Reinforcement Learning • 9
amirabdullah19852020/pythia-70m_sentiment_reward • Reinforcement Learning • 12
amirabdullah19852020/pythia-160m_sentiment_reward • Reinforcement Learning • 13
amirabdullah19852020/gpt-neo-125m_sentiment_reward • Reinforcement Learning • 14
amirabdullah19852020/pythia-160m_utility_reward • Reinforcement Learning • 13
amirabdullah19852020/pythia-70m_utility_reward • Reinforcement Learning • 19