dar-tau committed on
Commit b29377d • 1 Parent(s): 868605b

Update app.py

Files changed (1)
  app.py +11 -9
app.py CHANGED
@@ -182,12 +182,14 @@ with gr.Blocks(theme=gr.themes.Default(), css=css) as demo:
     with gr.Row():
         with gr.Column(scale=5):
             gr.Markdown('# 😎 Self-Interpreting Models')
+            # gr.Markdown(
+            #     '**👾 This space is a simple introduction to the emerging trend of models interpreting their OWN hidden states in free form natural language!!👾**',
+            #     # elem_classes=['explanation_accordion']
+            # )
             gr.Markdown(
-                '**👾 This space is a simple introduction to the emerging trend of models interpreting their OWN hidden states in free form natural language!!👾**',
-                # elem_classes=['explanation_accordion']
-            )
-            gr.Markdown(
-                '''This idea was investigated in the paper **Patchscopes** ([Ghandeharioun et al., 2024](https://arxiv.org/abs/2401.06102)) and was further explored in **SelfIE** ([Chen et al., 2024](https://arxiv.org/abs/2403.10949)).
+                '''
+                **👾 This space is a simple introduction to the emerging trend of models interpreting their OWN hidden states in free form natural language!!👾**
+                This idea was investigated in the paper **Patchscopes** ([Ghandeharioun et al., 2024](https://arxiv.org/abs/2401.06102)) and was further explored in **SelfIE** ([Chen et al., 2024](https://arxiv.org/abs/2403.10949)).
                 An honorary mention of **Speaking Probes** ([Dar, 2023](https://towardsdatascience.com/speaking-probes-self-interpreting-models-7a3dc6cb33d6) - my own work 🥳) which was less mature but had the same idea in mind.
                 We will follow the SelfIE implementation in this space for concreteness. Patchscopes are so general that they encompass many other interpretation techniques too!!!
                 ''', line_breaks=True)
@@ -200,7 +202,7 @@ with gr.Blocks(theme=gr.themes.Default(), css=css) as demo:
                 **👾 The idea is really simple: models are able to understand their own hidden states by nature! 👾**
                 According to the residual stream view ([nostalgebraist, 2020](https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens)), internal representations from different layers are transferable between layers.
                 So we can inject a representation from (roughly) any layer to any layer! If I give a model a prompt of the form ``User: [X] Assistant: Sure, I'll repeat your message`` and replace the internal representation of ``[X]`` *during computation* with the hidden state we want to understand,
-                we expect to get back a summary of the information that exists inside the hidden state. Since the model uses a roughly common latent space, it can understand representations from different layers and different runs!! How cool is that! 😯😯😯
+                we expect to get back a summary of the information that exists inside the hidden state from different layers and different runs!! How cool is that! 😯😯😯
                 ''', line_breaks=True)
 
         # with gr.Column(scale=1):
@@ -209,9 +211,9 @@ with gr.Blocks(theme=gr.themes.Default(), css=css) as demo:
         with gr.Group('Interpretation'):
             interpretation_prompt = gr.Text(suggested_interpretation_prompts[0], label='Interpretation Prompt')
 
-            gr.Markdown('''
-            Here are some examples of prompts we can analyze their internal representations:
-            ''')
+            # gr.Markdown('''
+            # Here are some examples of prompts we can analyze their internal representations:
+            # ''')
 
             # for info in dataset_info:
             # with gr.Tab(info['name']):
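
For readers who want to see what "replacing the internal representation of ``[X]`` *during computation*" looks like in code, here is a minimal sketch of the SelfIE/Patchscopes-style patching described in the markdown above. It is not taken from this space's app.py: the model name (gpt2), the layer indices, the single-token placeholder "X" (used instead of "[X]" for simplicity), the prompt wording, and the hook-based patching are illustrative assumptions; the space itself follows the SelfIE implementation with a chat model.

```python
# Minimal sketch (not from app.py): interpret one hidden state by patching it into
# a placeholder position of an interpretation prompt during the forward pass.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative stand-in; the space uses a larger chat model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# 1) Grab the hidden state we want to interpret from some source prompt / layer / position.
source_prompt = "The Eiffel Tower is located in Paris"
source_layer, source_pos = 6, -1  # illustrative choices
with torch.no_grad():
    out = model(**tok(source_prompt, return_tensors="pt"), output_hidden_states=True)
hidden_to_interpret = out.hidden_states[source_layer][0, source_pos]  # shape: (d_model,)

# 2) Build an interpretation prompt with a placeholder token whose representation we will overwrite.
interp_prompt = "User: X\nAssistant: Sure, I'll repeat your message:"
enc = tok(interp_prompt, return_tensors="pt")
placeholder_id = tok.encode(" X")[0]  # " X" is a single GPT-2 token
patch_pos = (enc["input_ids"][0] == placeholder_id).nonzero()[0].item()

# 3) Forward hook on a target layer that swaps in the hidden state at the placeholder position
#    (residual-stream patching); with KV caching, later decoding steps have length 1 and are skipped.
target_layer = 2  # illustrative

def patch_hook(module, inputs, output):
    hs = output[0] if isinstance(output, tuple) else output
    if hs.shape[1] > patch_pos:  # only while the prompt itself is being processed
        hs = hs.clone()
        hs[:, patch_pos] = hidden_to_interpret
        return (hs,) + output[1:] if isinstance(output, tuple) else hs
    return output

handle = model.transformer.h[target_layer].register_forward_hook(patch_hook)
try:
    gen = model.generate(**enc, max_new_tokens=30, do_sample=False,
                         pad_token_id=tok.eos_token_id)
finally:
    handle.remove()

# The continuation is the model's free-form description of the injected hidden state.
print(tok.decode(gen[0][enc["input_ids"].shape[1]:], skip_special_tokens=True))
```

In the space, the text fed to step 2 is what the ``interpretation_prompt`` textbox (seeded with ``suggested_interpretation_prompts[0]`` in the diff above) supplies, so different interpretation prompts can be tried without touching the patching code.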