add new results and model description tab
This view is limited to 50 files because it contains too many changes.
- app.py +17 -1
- info/ARMT.md +1 -0
- info/GPT.md +27 -0
- info/RMT.md +3 -0
- notebooks/add_results_copy_paste.ipynb +264 -0
- results/ARMT/qa1/1000000.csv +2 -0
- results/ARMT/qa1/10000000.csv +2 -0
- results/ARMT/qa1/128000.csv +2 -0
- results/ARMT/qa1/16000.csv +2 -0
- results/ARMT/qa1/32000.csv +2 -0
- results/ARMT/qa1/4000.csv +2 -0
- results/ARMT/qa1/500000.csv +2 -0
- results/ARMT/qa1/64000.csv +2 -0
- results/ARMT/qa1/8000.csv +2 -0
- results/ARMT/qa2/1000000.csv +2 -0
- results/ARMT/qa2/10000000.csv +2 -0
- results/ARMT/qa2/128000.csv +2 -0
- results/ARMT/qa2/16000.csv +2 -0
- results/ARMT/qa2/32000.csv +2 -0
- results/ARMT/qa2/4000.csv +2 -0
- results/ARMT/qa2/500000.csv +2 -0
- results/ARMT/qa2/64000.csv +2 -0
- results/ARMT/qa2/8000.csv +2 -0
- results/ARMT/qa3/1000000.csv +2 -0
- results/ARMT/qa3/10000000.csv +2 -0
- results/ARMT/qa3/128000.csv +2 -0
- results/ARMT/qa3/16000.csv +2 -0
- results/ARMT/qa3/32000.csv +2 -0
- results/ARMT/qa3/4000.csv +2 -0
- results/ARMT/qa3/500000.csv +2 -0
- results/ARMT/qa3/64000.csv +2 -0
- results/ARMT/qa3/8000.csv +2 -0
- results/ARMT/qa4/1000000.csv +2 -0
- results/ARMT/qa4/10000000.csv +2 -0
- results/ARMT/qa4/128000.csv +2 -0
- results/ARMT/qa4/16000.csv +2 -0
- results/ARMT/qa4/32000.csv +2 -0
- results/ARMT/qa4/4000.csv +2 -0
- results/ARMT/qa4/500000.csv +2 -0
- results/ARMT/qa4/64000.csv +2 -0
- results/ARMT/qa4/8000.csv +2 -0
- results/ARMT/qa5/1000000.csv +2 -0
- results/ARMT/qa5/10000000.csv +2 -0
- results/ARMT/qa5/128000.csv +2 -0
- results/ARMT/qa5/16000.csv +2 -0
- results/ARMT/qa5/32000.csv +2 -0
- results/ARMT/qa5/4000.csv +2 -0
- results/ARMT/qa5/500000.csv +2 -0
- results/ARMT/qa5/64000.csv +2 -0
- results/ARMT/qa5/8000.csv +2 -0
app.py
CHANGED
@@ -22,10 +22,21 @@ def make_default_md():
 
 
 def make_arena_leaderboard_md(total_models):
-    leaderboard_md = f"""Total #models: **{total_models}**. Last updated:
+    leaderboard_md = f"""Total #models: **{total_models}**. Last updated: Mar 29, 2024."""
     return leaderboard_md
 
 
+def make_model_desc_md(f_len):
+    desc_md = make_arena_leaderboard_md(f_len)
+    models = next(os.walk('info'))[2]
+    for model in models:
+        model_name = model.split('.md')[0]
+        with open(os.path.join('info', model), 'r') as f:
+            description = f.read()
+
+        desc_md += f"\n\n### {model_name}\n{description}"
+    return desc_md
+
 
 def model_hyperlink(model_name, link):
     return f'<a target="_blank" href="{link}" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">{model_name}</a>'
@@ -110,6 +121,11 @@ def build_leaderboard_tab(folders):
                 column_widths=[50, 200] + [100] * len(msg_lengths),
                 wrap=True,
             )
+
+        with gr.Tab("Model description", id=tab_id + 1):
+            desc_md = make_model_desc_md(len(folders))
+            gr.Markdown(desc_md, elem_id="leaderboard_markdown")
+
     return [md_1]
 
 block_css = """
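For context, the new `make_model_desc_md` helper concatenates every markdown file under `info/` beneath the model-count header, and the new Gradio tab simply renders the result. A standalone sketch of the same logic (the inlined header string and the example call with three files are illustrative, not taken verbatim from the app):

```
import os

def make_model_desc_md(f_len):
    # Header reused from the leaderboard tab (date string inlined here for the sketch).
    desc_md = f"Total #models: **{f_len}**. Last updated: Mar 29, 2024."
    # next(os.walk('info'))[2] yields the plain files inside info/, e.g. ARMT.md, GPT.md, RMT.md.
    for model in next(os.walk('info'))[2]:
        model_name = model.split('.md')[0]
        with open(os.path.join('info', model), 'r') as f:
            desc_md += f"\n\n### {model_name}\n{f.read()}"
    return desc_md

# With the three files added in this commit, the tab shows sections
# titled "ARMT", "GPT" and "RMT" under the total-models line.
print(make_model_desc_md(3))
```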
info/ARMT.md
ADDED
@@ -0,0 +1 @@
+ARMT is an associative memory version of RMT. Please refer to [ [code](https://github.com/RodkinIvan/t5-experiments/) ]
info/GPT.md
ADDED
@@ -0,0 +1,27 @@
+We use the following prompts for GPT-4-Turbo and Mistral models:
+
+#### qa1
+
+```
+I will give you context with the facts about positions of different persons hidden
+in some random text and a question. You need to answer the question based only on
+the information from the facts. If a person was in different locations, use the
+latest location to answer the question.
+<example>
+Charlie went to the hallway. Judith come back to the kitchen. Charlie travelled to
+balcony. Where is Charlie?
+Answer: The most recent location of Charlie is balcony.
+</example>
+<example>
+Alan moved to the garage. Charlie went to the beach. Alan went to the shop. Rouse
+travelled to balcony. Where is Alan?
+Answer: The most recent location of Alan is shop.
+</example>
+<context>
+{qa1 query with noise}
+</context>
+QUESTION: {qa1 question}
+Always return your answer in the following format: The most recent location of
+’person’ is ’location’. Do not write anything else after that.
+```
+For prompts for other qa tasks please refer to the [ [paper](https://arxiv.org/abs/2402.10790) ].
info/RMT.md
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
RMT is a memory-augmented segment-level recurrent Transformer. We implement our memory mechanism as a wrapper for any Hugging Face model by adding special memory tokens to the input sequence. The model is trained to control both memory operations and sequence representations processing.
|
2 |
+
|
3 |
+
See: [ [paper](https://arxiv.org/abs/2402.10790) ] and [ [code](https://github.com/booydar/recurrent-memory-transformer/tree/babilong-release) ] for **Recurrent Memory Transformer** implementation and training examples.
|
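The RMT.md blurb describes the mechanism only in words; the sketch below illustrates the general idea of a memory-token wrapper. It is not the authors' implementation (that lives in the linked repository); segment splitting, read/write memory separation, and training details are omitted, and `backbone` stands for any module mapping (batch, seq, hidden) embeddings to hidden states of the same shape.

```
import torch
import torch.nn as nn

class MemoryWrapperSketch(nn.Module):
    """Toy illustration of recurrent memory: trainable memory embeddings are
    prepended to each segment and their updated states are carried forward."""

    def __init__(self, backbone, hidden_size, num_mem_tokens=16):
        super().__init__()
        self.backbone = backbone
        self.num_mem_tokens = num_mem_tokens
        self.memory = nn.Parameter(torch.randn(num_mem_tokens, hidden_size) * 0.02)

    def forward(self, segment_embeds, memory_state=None):
        batch = segment_embeds.size(0)
        if memory_state is None:
            # First segment: start from the learned initial memory.
            memory_state = self.memory.unsqueeze(0).expand(batch, -1, -1)
        # Process [memory tokens; segment tokens] jointly with the backbone ...
        hidden = self.backbone(torch.cat([memory_state, segment_embeds], dim=1))
        # ... and split off the updated memory to pass to the next segment.
        return hidden[:, self.num_mem_tokens:], hidden[:, :self.num_mem_tokens]

# Usage over a sequence of segments (nn.Identity is a trivial stand-in backbone):
wrapper = MemoryWrapperSketch(nn.Identity(), hidden_size=8, num_mem_tokens=2)
memory = None
for segment in torch.randn(3, 1, 5, 8):  # 3 segments, batch 1, 5 tokens, hidden 8
    outputs, memory = wrapper(segment, memory)
```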
notebooks/add_results_copy_paste.ipynb
ADDED
@@ -0,0 +1,264 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import pandas as pd\n",
+    "import re"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 42,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "out_path = \"results/\"\n",
+    "lens = [0, 4000,8000,16000,32000,64000,128000,500000,1000000,10000000]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 77,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!rm -r results/*"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 78,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "RMT ['100,0', '100,0', '99,9', '100,0', '100,0', '99,6', '99,1', '96,4', '94,2', '76,4']\n",
+      "RMT-Retrieval ['100,0', '99,9', '99,8', '99,9', '99,9', '99,7', '99,5', '97,5', '97,4', '86,0']\n",
+      "GPT4 ['100,0', '97,0', '93,0', '66,0', '43,0', '30,0', '24,0', '', '', '']\n",
+      "GPT4 + RAG by sentences ['', '61,5', '59,0', '55,5', '55,5', '55,0', '55,5', '51,0', '51,0', '19,5']\n",
+      "GPT4 + Retrieve sentences (new 100 samples) ['', '63,0', '61,0', '60,0', '60,0', '56,0', '55,0', '55,0', '52,0', '28,0']\n",
+      "GPT4 + RAG by segments ['', '70,0', '58,0', '54,0', '42,0', '24,0', '16,0', '12,0', '12,0', '4,0']\n",
+      "GPT-3.5 ['', '88,0', '44,0', '24,0', '', '', '', '', '', '']\n",
+      "GPT-3.5 fine-tuned (trained on 100 samples) ['', '84,0', '72,0', '64,0', '', '', '', '', '', '']\n",
+      "GPT-3.5 fine-tuned (trained on 1000 samples) ['', '94,0', '96,0', '95,0', '', '', '', '', '', '']\n",
+      "ARMT ['', '99,9', '99,9', '99,9', '100,0', '100,0', '100,0', '99,9', '99,4', '97,4']\n",
+      "Mistral medium (xxB) ['', '73,0', '75,0', '58,0', '33,0', '', '', '', '', '']\n"
+     ]
+    }
+   ],
+   "source": [
+    "task_name = 'qa1'\n",
+    "qa1_results = '''RMT\t100,0\t100,0\t99,9\t100,0\t100,0\t99,6\t99,1\t96,4\t94,2\t76,4\n",
+    "RMT-Retrieval\t100,0\t99,9\t99,8\t99,9\t99,9\t99,7\t99,5\t97,5\t97,4\t86,0\n",
+    "GPT4\t100,0\t97,0\t93,0\t66,0\t43,0\t30,0\t24,0\t\t\t\n",
+    "GPT4 + RAG by sentences\t\t61,5\t59,0\t55,5\t55,5\t55,0\t55,5\t51,0\t51,0\t19,5\n",
+    "GPT4 + Retrieve sentences (new 100 samples)\t\t63,0\t61,0\t60,0\t60,0\t56,0\t55,0\t55,0\t52,0\t28,0\n",
+    "GPT4 + RAG by segments\t\t70,0\t58,0\t54,0\t42,0\t24,0\t16,0\t12,0\t12,0\t4,0\n",
+    "GPT-3.5\t\t88,0\t44,0\t24,0\t\t\t\t\t\t\n",
+    "GPT-3.5 fine-tuned (trained on 100 samples)\t\t84,0\t72,0\t64,0\t\t\t\t\t\t\n",
+    "GPT-3.5 fine-tuned (trained on 1000 samples)\t\t94,0\t96,0\t95,0\t\t\t\t\t\t\n",
+    "ARMT\t\t99,9\t99,9\t99,9\t100,0\t100,0\t100,0\t99,9\t99,4\t97,4\n",
+    "Mistral medium (xxB)\t\t73,0\t75,0\t58,0\t33,0\t\t\t\t\t'''\n",
+    "results = qa1_results.split('\\n')\n",
+    "for r in results:\n",
+    "    model_name = r.split('\\t')[0]\n",
+    "    numbers = r.split('\\t')[1:] \n",
+    "    print(model_name, numbers)\n",
+    "\n",
+    "    model_dir = os.path.join(out_path, model_name)\n",
+    "    os.makedirs(model_dir, exist_ok=True)\n",
+    "\n",
+    "    model_task_dir = os.path.join(model_dir, task_name)\n",
+    "    os.makedirs(model_task_dir, exist_ok=True)\n",
+    "\n",
+    "    for l, n in zip(lens, numbers):\n",
+    "        len_file = os.path.join(model_task_dir, f'{l}.csv')\n",
+    "        n = re.sub(',', '.', n)\n",
+    "        try:\n",
+    "            n = float(n) / 100\n",
+    "            df = pd.DataFrame({\"result\": n}, index=[0])\n",
+    "            df.to_csv(len_file, index=False)\n",
+    "        except ValueError:\n",
+    "            n = None\n",
+    "        \n",
+    "\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 79,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "task_name = 'qa2'\n",
+    "qa2_results = '''RMT\t97,7\t98,9\t98,4\t96,1\t87,4\t72,7\t56,3\t32\t25,5\t16,2\n",
+    "RMT-Retrieval\t97,7\t98,0\t97,2\t93,4\t85,6\t71,6\t54,9\t31,8\t26,3\t13,0\n",
+    "GPT4\t84,0\t72,0\t60,0\t52,0\t24,0\t4,0\t8,0\t\t\t\n",
+    "ARMT\t\t99,8\t100,0\t100,0\t100,0\t100,0\t100,0\t99,7\t99,6\t81,7'''\n",
+    "results = qa2_results.split('\\n')\n",
+    "for r in results:\n",
+    "    model_name = r.split('\\t')[0]\n",
+    "    numbers = r.split('\\t')[1:] \n",
+    "\n",
+    "    model_dir = os.path.join(out_path, model_name)\n",
+    "    os.makedirs(model_dir, exist_ok=True)\n",
+    "\n",
+    "    model_task_dir = os.path.join(model_dir, task_name)\n",
+    "    os.makedirs(model_task_dir, exist_ok=True)\n",
+    "\n",
+    "    for l, n in zip(lens, numbers):\n",
+    "        len_file = os.path.join(model_task_dir, f'{l}.csv')\n",
+    "        n = re.sub(',', '.', n)\n",
+    "        try:\n",
+    "            n = float(n) / 100\n",
+    "            df = pd.DataFrame({\"result\": n}, index=[0])\n",
+    "            df.to_csv(len_file, index=False)\n",
+    "        except ValueError:\n",
+    "            n = None\n",
+    "        \n",
+    "\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 80,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "task_name = 'qa3'\n",
+    "qa3_results = '''RMT\t94,4\t83,6\t73,8\t70,2\t61,8\t51,9\t42,9\t25,9\t24,8\t21\n",
+    "RMT-Retrieval\t94,4\t83,8\t76,0\t72,0\t62,5\t52,9\t41,9\t25,5\t22,2\t16,4\n",
+    "GPT4\t56,0\t32,0\t24,0\t28,0\t28,0\t12,0\t4,0\t\t\t\n",
+    "ARMT\t\t90,9\t92,0\t92,7\t90,7\t88,3\t80,4\t67,9\t56,4\t27,5'''\n",
+    "results = qa3_results.split('\\n')\n",
+    "for r in results:\n",
+    "    model_name = r.split('\\t')[0]\n",
+    "    numbers = r.split('\\t')[1:] \n",
+    "\n",
+    "    model_dir = os.path.join(out_path, model_name)\n",
+    "    os.makedirs(model_dir, exist_ok=True)\n",
+    "\n",
+    "    model_task_dir = os.path.join(model_dir, task_name)\n",
+    "    os.makedirs(model_task_dir, exist_ok=True)\n",
+    "\n",
+    "    for l, n in zip(lens, numbers):\n",
+    "        len_file = os.path.join(model_task_dir, f'{l}.csv')\n",
+    "        n = re.sub(',', '.', n)\n",
+    "        try:\n",
+    "            n = float(n) / 100\n",
+    "            df = pd.DataFrame({\"result\": n}, index=[0])\n",
+    "            df.to_csv(len_file, index=False)\n",
+    "        except ValueError:\n",
+    "            n = None\n",
+    "        \n",
+    "\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 81,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "qa4_results = '''RMT\t99,8\t82,3\t81,9\t79,2\t70,5\t51,2\t40\t29,4\t27,3\t17,2\n",
+    "RMT-Retrieval\t99,8\t82,50\t79,70\t76,40\t72,20\t58,80\t50,10\t32,10\t26,00\t14,00\n",
+    "GPT4\t100,0\t72,0\t60,0\t72,0\t64,0\t20,0\t36,0\t\t\t\n",
+    "ARMT\t\t100,0\t100,0\t100,0\t100,0\t100,0\t100,0\t100,0\t99,8\t93,2'''\n",
+    "\n",
+    "task_name = 'qa4'\n",
+    "results = qa4_results.split('\\n')\n",
+    "for r in results:\n",
+    "    model_name = r.split('\\t')[0]\n",
+    "    numbers = r.split('\\t')[1:] \n",
+    "\n",
+    "    model_dir = os.path.join(out_path, model_name)\n",
+    "    os.makedirs(model_dir, exist_ok=True)\n",
+    "\n",
+    "    model_task_dir = os.path.join(model_dir, task_name)\n",
+    "    os.makedirs(model_task_dir, exist_ok=True)\n",
+    "\n",
+    "    for l, n in zip(lens, numbers):\n",
+    "        len_file = os.path.join(model_task_dir, f'{l}.csv')\n",
+    "        n = re.sub(',', '.', n)\n",
+    "        try:\n",
+    "            n = float(n) / 100\n",
+    "            df = pd.DataFrame({\"result\": n}, index=[0])\n",
+    "            df.to_csv(len_file, index=False)\n",
+    "        except ValueError:\n",
+    "            n = None"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 82,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "qa5_results = '''RMT\t98,4\t99,3\t99,1\t97,4\t95,5\t88,5\t78,1\t56,4\t48\t27,3\n",
+    "RMT-Retrieval\t98,4\t98,80\t98,90\t98,20\t93,60\t86,20\t77,40\t55,90\t49,90\t35,00\n",
+    "GPT4\t96,0\t100,0\t84,0\t68,0\t52,0\t64,0\t48,0\t\t\t\n",
+    "ARMT\t\t99,5\t99,3\t99,4\t98,9\t98,9\t98,8\t98,2\t97,8\t87,0'''\n",
+    "\n",
+    "task_name = 'qa5'\n",
+    "results = qa5_results.split('\\n')\n",
+    "for r in results:\n",
+    "    model_name = r.split('\\t')[0]\n",
+    "    numbers = r.split('\\t')[1:] \n",
+    "\n",
+    "    model_dir = os.path.join(out_path, model_name)\n",
+    "    os.makedirs(model_dir, exist_ok=True)\n",
+    "\n",
+    "    model_task_dir = os.path.join(model_dir, task_name)\n",
+    "    os.makedirs(model_task_dir, exist_ok=True)\n",
+    "\n",
+    "    for l, n in zip(lens, numbers):\n",
+    "        len_file = os.path.join(model_task_dir, f'{l}.csv')\n",
+    "        n = re.sub(',', '.', n)\n",
+    "        try:\n",
+    "            n = float(n) / 100\n",
+    "            df = pd.DataFrame({\"result\": n}, index=[0])\n",
+    "            df.to_csv(len_file, index=False)\n",
+    "        except ValueError:\n",
+    "            n = None"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
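Each notebook cell writes one tiny CSV per (model, task, context length) with a single `result` value in [0, 1], which is exactly what the `results/ARMT/...` files below contain. As a hedged sketch (the leaderboard's own loading code is not part of this diff, and `load_results` is a made-up name), such a layout could be read back like this:

```
import os
import pandas as pd

def load_results(out_path="results/"):
    # Collect results/<model>/<task>/<context_len>.csv into one long table.
    rows = []
    for model in sorted(os.listdir(out_path)):
        for task in sorted(os.listdir(os.path.join(out_path, model))):
            task_dir = os.path.join(out_path, model, task)
            for fname in sorted(os.listdir(task_dir)):
                rows.append({
                    "model": model,
                    "task": task,
                    "len": int(fname.removesuffix(".csv")),
                    "result": pd.read_csv(os.path.join(task_dir, fname))["result"].iloc[0],
                })
    return pd.DataFrame(rows)

# e.g. load_results().pivot_table(index="model", columns="len", values="result")
```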
results/ARMT/qa1/1000000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.9940000000000001
results/ARMT/qa1/10000000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.9740000000000001
results/ARMT/qa1/128000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+1.0
results/ARMT/qa1/16000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.9990000000000001
results/ARMT/qa1/32000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+1.0
results/ARMT/qa1/4000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.9990000000000001
results/ARMT/qa1/500000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.9990000000000001
results/ARMT/qa1/64000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+1.0
results/ARMT/qa1/8000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.9990000000000001
results/ARMT/qa2/1000000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.996
results/ARMT/qa2/10000000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.8170000000000001
results/ARMT/qa2/128000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+1.0
results/ARMT/qa2/16000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+1.0
results/ARMT/qa2/32000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+1.0
results/ARMT/qa2/4000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.998
results/ARMT/qa2/500000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.997
results/ARMT/qa2/64000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+1.0
results/ARMT/qa2/8000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+1.0
results/ARMT/qa3/1000000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.564
results/ARMT/qa3/10000000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.275
results/ARMT/qa3/128000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.804
results/ARMT/qa3/16000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.927
results/ARMT/qa3/32000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.907
results/ARMT/qa3/4000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.909
results/ARMT/qa3/500000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.679
results/ARMT/qa3/64000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.883
results/ARMT/qa3/8000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.92
results/ARMT/qa4/1000000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.998
results/ARMT/qa4/10000000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.932
results/ARMT/qa4/128000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+1.0
results/ARMT/qa4/16000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+1.0
results/ARMT/qa4/32000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+1.0
results/ARMT/qa4/4000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+1.0
results/ARMT/qa4/500000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+1.0
results/ARMT/qa4/64000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+1.0
results/ARMT/qa4/8000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+1.0
results/ARMT/qa5/1000000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.978
results/ARMT/qa5/10000000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.87
results/ARMT/qa5/128000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.988
results/ARMT/qa5/16000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.9940000000000001
results/ARMT/qa5/32000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.9890000000000001
results/ARMT/qa5/4000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.995
results/ARMT/qa5/500000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.982
results/ARMT/qa5/64000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.9890000000000001
results/ARMT/qa5/8000.csv
ADDED
@@ -0,0 +1,2 @@
+result
+0.993