Commit 0194fde by DontPlanToEnd
Parent(s): 1281b3e
Update app.py
app.py CHANGED
@@ -62,7 +62,7 @@ custom_css = """
 """
 
 # Define the columns for the different leaderboards
-UGI_COLS = ['#P', 'Model', 'UGI 🏆', 'W/10 👍', 'Unruly', 'Internet', 'Stats', 'Writing', 'PolContro']
+UGI_COLS = ['#P', 'Model', 'UGI 🏆', 'W/10 👍', 'I/10 💡', 'Unruly', 'Internet', 'Stats', 'Writing', 'PolContro']
 WRITING_STYLE_COLS = ['#P', 'Model', 'Reg+MyScore 🏆', 'Reg+Int 🏆', 'MyScore 🏆', 'ASSS⬇️', 'SMOG⬇️', 'Yule⬇️']
 ANIME_RATING_COLS = ['#P', 'Model', 'Score 🏆', 'Dif', 'Cor', 'Std']
 ADDITIONAL_COLS = ['Release Date', 'Date Added', 'Active Params', 'Total Params']
@@ -97,9 +97,11 @@ def load_leaderboard_data(csv_file_path):
         numeric_columns = df.select_dtypes(include=[np.number]).columns
         df[numeric_columns] = df[numeric_columns].round(3)
 
-        # Round the W/10 column to 1 decimal place
+        # Round the W/10 column to 1 decimal place and I/10 to 2 decimal places
         if 'W/10 👍' in df.columns:
             df['W/10 👍'] = df['W/10 👍'].round(1)
+        if 'I/10 💡' in df.columns:
+            df['I/10 💡'] = df['I/10 💡'].round(2)
 
         return df
     except Exception as e:
@@ -217,6 +219,8 @@ with GraInter:
 <strong>UGI:</strong> Uncensored General Intelligence. A measurement of the amount of uncensored/controversial information an LLM knows and is willing to tell the user. It is calculated from the average score of 5 subjects LLMs commonly refuse to talk about. The leaderboard is made of roughly 65 questions/tasks, measuring both willingness to answer and accuracy in fact-based controversial questions. I'm choosing to keep the questions private so people can't train on them and devalue the leaderboard.
 
 **W/10:** Willingness/10. A more narrow, 10-point score, measuring how far the model can be pushed before going against its instructions, refusing to answer, or adding an ethical disclaimer to its response.
+<br>
+**I/10:** Intelligence/10. A 10-point score made up of the UGI questions with the highest correlation with parameter size. This shows how much a model's knowledge and reasoning play a role in its UGI score.
 <br><br>
 A high UGI but low W/10 could mean for example that the model can provide a lot of accurate sensitive information, but will refuse to form the information into something it sees as dangerous. Or that it answers questions correctly, but appends a paragraph to its answer explaining why the question is immoral to ask.
 <br><br>
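Note on the rounding hunk: the change to load_leaderboard_data adds a second per-column rounding rule beside the existing one. A minimal sketch of the same behavior, written as a standalone function, is below. It is not the code in app.py. The function name round_leaderboard, the COLUMN_DECIMALS mapping, and the sample data are hypothetical, and the column labels are shortened to 'W/10' and 'I/10' (emoji suffixes omitted) to keep the example self-contained.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping (not in app.py): decimal places per column,
# applied after the default rounding. Labels are simplified for the example.
COLUMN_DECIMALS = {"W/10": 1, "I/10": 2}


def round_leaderboard(df: pd.DataFrame, default_decimals: int = 3) -> pd.DataFrame:
    """Round all numeric columns, then tighten selected columns if present."""
    df = df.copy()  # leave the caller's frame untouched

    # Default pass: every numeric column goes to `default_decimals` places.
    numeric_columns = df.select_dtypes(include=[np.number]).columns
    df[numeric_columns] = df[numeric_columns].round(default_decimals)

    # Override pass: skip any column that this CSV does not contain,
    # mirroring the `if ... in df.columns` guards in the diff.
    for column, decimals in COLUMN_DECIMALS.items():
        if column in df.columns:
            df[column] = df[column].round(decimals)
    return df


# Made-up sample rows, for illustration only.
sample = pd.DataFrame(
    {
        "Model": ["model-a", "model-b"],
        "UGI": [41.23456, 17.98765],
        "W/10": [7.26, 3.94],
        "I/10": [6.1239, 2.5061],
    }
)
rounded = round_leaderboard(sample)
```

Keeping the overrides in one mapping is only a design suggestion: a further score column would then need a new dictionary entry rather than another if block. The membership check matters because the same loader appears to serve several leaderboards whose CSV files do not all carry these columns.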