DontPlanToEnd committed
Commit 0194fde • 1 Parent(s): 1281b3e

Update app.py

Files changed (1): app.py (+6 -2)
app.py CHANGED
@@ -62,7 +62,7 @@ custom_css = """
 """
 
 # Define the columns for the different leaderboards
-UGI_COLS = ['#P', 'Model', 'UGI 🏆', 'W/10 👍', 'Unruly', 'Internet', 'Stats', 'Writing', 'PolContro']
+UGI_COLS = ['#P', 'Model', 'UGI 🏆', 'W/10 👍', 'I/10 💡', 'Unruly', 'Internet', 'Stats', 'Writing', 'PolContro']
 WRITING_STYLE_COLS = ['#P', 'Model', 'Reg+MyScore 🏆', 'Reg+Int 🏆', 'MyScore 🏆', 'ASSS⬇️', 'SMOG⬆️', 'Yule⬇️']
 ANIME_RATING_COLS = ['#P', 'Model', 'Score 🏆', 'Dif', 'Cor', 'Std']
 ADDITIONAL_COLS = ['Release Date', 'Date Added', 'Active Params', 'Total Params']
@@ -97,9 +97,11 @@ def load_leaderboard_data(csv_file_path):
         numeric_columns = df.select_dtypes(include=[np.number]).columns
         df[numeric_columns] = df[numeric_columns].round(3)
 
-        # Round the W/10 column to 1 decimal place
+        # Round the W/10 column to 1 decimal place and I/10 to 2 decimal places
         if 'W/10 👍' in df.columns:
             df['W/10 👍'] = df['W/10 👍'].round(1)
+        if 'I/10 💡' in df.columns:
+            df['I/10 💡'] = df['I/10 💡'].round(2)
 
         return df
     except Exception as e:
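For reference, the rounding behavior in the hunk above is easy to reproduce in isolation. The sketch below is minimal and runnable; the column names and rounding steps come from app.py, while the sample rows and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Sample leaderboard rows (values invented; column names match UGI_COLS).
df = pd.DataFrame({
    'Model': ['model-a', 'model-b'],
    'UGI 🏆': [41.2345678, 37.8912345],
    'W/10 👍': [6.57, 4.12],
    'I/10 💡': [7.234, 5.988],
})

# First pass: round every numeric column to 3 decimal places.
numeric_columns = df.select_dtypes(include=[np.number]).columns
df[numeric_columns] = df[numeric_columns].round(3)

# Second pass: coarsen W/10 to 1 decimal place and, with this commit,
# I/10 to 2 decimal places, when those columns are present.
if 'W/10 👍' in df.columns:
    df['W/10 👍'] = df['W/10 👍'].round(1)
if 'I/10 💡' in df.columns:
    df['I/10 💡'] = df['I/10 💡'].round(2)

print(df)
#      Model  UGI 🏆  W/10 👍  I/10 💡
# 0  model-a  41.235     6.6     7.23
# 1  model-b  37.891     4.1     5.99
```

The `in df.columns` guards mean the loader still handles CSVs that lack either column.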
 
@@ -217,6 +219,8 @@ with GraInter:
     <strong>UGI:</strong> Uncensored General Intelligence. A measurement of the amount of uncensored/controversial information an LLM knows and is willing to tell the user. It is calculated from the average score of 5 subjects LLMs commonly refuse to talk about. The leaderboard is made up of roughly 65 questions/tasks, measuring both willingness to answer and accuracy in fact-based controversial questions. I'm choosing to keep the questions private so people can't train on them and devalue the leaderboard.
 
     **W/10:** Willingness/10. A narrower, 10-point score measuring how far the model can be pushed before going against its instructions, refusing to answer, or adding an ethical disclaimer to its response.
+    <br>
+    **I/10:** Intelligence/10. A 10-point score made up of the UGI questions with the highest correlation to parameter size. It shows how much a model's knowledge and reasoning contribute to its UGI score.
     <br><br>
     A high UGI but low W/10 could mean, for example, that the model can provide a lot of accurate sensitive information but will refuse to shape that information into something it sees as dangerous. Or that it answers questions correctly but appends a paragraph explaining why the question is immoral to ask.
     <br><br>
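The I/10 description implies a selection step: rank UGI questions by how strongly per-question scores track model size, then score models on the top-ranked subset. The sketch below only illustrates that idea; the real question set is private per the description above, and every name and number here is hypothetical.

```python
import pandas as pd

# Hypothetical per-question scores (0-1) for four models of known size.
scores = pd.DataFrame({
    'params_b': [7, 13, 34, 70],   # parameter count in billions (assumed)
    'q1': [0.2, 0.4, 0.7, 0.9],    # rises with model size
    'q2': [0.8, 0.7, 0.8, 0.7],    # roughly flat across sizes
    'q3': [0.1, 0.3, 0.6, 0.8],    # rises with model size
})

# Correlate each question's scores with parameter count ...
corr = scores.drop(columns='params_b').corrwith(scores['params_b'])

# ... keep the questions most correlated with size (2 here, arbitrarily) ...
top_questions = corr.nlargest(2).index.tolist()   # ['q1', 'q3']

# ... and scale the mean score on that subset to a 10-point value.
i10 = scores[top_questions].mean(axis=1) * 10
print(i10.round(2).tolist())   # [1.5, 3.5, 6.5, 8.5]
```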