Spaces: DontPlanToEnd/UGI-Leaderboard

DontPlanToEnd committed on Sep 19

Commit 0194fde • 1 Parent(s): 1281b3e

Update app.py

Files changed (1)

app.py +6 -2

@@ -62,7 +62,7 @@ custom_css = """
 """
 # Define the columns for the different leaderboards
-UGI_COLS = ['#P', 'Model', 'UGI 🏆', 'W/10 👍', 'Unruly', 'Internet', 'Stats', 'Writing', 'PolContro']
+UGI_COLS = ['#P', 'Model', 'UGI 🏆', 'W/10 👍', 'I/10 💡', 'Unruly', 'Internet', 'Stats', 'Writing', 'PolContro']
 WRITING_STYLE_COLS = ['#P', 'Model', 'Reg+MyScore 🏆', 'Reg+Int 🏆', 'MyScore 🏆', 'ASSS⬇️', 'SMOG⬆️', 'Yule⬇️']
 ANIME_RATING_COLS = ['#P', 'Model', 'Score 🏆', 'Dif', 'Cor', 'Std']
 ADDITIONAL_COLS = ['Release Date', 'Date Added', 'Active Params', 'Total Params']
@@ -97,9 +97,11 @@ def load_leaderboard_data(csv_file_path):
         numeric_columns = df.select_dtypes(include=[np.number]).columns
         df[numeric_columns] = df[numeric_columns].round(3)
-        # Round the W/10 column to 1 decimal place
+        # Round the W/10 column to 1 decimal place and I/10 to 2 decimal places
         if 'W/10 👍' in df.columns:
             df['W/10 👍'] = df['W/10 👍'].round(1)
+        if 'I/10 💡' in df.columns:
+            df['I/10 💡'] = df['I/10 💡'].round(2)
         return df
     except Exception as e:
@@ -217,6 +219,8 @@ with GraInter:
             <strong>UGI:</strong> Uncensored General Intelligence. A measurement of the amount of uncensored/controversial information an LLM knows and is willing to tell the user. It is calculated from the average score of 5 subjects LLMs commonly refuse to talk about. The leaderboard is made of roughly 65 questions/tasks, measuring both willingness to answer and accuracy in fact-based controversial questions. I'm choosing to keep the questions private so people can't train on them and devalue the leaderboard.
             **W/10:** Willingness/10. A more narrow, 10-point score, measuring how far the model can be pushed before going against its instructions, refusing to answer, or adding an ethical disclaimer to its response.
+            <br>
+            **I/10:** Intelligence/10. A 10-point score made up of the UGI questions with the highest correlation with parameter size. This shows how much a model's knowledge and reasoning play a role in its UGI score.
             <br><br>
             A high UGI but low W/10 could mean for example that the model can provide a lot of accurate sensitive information, but will refuse to form the information into something it sees as dangerous. Or that it answers questions correctly, but appends a paragraph to its answer explaining why the question is immoral to ask.
             <br><br>
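
Note on the rounding step in load_leaderboard_data: the commit adds a second per-column `if` block, one for each column that needs a precision other than the default of 3 decimals. If more such columns are added later, the same behaviour could be driven by a lookup table. The following is only a sketch of that idea, not code from app.py: `COLUMN_DECIMALS` and `round_leaderboard` are hypothetical names introduced here, and the sample rows are made-up values for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping (not in app.py): column name -> decimal places.
COLUMN_DECIMALS = {'W/10 👍': 1, 'I/10 💡': 2}


def round_leaderboard(df, default_decimals=3, overrides=COLUMN_DECIMALS):
    """Round all numeric columns, then apply per-column overrides.

    Mirrors the order used in the commit: a blanket round to 3 decimals
    first, followed by the coarser rounding for specific columns.
    """
    df = df.copy()  # leave the caller's frame untouched
    numeric_columns = df.select_dtypes(include=[np.number]).columns
    df[numeric_columns] = df[numeric_columns].round(default_decimals)
    for column, decimals in overrides.items():
        # Skip columns missing from this CSV, as the original `if` checks do.
        if column in df.columns:
            df[column] = df[column].round(decimals)
    return df


# Made-up rows, purely for illustration.
example = pd.DataFrame({
    'Model': ['model-a', 'model-b'],
    'UGI 🏆': [41.23456, 38.98765],
    'W/10 👍': [7.26, 5.04],
    'I/10 💡': [6.126, 4.871],
})
rounded = round_leaderboard(example)
```

With this shape, giving another column its own precision would be a one-line change to the mapping rather than a new `if` block. Whether that is worth it for two columns is a judgment call.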