grg committed on
Commit
1d3b79e
1 Parent(s): abb889e

update phrasing

templates/about.html CHANGED
@@ -252,9 +252,9 @@
252
  Here are the considered context chunks:
253
  </p>
254
  <ul>
255
- <li> <b> no_conv </b>: no conversation is simulated the questions from the PVQ-40 questionnaire are given directly </li>
256
- <li> <b> no_conv_svs </b>: no conversation is simulated the questions from the SVS questionnaire are given directly </li>
257
- <li> <b> chunk_0-chunk-4 </b>: <a target="_blank" href="https://gitlab.inria.fr/gkovac/value_stability/-/tree/master/contexts/leaderboard_reddit_chunks?ref_type=heads">50 reddit posts</a> used as the initial Interlocutor model messages (one per persona). chunk_0 contains the longest posts, chunk_4 the shortest. </li>
258
  <li> <b> chess </b>: "1. e4" is given as the initial message to all personas, but for each persona the Interlocutor model is instructed to simulate a different persona (instead of a human user) </li>
259
  <li> <b> grammar </b>: like chess, but "Can you check this sentence for grammar? \n Whilst Jane was waiting to meet hers friend their nose started bleeding." is given as the initial message.
260
  </ul>
 
252
  Here are the considered context chunks:
253
  </p>
254
  <ul>
255
+ <li> <b> no_conv </b>: no conversation is simulated and the questions from the PVQ-40 questionnaire are given directly </li>
256
+ <li> <b> no_conv_svs </b>: no conversation is simulated and the questions from the SVS questionnaire are given directly </li>
257
+ <li> <b> chunk_0-chunk-4 </b>: each chunk has 50 reddit posts, which are used as the initial Interlocutor model messages (one per persona). chunk_0 contains the longest posts, chunk_4 the shortest. </li>
258
  <li> <b> chess </b>: "1. e4" is given as the initial message to all personas, but for each persona the Interlocutor model is instructed to simulate a different persona (instead of a human user) </li>
259
  <li> <b> grammar </b>: like chess, but "Can you check this sentence for grammar? \n Whilst Jane was waiting to meet hers friend their nose started bleeding." is given as the initial message.
260
  </ul>
templates/index.html CHANGED
@@ -321,14 +321,14 @@
321
  <li><b>Ordinal - Win Rate</b> -
322
  <i>Which model beats the most other models across most metrics?</i>
323
  <div style="margin-left: 20px; margin-top: 5px">
324
- The score averaged over all metrics (with descending metrics inverted), context pairs (for stability) and contexts (for validity metrics)
325
- <div>
326
  </li>
327
  <li><b>Cardinal - Score</b> -
328
  <i>Which model has the highest average score?</i>
329
  <div style="margin-left: 20px; margin-top: 5px">
330
- The percentage of won games, where a game is a comparison of each model pair, each metric, and each context pair (for stability) or context (for validity metrics)
331
- </div>
332
  </li>
333
  </ul>
334
  </p>
 
321
  <li><b>Ordinal - Win Rate</b> -
322
  <i>Which model beats the most other models across most metrics?</i>
323
  <div style="margin-left: 20px; margin-top: 5px">
324
+ The percentage of won games, where a game is a comparison of each model pair, each metric, and each context pair (for stability) or context (for validity metrics)
325
+ </div>
326
  </li>
327
  <li><b>Cardinal - Score</b> -
328
  <i>Which model has the highest average score?</i>
329
  <div style="margin-left: 20px; margin-top: 5px">
330
+ The score averaged over all metrics (with descending metrics inverted), context pairs (for stability) and contexts (for validity metrics)
331
+ </div>
332
  </li>
333
  </ul>
334
  </p>
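The two aggregate rankings described in this hunk can be sketched in Python as follows. This is a minimal illustration, not the leaderboard's actual code: the `results` layout, the model names, the numbers, and the tie handling (a tie counts as half a win for each side) are all assumptions made for the example. Scores are assumed to be already oriented so that higher is better (descending metrics inverted).

```python
from itertools import combinations

# Hypothetical data: results[model][(metric, context_or_context_pair)] = score.
# All values are made up and assumed to be "higher is better".
results = {
    "model_a": {("stability", "chunk_0|chunk_1"): 0.8,
                ("stability", "chunk_0|chess"): 0.6,
                ("validity", "chess"): 0.7},
    "model_b": {("stability", "chunk_0|chunk_1"): 0.5,
                ("stability", "chunk_0|chess"): 0.7,
                ("validity", "chess"): 0.6},
    "model_c": {("stability", "chunk_0|chunk_1"): 0.3,
                ("stability", "chunk_0|chess"): 0.4,
                ("validity", "chess"): 0.9},
}


def cardinal_score(results):
    """Average score over all (metric, context) entries of each model."""
    return {m: sum(s.values()) / len(s) for m, s in results.items()}


def ordinal_win_rate(results):
    """Fraction of won games per model.

    A game compares two models on one (metric, context) entry.
    Assumption: a tie gives half a win to each model.
    """
    wins = {m: 0.0 for m in results}
    games = {m: 0 for m in results}
    for a, b in combinations(results, 2):
        # Only entries that both models have can be compared.
        for key in results[a].keys() & results[b].keys():
            sa, sb = results[a][key], results[b][key]
            games[a] += 1
            games[b] += 1
            if sa > sb:
                wins[a] += 1
            elif sb > sa:
                wins[b] += 1
            else:
                wins[a] += 0.5
                wins[b] += 0.5
    return {m: (wins[m] / games[m] if games[m] else 0.0) for m in results}


scores = cardinal_score(results)
win_rates = ordinal_win_rate(results)
```

With these made-up numbers both rankings agree (model_a, then model_b, then model_c), but they can diverge: a model with a few very high scores may lead on the cardinal score while winning fewer individual games.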
templates/model_detail.html CHANGED
@@ -267,7 +267,9 @@
267
  <h2>Visualizing the order of simulated personas</h2>
268
  <p>
269
  This image shows the order of personas in each context chunk for each value.
 
270
  For each value (row), the personas are ordered on the x-axis by their expression of this value in the `no_conv` setting (gray).
 
271
  Therefore, the Rank-Order stability between the `no_conv` chunk and some chunk corresponds to the extent to which the curve is increasing in that chunk.
272
  </p>
273
  <div class="image-container">
 
267
  <h2>Visualizing the order of simulated personas</h2>
268
  <p>
269
  This image shows the order of personas in each context chunk for each value.
270
+ A chunk refers to the set of texts (e.g. Reddit posts) that are used to start conversations with different characters.
271
  For each value (row), the personas are ordered on the x-axis by their expression of this value in the `no_conv` setting (gray).
272
+ In this setting, no conversation is simulated and values are scored with the PVQ-40 questionnaire.
273
  Therefore, the Rank-Order stability between the `no_conv` chunk and some chunk corresponds to the extent to which the curve is increasing in that chunk.
274
  </p>
275
  <div class="image-container">
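One way to put a number on "the extent to which the curve is increasing" is a rank correlation between the personas' scores in `no_conv` and in another chunk. The sketch below assumes Spearman's rank correlation for this purpose; the page itself does not state the formula, the scores are made up, and the simple formula used here assumes there are no tied scores.

```python
def ranks(values):
    """Return the 0-based rank of each value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for tie-free data with at least 2 points."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical expression of one value by five personas.
no_conv = [1.0, 2.5, 3.0, 4.2, 5.1]         # reference order (x-axis)
same_order = [0.5, 1.0, 2.0, 2.2, 9.0]      # curve increases everywhere
reversed_order = [5.0, 4.0, 3.0, 2.0, 1.0]  # curve decreases everywhere

stable = spearman(no_conv, same_order)
unstable = spearman(no_conv, reversed_order)
```

A value of 1 means the personas keep exactly the same order as in `no_conv` (a monotonically increasing curve), and a value of -1 means the order is fully reversed.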