|
title = "<h1 class='demo-title'>🐑 Plausibility Evaluation of Context Reliance (PECoRe) 🐑</h1>" |
|
|
|
subtitle = "<h2 class='demo-subtitle'>An Interpretability Framework to Detect and Attribute Context Reliance in Language Models</h2>" |
|
|
|
description = """ |
|
PECoRe is a framework for trustworthy language generation using only model internals to detect and attribute model |
|
generations to its available input context. Given a query-context input pair, PECoRe identifies which tokens in the generated |
|
response were more dependant on context (<span class="category-label" style="background-color:#5fb77d; color: black; font-weight: var(--weight-semibold)">Context sensitive </span>), and match them with context tokens contributing the most to their prediction (<span class="category-label" style="background-color:#80ace8; color: black; font-weight: var(--weight-semibold)">Influential context </span>). |
|
|
|
Check out <a href="https://openreview.net/forum?id=XTHfNGI3zT" target='_blank'>our ICLR 2024 paper</a> for more details. A new paper applying PECoRe to retrieval-augmented QA is forthcoming ✨ stay tuned! |
|
""" |
|
|
|
how_it_works_intro = """ |
|
The PECoRe (Plausibility Evaluation of Context Reliance) framework is designed to <b>detect and quantify context usage</b> throughout language model generations. Its final goal is to return <b>one or more pairs</b> representing tokens in the generated response that were influenced by the presence of context (<span class="category-label" style="background-color:#5fb77d; color: black; font-weight: var(--weight-semibold)">Context sensitive </span>), and their corresponding influential context tokens (<span class="category-label" style="background-color:#80ace8; color: black; font-weight: var(--weight-semibold)">Influential context </span>). |
|
|
|
The PECoRe procedure involves two contrastive comparison steps: |
|
""" |
|
|
|
cti_explanation = """ |
|
<h3>1. Context-sensitive Token Identification (CTI)</h3> |
|
<p>In this step, the goal is to identify which tokens in the generated text were influenced by the preceding context.</p> |
|
<p>First, a context-aware generation is produced using the model's inputs augmented with available context. Then, the same generation is force-decoded using the contextless inputs. During both processes, a <b>contrastive metric</b> (KL-divergence is used as default for the <code>Context sensitivity metric</code> parameter) are collected for every generated token. Intuitively, higher metric scores indicate that the current generation step was more influenced by the presence of context.</p> |
|
<p>The generated tokens are ranked according to their metric scores, and the most salient tokens are selected for the next step (This demo provides a <code>Context sensitivity threshold</code> parameter to select tokens above <code>N</code> standard deviations from the in-example metric average, and <code>Context sensitivity top-k</code> to pick the K most salient tokens.)</p> |
|
<p>In the example shown in the figure, <code>elle</code> is selected as the only context-sensitive token by the procedure.</p> |
|
""" |
|
|
|
cci_explanation = """ |
|
<h3>2. Contextual Cue Imputation (CCI)</h3> |
|
<p>Once context-sensitive tokens are identified, the next step is to link every one of these tokens to specific contextual cues that justified its prediction.</p> |
|
<p>This is achieved by means of <b>contrastive feature attribution</b> (<a href="https://aclanthology.org/2022.emnlp-main.14/" target="_blank">Yin and Neubig, 2022</a>). More specifically, for a given context-sensitive token, a contrastive alternative to it is generated in absence of input context, and a function of the probabilities of the pair is used to identify salient parts of the context (By default, in this demo we use <code>saliency</code>, i.e. raw gradients, for the <code>Attribution method</code> and <code>contrast_prob_diff</code>, i.e. the probability difference between the two options, for the <code>Attributed function</code>).</p> |
|
<p>Gradients are collected and aggregated to obtain a single score per context token, which is then used to rank the tokens and select the most influential ones (This demo provides a <code>Attribution threshold</code> parameter to select tokens above <code>N</code> standard deviations from the in-example metric average, and <code>Attribution top-k</code> to pick the K most salient tokens.)</p> |
|
<p>In the example shown in the figure, the attribution process links <code>elle</code> to <code>dishes</code> and <code>assiettes</code> in the source and target contexts, respectively. This makes sense intuitively, as <code>they</code> in the original input is gender-neutral in English, and the presence of its gendered coreferent disambiguates the choice for the French pronoun in the translation.</p> |
|
""" |
|
|
|
how_to_use = """ |
|
<h2>How to use this demo</h3> |
|
|
|
<p>This demo provides a convenient UI for the Inseq implementation of PECoRe (the <a href="https://inseq.org/en/latest/main_classes/cli.html#attribute-context"><code>inseq attribute-context</code></a> CLI command).</p> |
|
<p>In the demo tab, fill in the input and context fields with the text you want to analyze, and click the <code>Run PECoRe</code> button to produce an output where the tokens selected by PECoRe in the model generation and context are highlighted. For more details on the parameters and their meaning, check the <code>Parameters</code> tab.</p> |
|
|
|
<h2>Interpreting PECoRe results</h3> |
|
""" |
|
|
|
example_explanation = """ |
|
<p>Consider the following example, showing inputs and outputs of the <a href='https://huggingface.co/gsarti/cora_mgen' target='_blank'>CORA Multilingual QA</a> model provided as default in the interface, using default settings.</p> |
|
<img src="file/img/pecore_ui_output_example.png" width=100% /> |
|
<p>The PECoRe CTI step identified two context-sensitive tokens in the generation (<code>287</code> and <code>,</code>), while the CCI step associated each of those with the most influential tokens in the context. It can be observed that in both cases similar tokens from the passage stating the number of inhabitants are identified as salient (<code>235</code> and <code>,</code> for the generated <code>287</code>, while <code>had</code> is also found salient for the generated <code>,</code>).</p> |
|
<h2>Usage tips</h3> |
|
<ol> |
|
<li>The <code>📂 Download output</code> button allows you to download the full JSON output produced by the Inseq CLI. It includes, among other things, the full set of CTI and CCI scores produced by PECoRe, tokenized versions of the input context and generated output and the full arguments used for the CLI call.</li> |
|
<li>The <code>🔍 Download HTML</code> button allows you to download an HTML view of the output similar to the one visualized in the demo. |
|
<li>By default, all generated tokens <b>above the mean CTI score</b> for the generated text are highlighted as context-sensitive. This might be reasonable for short answers, but the threshold can be raised by increasing the <code>Context sensitivity threshold</code> parameter to ensure only very sensitive tokens are picked up in longer replies.</li> |
|
<li>Relatedly, all context tokens receiving <b>CCI scores >2 standard deviations</b> above the context mean are highlighted as influential. This might be reasonable for contexts with at least 50-100 tokens, but the threshold can be lowered by decreasing the <code>Attribution threshold</code> parameter to be more lenient in the selection for shorter contexts.</li> |
|
<li>When using a model, make sure that the <b>contextual and contextless templates are set to match the expected format</b>. You can use presets to auto-fill these for the provided models.</li> |
|
<li>If you are using an encoder-decoder expecting an output context (e.g. the multilingual MT preset), the <b>output context should be provided manually</b> before running PECoRe in the <code>Generation context</code> parameter. This is a requirement for the demo because the splitting between output context and current cannot be reliably performed in an automatic way. However, the <code>inseq attribute-context</code> CLI command actually support various strategies, including prompting users for a split and/or trying an automatic source-target alignment. </li> |
|
</ol> |
|
<h2>Using PECoRe from Python with Inseq</h3> |
|
<p>This demo is useful for testing out various models and methods for PECoRe attribution, but the <a href="https://inseq.org/en/latest/main_classes/cli.html#attribute-context"><code>inseq attribute-context</code></a> CLI command is the way to go if you want to run experiments on several examples, or if you want to exploit the full customizability of the Inseq API.</p> |
|
<p>The utility we provide in this section allows you to generate Python and Shell code calling the Inseq CLI with the parameters you set in the interface. <b>We recommend using the Python version for repeated evaluation, since it allows for model-preloading.</b></p> |
|
<p>Once you are satisfied with the parameters you set (including context/query strings in the <code>🐑 Demo</code> tab), just press the button and get your code snippets ready for usage! 🤗</p> |
|
""" |
|
|
|
citation = r""" |
|
<p>To refer to the PECoRe framework for context usage detection, cite:</p> |
|
<div class="code_wrap"><button class="copy_code_button" title="copy"> |
|
<span class="copy-text"><svg viewBox="0 0 32 32" height="100%" width="100%" xmlns="http://www.w3.org/2000/svg"><path d="M28 10v18H10V10h18m0-2H10a2 2 0 0 0-2 2v18a2 2 0 0 0 2 2h18a2 2 0 0 0 2-2V10a2 2 0 0 0-2-2Z" fill="currentColor"></path><path d="M4 18H2V4a2 2 0 0 1 2-2h14v2H4Z" fill="currentColor"></path></svg></span> |
|
<span class="check"><svg stroke-linejoin="round" stroke-linecap="round" stroke-width="3" stroke="currentColor" fill="none" viewBox="0 0 24 24" height="100%" width="100%" xmlns="http://www.w3.org/2000/svg"><polyline points="20 6 9 17 4 12"></polyline></svg></span> |
|
</button><pre><code> |
|
@inproceedings{sarti-etal-2023-quantifying, |
|
title = "Quantifying the Plausibility of Context Reliance in Neural Machine Translation", |
|
author = "Sarti, Gabriele and |
|
Chrupa{\l}a, Grzegorz and |
|
Nissim, Malvina and |
|
Bisazza, Arianna", |
|
booktitle = "The Twelfth International Conference on Learning Representations (ICLR 2024)", |
|
month = may, |
|
year = "2024", |
|
address = "Vienna, Austria", |
|
publisher = "OpenReview", |
|
url = "https://openreview.net/forum?id=XTHfNGI3zT" |
|
} |
|
</code></pre></div> |
|
|
|
If you use the Inseq implementation of PECoRe (<a href="https://inseq.org/en/latest/main_classes/cli.html#attribute-context"><code>inseq attribute-context</code></a>, including this demo), please also cite: |
|
<div class="code_wrap"><button class="copy_code_button" title="copy"> |
|
<span class="copy-text"><svg viewBox="0 0 32 32" height="100%" width="100%" xmlns="http://www.w3.org/2000/svg"><path d="M28 10v18H10V10h18m0-2H10a2 2 0 0 0-2 2v18a2 2 0 0 0 2 2h18a2 2 0 0 0 2-2V10a2 2 0 0 0-2-2Z" fill="currentColor"></path><path d="M4 18H2V4a2 2 0 0 1 2-2h14v2H4Z" fill="currentColor"></path></svg></span> |
|
<span class="check"><svg stroke-linejoin="round" stroke-linecap="round" stroke-width="3" stroke="currentColor" fill="none" viewBox="0 0 24 24" height="100%" width="100%" xmlns="http://www.w3.org/2000/svg"><polyline points="20 6 9 17 4 12"></polyline></svg></span> |
|
</button><pre><code> |
|
@inproceedings{sarti-etal-2023-inseq, |
|
title = "Inseq: An Interpretability Toolkit for Sequence Generation Models", |
|
author = "Sarti, Gabriele and |
|
Feldhus, Nils and |
|
Sickert, Ludwig and |
|
van der Wal, Oskar and |
|
Nissim, Malvina and |
|
Bisazza, Arianna", |
|
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)", |
|
month = jul, |
|
year = "2023", |
|
address = "Toronto, Canada", |
|
publisher = "Association for Computational Linguistics", |
|
url = "https://aclanthology.org/2023.acl-demo.40", |
|
pages = "421--435", |
|
} |
|
</code></pre></div> |
|
""" |
|
|
|
powered_by = """<div class="footer-custom-block"><b>Powered by</b> <a href='https://github.com/inseq-team/inseq' target='_blank'><img src="file/img/inseq_logo_white_contour.png" width=150px /></a></div>""" |
|
|
|
support = """<div class="footer-custom-block"><b>Built by <a href="https://gsarti.com" target="_blank">Gabriele Sarti</a> with the support of</b> <a href='https://www.rug.nl/research/clcg/research/cl/' target='_blank'><img src="file/img/rug_logo_white_contour.png" width=170px /></a><a href='https://projects.illc.uva.nl/indeep/' target='_blank'><img src="file/img/indeep_logo_white_contour.png" width=100px /></a><a href='https://www.esciencecenter.nl/' target='_blank'><img src="file/img/escience_logo_white_contour.png" width=120px /></a></div>""" |
|
|
|
examples = [ |
|
[ |
|
"How many inhabitants does Groningen have?", |
|
"Groningen is the capital city and main municipality of Groningen province in the Netherlands. The capital of the north, Groningen is the largest place as well as the economic and cultural centre of the northern part of the country as of December 2021, it had 235,287 inhabitants, making it the sixth largest city/municipality in the Netherlands and the second largest outside the Randstad. Groningen was established more than 950 years ago and gained city rights in 1245." |
|
], |
|
[ |
|
"When was Banff National Park established?", |
|
"Banff National Park is Canada's oldest national park, established in 1885 as Rocky Mountains Park. Located in Alberta's Rocky Mountains, 110-180 kilometres (68-112 mi) west of Calgary, Banff encompasses 6,641 square kilometres (2,564 sq mi) of mountainous terrain.", |
|
], |
|
[ |
|
"约翰·埃尔维目前在野马队中担任什么角色?", |
|
"培顿·曼宁成为史上首位带领两支不同球队多次进入超级碗的四分卫。他也以 39 岁高龄参加超级碗而成为史上年龄最大的四分卫。过去的记录是由约翰·埃尔维保持的,他在 38岁时带领野马队赢得第 33 届超级碗,目前担任丹佛的橄榄球运营执行副总裁兼总经理。", |
|
], |
|
[ |
|
"Qual'è il porto più settentrionale della Slovenia?", |
|
"Trieste si trova a nordest dell'Italia. La città dista solo alcuni chilometri dal confine con la Slovenia e si trova fra la penisola italiana e la penisola istriana. Il porto triestino è il più settentrionale tra quelli situati nel mare Adriatico. Questa particolare posizione ha da sempre permesso alle navi di approdare direttamente nell'Europa centrale. L'incredibile sviluppo che la città conobbe nell'800 grazie al suo porto franco, indusse a trasferirsi qui una moltitudine di lavoratori provenienti dall'Italia nonché tanti uomini d'affari da tutta Europa. Questa crescita così vorticosa, indotta dalla costituzione del porto franco, portò in poco più di un secolo la popolazione a crescere da poche migliaia fino a più di 200 000 persone, disseminando la città di chiese di tutte le maggiori religioni europee. La nuova città multietnica così formata ha nel tempo sviluppato un proprio linguaggio, infatti il Triestino moderno è un dialetto della lingua veneta. Nella provincia di Trieste vive la minoranza autoctona slovena, infatti nei paesi che circondano il capoluogo giuliano, i cartelli stradali e le insegne di molti negozi sono bilingui. La Provincia è la meno estesa d'Italia ed è quarta per densità abitativa, dopo Napoli, Milano e Monza." |
|
] |
|
] |
|
|