gsarti 
posted an update Feb 22
🔍 Today's pick in Interpretability & Analysis of LMs: Enhanced Hallucination Detection in Neural Machine Translation through Simple Detector Aggregation by A. Himmi, G. Staerman, M. Picot, @Colombo, @nunonmg

Previous work on hallucination detection for MT showed that different detectors excel at detecting different types of hallucinations.

In this context, detectors based solely on model internals, such as input contributions or sequence log-probabilities, fare well on fully-detached hallucinations but show limited performance on oscillatory hallucinations, where ad-hoc trained detectors remain the best-performing methods.

This work proposes Simple Detectors Aggregation (STARE), an aggregation procedure that leverages detectors’ complementary strengths. The authors experiment with two popular hallucination detection benchmarks (LFAN-HALL and HalOmi), showing that STARE outperforms single detectors and other aggregation baselines.

Results obtained by aggregating internal detectors highlight how model-based features that are readily available as generation byproducts can outperform computationally expensive ad-hoc solutions.
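The core idea of combining detectors with incomparable score scales can be sketched in a few lines. Below is a minimal, hypothetical illustration (not the paper's exact procedure): each detector's scores are min-max normalized so they contribute comparably, then summed into a single hallucination score per translation.

```python
import numpy as np

def aggregate_scores(detector_scores):
    """Combine hallucination scores from several detectors.

    detector_scores: array-like of shape (n_detectors, n_examples),
    where higher means more suspicious. Each detector's row is min-max
    normalized to [0, 1], then rows are summed. Illustrative sketch only;
    see the STARE paper for the actual aggregation procedure.
    """
    scores = np.asarray(detector_scores, dtype=float)
    mins = scores.min(axis=1, keepdims=True)
    maxs = scores.max(axis=1, keepdims=True)
    normalized = (scores - mins) / (maxs - mins + 1e-12)  # avoid div-by-zero
    return normalized.sum(axis=0)

# Hypothetical example: two internal detectors scoring three translations
log_prob_scores = [0.2, 0.9, 0.4]      # e.g. negated sequence log-probability
attention_scores = [10.0, 80.0, 60.0]  # e.g. a source-contribution statistic
combined = aggregate_scores([log_prob_scores, attention_scores])
print(int(combined.argmax()))  # → 1: translation flagged as most suspicious
```

Normalization is the key step: without it, a detector whose raw scores span a larger range (here, the attention-based one) would dominate the sum regardless of how informative it is.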

📄 Paper: Enhanced Hallucination Detection in Neural Machine Translation through Simple Detector Aggregation (2402.13331)

🔍 All daily picks in LM interpretability: gsarti/daily-picks-in-interpretability-and-analysis-of-lms-65ae3339949c5675d25de2f9