What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation Paper • 2404.07129 • Published 17 days ago • 2
LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models Paper • 2404.07004 • Published 17 days ago • 3
Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 8 items • Updated 10 days ago • 56
Zeroshot Classifiers Collection These are my current best zeroshot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer/better. • 11 items • Updated 24 days ago • 74
A little guide to building Large Language Models in 2024 Collection Resources mentioned by @thomwolf in https://x.com/Thom_Wolf/status/1773340316835131757 • 19 items • Updated 26 days ago • 13
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models Paper • 2403.19647 • Published 30 days ago • 3
Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms Paper • 2403.17806 • Published Mar 26 • 3
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers Paper • 2310.03686 • Published Oct 5, 2023 • 3
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models Paper • 2309.03883 • Published Sep 7, 2023 • 14
Information Flow Routes: Automatically Interpreting Language Models at Scale Paper • 2403.00824 • Published Feb 27 • 3
AtP*: An efficient and scalable method for localizing LLM behaviour to components Paper • 2403.00745 • Published Mar 1 • 8
LiT5 Collection Linguistically-Informed T5 models from the LREC-COLING paper "Linguistic Knowledge Can Enhance Encoder-Decoder Models (If You Let It)". • 6 items • Updated Feb 28 • 2
CausalGym: Benchmarking causal interpretability methods on linguistic tasks Paper • 2402.12560 • Published Feb 19 • 3
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking Paper • 2402.14811 • Published Feb 22 • 4
Enhanced Hallucination Detection in Neural Machine Translation through Simple Detector Aggregation Paper • 2402.13331 • Published Feb 20 • 2
Backward Lens: Projecting Language Model Gradients into the Vocabulary Space Paper • 2402.12865 • Published Feb 20 • 1
In-Context Learning Demonstration Selection via Influence Analysis Paper • 2402.11750 • Published Feb 19 • 2
⛔️🔦 Provenance, Watermarking & Deepfake Detection Collection Technical tools for more control over non-consensual synthetic content • 14 items • Updated 26 days ago • 34
Recovering the Pre-Fine-Tuning Weights of Generative Models Paper • 2402.10208 • Published Feb 15 • 6
SyntaxShap: Syntax-aware Explainability Method for Text Generation Paper • 2402.09259 • Published Feb 14 • 2
Show Me How It's Done: The Role of Explanations in Fine-Tuning Language Models Paper • 2402.07543 • Published Feb 12 • 2
LLM Hallucination Detection Papers Collection Collection of LLM hallucination and evaluation papers that I've been exploring and implementing. Some of them have my comments and annotated doodles. • 12 items • Updated Feb 20 • 12
AttnLRP: Attention-Aware Layer-wise Relevance Propagation for Transformers Paper • 2402.05602 • Published Feb 8 • 3
Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models Paper • 2402.04614 • Published Feb 7 • 3
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks Paper • 2402.04248 • Published Feb 6 • 24
INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection Paper • 2402.03744 • Published Feb 6 • 4
Best open source tools Collection There's an open source tool for that too • 1 item • Updated Feb 6 • 1
Rethinking Interpretability in the Era of Large Language Models Paper • 2402.01761 • Published Jan 30 • 18
A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains Paper • 2402.00559 • Published Feb 1 • 3
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods Paper • 2309.16042 • Published Sep 27, 2023 • 3
LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools Paper • 2401.12576 • Published Jan 23 • 2
The Calibration Gap between Model and Human Confidence in Large Language Models Paper • 2401.13835 • Published Jan 24 • 4
Model Editing Can Hurt General Abilities of Large Language Models Paper • 2401.04700 • Published Jan 9 • 3
From Understanding to Utilization: A Survey on Explainability for Large Language Models Paper • 2401.12874 • Published Jan 23 • 4
Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models Paper • 2401.06102 • Published Jan 11 • 18
Fine-grained Hallucination Detection and Editing for Language Models Paper • 2401.06855 • Published Jan 12 • 3
TIGERScore Collection List of model variants of TIGERScore checkpoints and the associated dataset • 8 items • Updated Jan 18 • 3
Quantifying the Plausibility of Context Reliance in Neural Machine Translation Paper • 2310.01188 • Published Oct 2, 2023 • 1
Reward models on the hub Collection UNMAINTAINED: See RewardBench... A place to collect reward models, an often not released artifact of RLHF. • 18 items • Updated 14 days ago • 23
Zephyr 7B Collection Models, datasets, and demos associated with Zephyr 7B. For code to train the models, see: https://github.com/huggingface/alignment-handbook • 9 items • Updated 15 days ago • 134
Custom Components ✨ Collection Awesome Gradio custom components to get you started building your own! • 7 items • Updated Nov 20, 2023 • 31
MADLAD-400 Collection Models and spaces for MADLAD-400: A Multilingual And Document-Level Large Audited Dataset • 8 items • Updated Nov 14, 2023 • 3
SEAHORSE release Collection The SEAHORSE metrics (as described in https://arxiv.org/abs/2305.13194). • 12 items • Updated 18 days ago • 16
LLM Leaderboard best models ❤️🔥 Collection A daily uploaded list of models with the best evaluations on the LLM leaderboard • 63 items • Updated 2 days ago • 283
🇮🇹 Italian NLP Resources Collection Collection of models, datasets and demos relevant to Italian NLP 🇮🇹 • 151 items • Updated 2 days ago • 15