Gabriele Sarti

gsarti

https://gsarti.com

AI & ML interests

Interpretability for generative language models

Recent Activity

liked a dataset about 19 hours ago

OptimalScale/ClimbLab

liked a dataset 1 day ago

Anthropic/values-in-the-wild

gsarti's activity

upvoted a collection 5 days ago

MIB Datasets

The tasks and counterfactuals from the Mechanistic Interpretability Benchmark. • 7 items • Updated 6 days ago • 1

upvoted a paper 5 days ago

NNsight and NDIF: Democratizing Access to Foundation Model Internals

Paper • 2407.14561 • Published Jul 18, 2024 • 36

upvoted a paper 10 days ago

Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages

Paper • 2501.06346 • Published Jan 10 • 1

upvoted a paper 13 days ago

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

Paper • 2504.07096 • Published 13 days ago • 73

upvoted an article 13 days ago

From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub

Article • Feb 12 • 64

upvoted 2 papers about 1 month ago

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models

Paper • 2502.12892 • Published Feb 18 • 1

EuroBERT: Scaling Multilingual Encoders for European Languages

Paper • 2503.05500 • Published Mar 7 • 77

upvoted 4 papers about 2 months ago

QE4PE: Word-level Quality Estimation for Human Post-Editing

Paper • 2503.03044 • Published Mar 4 • 6

Wikipedia in the Era of LLMs: Evolution and Risks

Paper • 2503.02879 • Published Mar 4 • 21

A Close Look at Decomposition-based XAI-Methods for Transformer Language Models

Paper • 2502.15886 • Published Feb 21 • 1

Position-aware Automatic Circuit Discovery

Paper • 2502.04577 • Published Feb 7 • 1

upvoted a paper 2 months ago

We Can't Understand AI Using our Existing Vocabulary

Paper • 2502.07586 • Published Feb 11 • 10

upvoted a paper 3 months ago

ReAct: Synergizing Reasoning and Acting in Language Models

Paper • 2210.03629 • Published Oct 6, 2022 • 25

upvoted a collection 3 months ago

Reasoning Datasets

Distilled synthetic Reasoning datasets • 7 items • Updated Feb 2 • 60

upvoted an article 3 months ago

Open-source DeepResearch – Freeing our search agents

Article • Feb 4 • 1.22k

upvoted 3 papers 3 months ago

Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution

Paper • 2501.18887 • Published Jan 31 • 1

Propositional Interpretability in Artificial Intelligence

Paper • 2501.15740 • Published Jan 27 • 2

Partially Rewriting a Transformer in Natural Language

Paper • 2501.18838 • Published Jan 31 • 2