arxiv:2404.05961

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

Published on Apr 9

· Submitted by

akhaliq on Apr 10

#2 Paper of the day

Authors:

Parishad BehnamGhader ,

Vaibhav Adlakha ,

Marius Mosbach ,

Nicolas Chapados ,

Siva Reddy

Abstract

Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks. Yet, the community is only slowly adopting these models for text embedding tasks, which require rich contextualized representations. In this work, we introduce LLM2Vec, a simple unsupervised approach that can transform any decoder-only LLM into a strong text encoder. LLM2Vec consists of three simple steps: 1) enabling bidirectional attention, 2) masked next token prediction, and 3) unsupervised contrastive learning. We demonstrate the effectiveness of LLM2Vec by applying it to 3 popular LLMs ranging from 1.3B to 7B parameters and evaluate the transformed models on English word- and sequence-level tasks. We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings Benchmark (MTEB). Moreover, when combining LLM2Vec with supervised contrastive learning, we achieve state-of-the-art performance on MTEB among models that train only on publicly available data. Our strong empirical results and extensive analysis demonstrate that LLMs can be effectively transformed into universal text encoders in a parameter-efficient manner without the need for expensive adaptation or synthetic GPT-4 generated data.
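Steps 2 and 3 of the recipe can be made concrete with a small sketch. The code below is illustrative only and is not taken from the LLM2Vec codebase. It assumes that in masked next token prediction the masked token at position i is predicted from the output at position i - 1, and that the contrastive step uses a SimCSE-style InfoNCE loss in which two encodings of the same sentence form a positive pair and the other sentences in the batch act as negatives. The names `mntp_targets` and `info_nce`, the mask id, and the temperature of 0.05 are all hypothetical choices made for this example.

```python
import math

def mntp_targets(token_ids, masked_positions, mask_id):
    """Build inputs and targets for masked next token prediction.

    Assumption: the token masked at position i is predicted from the
    model output at position i - 1, so each target is stored as a
    (output_position, target_token_id) pair.
    """
    inputs = list(token_ids)
    targets = []
    for i in masked_positions:
        if i == 0:
            continue  # position 0 has no previous position to predict from
        targets.append((i - 1, token_ids[i]))
        inputs[i] = mask_id
    return inputs, targets

def _normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def info_nce(view_a, view_b, temperature=0.05):
    """InfoNCE loss with in-batch negatives.

    view_a[i] and view_b[i] are two embeddings of the same sentence
    (for example, two forward passes with different dropout masks).
    Every other row of view_b serves as a negative for view_a[i].
    """
    a = [_normalize(v) for v in view_a]
    b = [_normalize(v) for v in view_b]
    losses = []
    for i, anchor in enumerate(a):
        # Cosine similarity to every candidate, scaled by the temperature.
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature
                  for cand in b]
        top = max(logits)  # subtract the max for numerical stability
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Usage: mask position 2 of a toy sequence, then compare aligned and
# misaligned embedding pairs.
inputs, targets = mntp_targets([10, 11, 12, 13], [2], mask_id=0)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shifted = identity[1:] + identity[:1]
aligned_loss = info_nce(identity, identity)
misaligned_loss = info_nce(identity, shifted)
```

The loss is near zero when each embedding matches its own pair and large when the pairs are misaligned, which is the behavior the contrastive step relies on.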

Community

xhluca

Apr 10

Tweets: https://twitter.com/vaibhav_adlakha/status/1777854148584591441

edmond

Apr 11

edited Apr 11

How do we transform a causal LLM like Phi-2 or Mistral into a bidirectional-attention LLM in Hugging Face?
This idea seems to be getting more and more popular.

vaibhavad

Paper author May 31

Thanks for your interest in our work. It depends on the model, as each one implements the causal mask differently. For the models that we released, we also added custom files to the Hugging Face repos that transform the causal model into a bidirectional one.
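For readers unfamiliar with the distinction, the difference between a causal and a bidirectional mask can be sketched in a few lines of plain Python. This is a generic illustration of single-head attention weights, not the custom files mentioned above, and the helper names `causal_mask`, `bidirectional_mask`, and `attention_weights` are hypothetical.

```python
import math

def causal_mask(n):
    # Position i may attend only to positions j <= i.
    return [[j <= i for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    # Every position may attend to every other position.
    return [[True] * n for _ in range(n)]

def attention_weights(queries, keys, mask):
    """Softmax attention weights for one head under a boolean mask."""
    dim = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            if mask[i][j] else -math.inf  # masked positions get zero weight
            for j, k in enumerate(keys)
        ]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Usage: the same toy token vectors under both masks.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
causal = attention_weights(x, x, causal_mask(3))
full = attention_weights(x, x, bidirectional_mask(3))
```

Under the causal mask the first token attends only to itself, while under the bidirectional mask it also receives weight from later tokens. In practice the mask is built inside each model's attention code, which is why the change is model-specific.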

lengyue233

Apr 15

The model is fine-tuned on English Wikipedia. Does it also perform well on other languages, given that many LLMs are pretrained on multilingual data?

vaibhavad

Paper author May 31

Thanks for your interest in our work. We have not yet tested it on other languages, but we plan to do so in the future.

blanchon

Jun 9

Unleashing Hidden Power: How LLM2Vec Transforms Language Models into Text Encoders

Links 🔗:

👉 Subscribe: https://www.youtube.com/@Arxflix
👉 Twitter: https://x.com/arxflix
👉 LMNT (Partner): https://lmnt.com/

By Arxflix

Kobee

Sep 2

Hey, I have recently been working with your LLM2Vec approach and have tested the models on some German datasets for a clustering task. I would like to share my findings and contributions. How can I do this?

vaibhavad

Paper author Sep 3

Hi Kobee, thanks for your interest in our work! I am very excited to hear about your findings. We can correspond over email (vaibhav.adlakha@mila.quebec) or Twitter (https://x.com/vaibhav_adlakha)