---
language:
- en
base_model:
- google/gemma-2-9b-it
---

# Gemma Embeddings v0.8

GemmaEmbed is a dense-vector embedding model trained specifically for retrieval. As of December 2, 2024, GemmaEmbed ranks #1 overall on the _MTEB Retrieval_ leaderboard with a score of 63.90.
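To illustrate what dense-vector retrieval means in practice, here is a minimal sketch that ranks documents by cosine similarity to a query vector. It does not load GemmaEmbed: the toy three-dimensional vectors, the helper names `cosine_similarity` and `rank_documents`, and the choice of cosine similarity as the scoring function are all assumptions made for illustration, not details taken from this model card.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy vectors standing in for real embedding outputs.
query = [1.0, 0.0, 1.0]
docs = [
    [0.9, 0.1, 0.8],    # points in nearly the same direction as the query
    [0.0, 1.0, 0.0],    # orthogonal to the query
    [-1.0, 0.0, -1.0],  # points in the opposite direction
]

ranking = rank_documents(query, docs)
print(ranking)
```

In a real pipeline the vectors would come from the embedding model, and the ranking step would typically run over a precomputed index rather than a Python list.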

# Important Notes
* This is not an official Google product.
* This is a research project.

# Results summary

Scores compared with BGE-EN-ICL on several large retrieval datasets:

| Model | DBPedia | FEVER | HotPotQA | MSMARCO | NQ |
| ----- | ------- | ----- | -------- | ------- | -- |
| BGE-EN-ICL | 51.63 | 92.83 | 85.14 | 46.79 | 73.88 |
| Gemma-Embeddings-v0.8 | 52.60 | 93.51 | 87.58 | 47.30 | 74.44 |
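
The per-dataset gap between the two models can be read off the table with a little arithmetic. The short sketch below copies the scores from the table and computes the difference for each dataset and the mean difference; the variable names are illustrative only.

```python
# Scores copied from the results table.
bge_en_icl = {
    "DBPedia": 51.63, "FEVER": 92.83, "HotPotQA": 85.14,
    "MSMARCO": 46.79, "NQ": 73.88,
}
gemma_embeddings = {
    "DBPedia": 52.60, "FEVER": 93.51, "HotPotQA": 87.58,
    "MSMARCO": 47.30, "NQ": 74.44,
}

# Difference per dataset (Gemma-Embeddings-v0.8 minus BGE-EN-ICL), rounded to 2 places.
deltas = {
    name: round(gemma_embeddings[name] - bge_en_icl[name], 2)
    for name in bge_en_icl
}
mean_delta = round(sum(deltas.values()) / len(deltas), 2)

for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
print(f"mean: {mean_delta:+.2f}")
```

On these five datasets the largest gap is on HotPotQA (+2.44) and the smallest on MSMARCO (+0.51).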

# Model & Data

Our base encoder model is [Gemma2 9B](https://huggingface.co/google/gemma-2-9b). 

We use the [BGE-EN-ICL training data](https://huggingface.co/datasets/cfli/bge-full-data).

# Research Team

* Nicholas Monath
* Michael Boratko
* Seungyeon Kim
* Andrew McCallum
* Rob Fergus
* Manzil Zaheer