Commit d6813d2 (1 parent: 174c426), committed by manzilzaheer

Create README.md

README.md ADDED
@@ -0,0 +1,40 @@
+---
+language:
+- en
+base_model:
+- google/gemma-2-9b-it
+---
+
+# Gofer Embeddings v0.8
+
+Gofer is a dense-vector embedding model trained specifically for retrieval. As of December 2, 2024, Gofer ranks #1 overall on the _MTEB Retrieval_ leaderboard, with a score of 63.01.
+
+# Important Notes
+* This is not an official Google product.
+* This is a research project.
+
+# Results Summary
+
+Results compared to BGE-EN-ICL on several large datasets:
+
+| Model | DBPedia | FEVER | HotPotQA | MSMARCO | NQ |
+| ----- | ------- | ----- | -------- | ------- | -- |
+| BGE-EN-ICL | 51.63 | 92.83 | 85.14 | 46.79 | 73.88 |
+| Gofer Embeddings v0.8 | 52.58 | 93.225 | 86.921 | 47.537 | 73.75 |
+
+# Model & Data
+
+Our base encoder model is [Gemma2 9B](https://huggingface.co/google/gemma-2-9b).
+
+We use the [BGE-EN-ICL training data](https://huggingface.co/datasets/cfli/bge-full-data).
+
+# Research Team
+
+* Nicholas Monath
+* Michael Boratko
+* Seungyeon Kim
+* Andrew McCallum
+* Rob Fergus
+* Manzil Zaheer
+
+
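As a reading aid for the results table in the README added by this commit, the per-dataset gap between the two models can be worked out directly. The following is a minimal sketch, not part of the commit: the scores are copied from the table, while the names `score_deltas`, `deltas`, and `mean_delta` are hypothetical helpers introduced here for illustration. The README does not state which metric the scores use, so the code treats them as plain numbers, and the five-dataset mean it prints is not the MTEB Retrieval leaderboard score.

```python
# Scores copied from the results table in the README above.
DATASETS = ["DBPedia", "FEVER", "HotPotQA", "MSMARCO", "NQ"]
BGE_EN_ICL = [51.63, 92.83, 85.14, 46.79, 73.88]
GOFER_V0_8 = [52.58, 93.225, 86.921, 47.537, 73.75]


def score_deltas(baseline, candidate, names):
    """Return {dataset: candidate - baseline}, rounded to 3 decimals.

    Hypothetical helper for illustration; not part of the Gofer release.
    """
    return {n: round(c - b, 3) for n, b, c in zip(names, baseline, candidate)}


deltas = score_deltas(BGE_EN_ICL, GOFER_V0_8, DATASETS)

# Mean gap over these five datasets only (not the leaderboard average).
mean_delta = round(sum(deltas.values()) / len(deltas), 3)

if __name__ == "__main__":
    for name, delta in deltas.items():
        print(f"{name:10s} {delta:+.3f}")
    print(f"{'mean':10s} {mean_delta:+.3f}")
```

On these numbers, Gofer Embeddings v0.8 scores higher than BGE-EN-ICL on DBPedia, FEVER, HotPotQA, and MSMARCO, and slightly lower on NQ (73.75 versus 73.88).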