spacemanidol
committed on
Update README.md
README.md
CHANGED
@@ -2801,7 +2801,7 @@ model-index:
2801   value: 85.30624598674467
2802   license: apache-2.0
2803   ---
2804 - <h1 align="center">Snowflake's
2805   <h4 align="center">
2806   <p>
2807   <a href=#news>News</a> |
@@ -2825,7 +2825,7 @@ license: apache-2.0
2825   ## Models
2826
2827
2828 - Arctic-Embed is a suite of text embedding models that focuses on creating high-quality
2829
2830
2831   The `arctic-text-embedding` models achieve **state-of-the-art performance on the MTEB/BEIR leaderboard** for each of their size variants. Evaluation is performed using these [scripts](https://github.com/Snowflake-Labs/arctic-embed/tree/main/src). As shown below, each class of model size achieves SOTA retrieval accuracy when compared to other top models.
@@ -2944,8 +2944,8 @@ To use an arctic-embed model, you can use the transformers package, as shown bel
2944   import torch
2945   from transformers import AutoModel, AutoTokenizer
2946
2947 - tokenizer = AutoTokenizer.from_pretrained('Snowflake/
2948 - model = AutoModel.from_pretrained('Snowflake/
2949   model.eval()
2950
2951   query_prefix = 'Represent this sentence for searching relevant passages: '
@@ -2981,7 +2981,7 @@ If you use the long context model and have more than 2048 tokens, ensure that yo
2981
2982
2983   ``` py
2984 - model = AutoModel.from_pretrained('Snowflake/
2985   ```
2986
2987

2801   value: 85.30624598674467
2802   license: apache-2.0
2803   ---
2804 + <h1 align="center">Snowflake's Arctic-embed-m</h1>
2805   <h4 align="center">
2806   <p>
2807   <a href=#news>News</a> |

2825   ## Models
2826
2827
2828 + Arctic-Embed is a suite of text embedding models that focuses on creating high-quality retrieval models optimized for performance.
2829
2830
2831   The `arctic-text-embedding` models achieve **state-of-the-art performance on the MTEB/BEIR leaderboard** for each of their size variants. Evaluation is performed using these [scripts](https://github.com/Snowflake-Labs/arctic-embed/tree/main/src). As shown below, each class of model size achieves SOTA retrieval accuracy when compared to other top models.

2944   import torch
2945   from transformers import AutoModel, AutoTokenizer
2946
2947 + tokenizer = AutoTokenizer.from_pretrained('Snowflake/arctic-embed-m')
2948 + model = AutoModel.from_pretrained('Snowflake/arctic-embed-m', add_pooling_layer=False)
2949   model.eval()
2950
2951   query_prefix = 'Represent this sentence for searching relevant passages: '

2981
2982
2983   ``` py
2984 + model = AutoModel.from_pretrained('Snowflake/arctic-embed-m-long', trust_remote_code=True, rotary_scaling_factor=2)
2985   ```
2986
2987
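
For reviewers who want to see where the changed `from_pretrained` lines sit in a retrieval flow, here is a minimal sketch, not the README's own code. The model ID, `add_pooling_layer=False`, and the query prefix string come from the diff. Everything else is an assumption for illustration: the helper names (`with_query_prefix`, `l2_normalize`, `rank_documents`, `embed`), using the first-token (CLS) hidden state as the sentence vector, leaving documents unprefixed, and `max_length=512`.

```python
import math

# Prefix string copied from the diff's context line. Applying it to queries
# only (documents unprefixed) is an assumption; the excerpt shows no document side.
QUERY_PREFIX = 'Represent this sentence for searching relevant passages: '


def with_query_prefix(queries):
    """Prepend the retrieval prefix to every query string."""
    return [QUERY_PREFIX + q for q in queries]


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def rank_documents(query_vec, doc_vecs):
    """Return (indices sorted by descending cosine similarity, per-document scores)."""
    q = l2_normalize(query_vec)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(d))) for d in doc_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


def embed(texts, model_name='Snowflake/arctic-embed-m'):
    """Hypothetical helper: embed texts with the model loaded as in the diff."""
    # Imported lazily so the pure-Python helpers above run without torch installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, add_pooling_layer=False)
    model.eval()

    # max_length=512 is an assumed limit for the non-long variant.
    tokens = tokenizer(texts, padding=True, truncation=True,
                       return_tensors='pt', max_length=512)
    with torch.no_grad():
        # Assumption: the first-token (CLS) hidden state serves as the sentence vector.
        cls = model(**tokens)[0][:, 0]
    return cls.tolist()
```

With the model downloaded, a call along the lines of `rank_documents(embed(with_query_prefix([query]))[0], embed(documents))` would return the document order for one query. The ranking helpers are kept in plain Python so they can be checked without fetching weights.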