spacemanidol committed
Commit 6ae1a8a
1 Parent(s): 4bee6ff
Update README.md
README.md
CHANGED
@@ -2801,7 +2801,6 @@ model-index:
|
|
2801 |
- type: v_measure
|
2802 |
value: 81.46426354153643
|
2803 |
---
|
2804 |
-
---
|
2805 |
<h1 align="center">Snowflake's Arctic-embed-m</h1>
|
2806 |
<h4 align="center">
|
2807 |
<p>
|
@@ -2837,10 +2836,10 @@ The models are trained by leveraging existing open-source text representation mo
|
|
2837 |
|
2838 |
| Name | MTEB Retrieval Score (NDCG @ 10) | Parameters (Millions) | Embedding Dimension |
|
2839 |
| ----------------------------------------------------------------------- | -------------------------------- | --------------------- | ------------------- |
|
2840 |
-
| [arctic-embed-
|
2841 |
| [arctic-embed-s](https://huggingface.co/Snowflake/arctic-embed-s/) | 51.98 | 33 | 384 |
|
2842 |
-
| [arctic-embed-
|
2843 |
-
| [arctic-embed-
|
2844 |
| [arctic-embed-l](https://huggingface.co/Snowflake/arctic-embed-l/) | 55.98 | 335 | 1024 |
|
2845 |
|
2846 |
|
@@ -2849,32 +2848,32 @@ Aside from being great open-source models, the largest model, [arctic-embed-l](h
|
|
2849 |
|
2850 |
| Model Name | MTEB Retrieval Score (NDCG @ 10) |
|
2851 |
| ------------------------------------------------------------------ | -------------------------------- |
|
2852 |
-
| [arctic-embed-
|
2853 |
| Google-gecko-text-embedding | 55.7 |
|
2854 |
| text-embedding-3-large | 55.44 |
|
2855 |
| Cohere-embed-english-v3.0 | 55.00 |
|
2856 |
| bge-large-en-v1.5 | 54.29 |
|
2857 |
|
2858 |
|
2859 |
-
### [
|
2860 |
|
2861 |
|
2862 |
-
This tiny model packs quite the punch
|
2863 |
|
2864 |
|
2865 |
| Model Name | MTEB Retrieval Score (NDCG @ 10) |
|
2866 |
| ------------------------------------------------------------------- | -------------------------------- |
|
2867 |
-
| [arctic-embed-
|
2868 |
| GIST-all-MiniLM-L6-v2 | 45.12 |
|
2869 |
| gte-tiny | 44.92 |
|
2870 |
| all-MiniLM-L6-v2 | 41.95 |
|
2871 |
| bge-micro-v2 | 42.56 |
|
2872 |
|
2873 |
|
2874 |
-
### Arctic-embed-s
|
2875 |
|
2876 |
|
2877 |
-
Based on the [all-MiniLM-L12-v2](https://huggingface.co/
|
2878 |
|
2879 |
|
2880 |
| Model Name | MTEB Retrieval Score (NDCG @ 10) |
|
@@ -2886,34 +2885,33 @@ Based on the [all-MiniLM-L12-v2](https://huggingface.co/intfloat/e5-base-unsuper
|
|
2886 |
| e5-small-v2 | 49.04 |
|
2887 |
|
2888 |
|
2889 |
-
### [
|
2890 |
|
2891 |
|
2892 |
-
Based on the [
|
2893 |
|
2894 |
|
2895 |
| Model Name | MTEB Retrieval Score (NDCG @ 10) |
|
2896 |
| ------------------------------------------------------------------ | -------------------------------- |
|
2897 |
-
| [arctic-embed-
|
2898 |
| bge-base-en-v1.5 | 53.25 |
|
2899 |
-
| nomic-embed-text-v1.5 | 53.
|
2900 |
| GIST-Embedding-v0 | 52.31 |
|
2901 |
| gte-base | 52.31 |
|
2902 |
|
2903 |
-
|
2904 |
-
### Arctic-embed-m
|
2905 |
|
2906 |
|
2907 |
-
Based on the [
|
2908 |
|
2909 |
|
2910 |
| Model Name | MTEB Retrieval Score (NDCG @ 10) |
|
2911 |
| ------------------------------------------------------------------ | -------------------------------- |
|
2912 |
-
| [arctic-embed-
|
2913 |
-
|
|
2914 |
-
| nomic-embed-text-v1
|
2915 |
-
|
2916 |
-
|
2917 |
|
2918 |
|
2919 |
### [arctic-embed-l](https://huggingface.co/Snowflake/arctic-embed-l/)
|
@@ -2924,7 +2922,7 @@ Based on the [intfloat/e5-large-unsupervised](https://huggingface.co/intfloat/e5
|
|
2924 |
|
2925 |
| Model Name | MTEB Retrieval Score (NDCG @ 10) |
|
2926 |
| ------------------------------------------------------------------ | -------------------------------- |
|
2927 |
-
| [arctic-embed-
|
2928 |
| UAE-Large-V1 | 54.66 |
|
2929 |
| bge-large-en-v1.5 | 54.29 |
|
2930 |
| mxbai-embed-large-v1 | 54.39 |
|
|
|
2801 |
- type: v_measure
|
2802 |
value: 81.46426354153643
|
2803 |
---
|
|
|
2804 |
<h1 align="center">Snowflake's Arctic-embed-m</h1>
|
2805 |
<h4 align="center">
|
2806 |
<p>
|
|
|
2836 |
|
2837 |
| Name | MTEB Retrieval Score (NDCG @ 10) | Parameters (Millions) | Embedding Dimension |
|
2838 |
| ----------------------------------------------------------------------- | -------------------------------- | --------------------- | ------------------- |
|
2839 |
+
| [arctic-embed-xs](https://huggingface.co/Snowflake/arctic-embed-xs/) | 50.15 | 22 | 384 |
|
2840 |
| [arctic-embed-s](https://huggingface.co/Snowflake/arctic-embed-s/) | 51.98 | 33 | 384 |
|
2841 |
+
| [arctic-embed-m](https://huggingface.co/Snowflake/arctic-embed-m/) | 54.90 | 110 | 768 |
|
2842 |
+
| [arctic-embed-m-long](https://huggingface.co/Snowflake/arctic-embed-m-long/) | 54.83 | 137 | 768 |
|
2843 |
| [arctic-embed-l](https://huggingface.co/Snowflake/arctic-embed-l/) | 55.98 | 335 | 1024 |
|
2844 |
|
2845 |
|
|
|
2848 |
|
2849 |
| Model Name | MTEB Retrieval Score (NDCG @ 10) |
|
2850 |
| ------------------------------------------------------------------ | -------------------------------- |
|
2851 |
+
| [arctic-embed-l](https://huggingface.co/Snowflake/arctic-embed-l/) | 55.98 |
|
2852 |
| Google-gecko-text-embedding | 55.7 |
|
2853 |
| text-embedding-3-large | 55.44 |
|
2854 |
| Cohere-embed-english-v3.0 | 55.00 |
|
2855 |
| bge-large-en-v1.5 | 54.29 |
|
2856 |
|
2857 |
|
2858 |
+
### [Arctic-embed-xs](https://huggingface.co/Snowflake/arctic-embed-xs)
|
2859 |
|
2860 |
|
2861 |
+
This tiny model packs quite the punch. Based on the [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) model with only 22m parameters and 384 dimensions, this model should meet even the strictest latency/TCO budgets. Despite its size, its retrieval accuracy is closer to that of models with 100m parameters.
|
2862 |
|
2863 |
|
2864 |
| Model Name | MTEB Retrieval Score (NDCG @ 10) |
|
2865 |
| ------------------------------------------------------------------- | -------------------------------- |
|
2866 |
+
| [arctic-embed-xs](https://huggingface.co/Snowflake/arctic-embed-xs/) | 50.15 |
|
2867 |
| GIST-all-MiniLM-L6-v2 | 45.12 |
|
2868 |
| gte-tiny | 44.92 |
|
2869 |
| all-MiniLM-L6-v2 | 41.95 |
|
2870 |
| bge-micro-v2 | 42.56 |
|
2871 |
|
2872 |
|
2873 |
+
### [Arctic-embed-s](https://huggingface.co/Snowflake/arctic-embed-s)
|
2874 |
|
2875 |
|
2876 |
+
Based on the [all-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2) model, this small model does not trade off retrieval accuracy for its small size. With only 33m parameters and 384 dimensions, this model should easily allow scaling to large datasets.
|
2877 |
|
2878 |
|
2879 |
| Model Name | MTEB Retrieval Score (NDCG @ 10) |
|
|
|
2885 |
| e5-small-v2 | 49.04 |
|
2886 |
|
2887 |
|
2888 |
+
### [Arctic-embed-m](https://huggingface.co/Snowflake/arctic-embed-m/)
|
2889 |
|
2890 |
|
2891 |
+
Based on the [intfloat/e5-base-unsupervised](https://huggingface.co/intfloat/e5-base-unsupervised) model, this medium model is the workhorse that provides the best retrieval performance without slowing down inference.
|
2892 |
|
2893 |
|
2894 |
| Model Name | MTEB Retrieval Score (NDCG @ 10) |
|
2895 |
| ------------------------------------------------------------------ | -------------------------------- |
|
2896 |
+
| [arctic-embed-m](https://huggingface.co/Snowflake/arctic-embed-m/) | 54.90 |
|
2897 |
| bge-base-en-v1.5 | 53.25 |
|
2898 |
+
| nomic-embed-text-v1.5 | 53.25 |
|
2899 |
| GIST-Embedding-v0 | 52.31 |
|
2900 |
| gte-base | 52.31 |
|
2901 |
|
2902 |
+
### [arctic-embed-m-long](https://huggingface.co/Snowflake/arctic-embed-m-long/)
|
|
|
2903 |
|
2904 |
|
2905 |
+
Based on the [nomic-embed-text-v1](https://huggingface.co/nomic-ai/nomic-embed-text-v1) model, this long-context variant of our medium-sized model is perfect for workloads that can be constrained by the regular 512 token context of our other models. Without the use of RPE, this model supports up to 2048 tokens. With RPE, it can scale to 8192!
|
2906 |
|
2907 |
|
2908 |
| Model Name | MTEB Retrieval Score (NDCG @ 10) |
|
2909 |
| ------------------------------------------------------------------ | -------------------------------- |
|
2910 |
+
| [arctic-embed-m-long](https://huggingface.co/Snowflake/arctic-embed-m-long/) | 54.83 |
|
2911 |
+
| nomic-embed-text-v1.5 | 53.01 |
|
2912 |
+
| nomic-embed-text-v1 | 52.81 |
|
2913 |
+
|
2914 |
+
|
2915 |
|
2916 |
|
2917 |
### [arctic-embed-l](https://huggingface.co/Snowflake/arctic-embed-l/)
|
|
|
2922 |
|
2923 |
| Model Name | MTEB Retrieval Score (NDCG @ 10) |
|
2924 |
| ------------------------------------------------------------------ | -------------------------------- |
|
2925 |
+
| [arctic-embed-l](https://huggingface.co/Snowflake/arctic-embed-l/) | 55.98 |
|
2926 |
| UAE-Large-V1 | 54.66 |
|
2927 |
| bge-large-en-v1.5 | 54.29 |
|
2928 |
| mxbai-embed-large-v1 | 54.39 |
|
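
Every table in this diff reports "MTEB Retrieval Score (NDCG @ 10)". As a reading aid, here is a minimal sketch of how NDCG@10 can be computed for a single query. It assumes the common linear-gain formulation with a log2 rank discount. The function names `dcg_at_k` and `ndcg_at_k` are illustrative and not taken from the README or from MTEB, whose evaluation harness may differ in details. The reported scores are presumably such per-query values averaged over queries and datasets and scaled to 0-100.

```python
import math
from typing import Optional, Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k results (linear gain, assumed)."""
    # rank is 0-based, so the discount for the first result is log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(
    ranked_relevances: Sequence[float],
    all_relevances: Optional[Sequence[float]] = None,
    k: int = 10,
) -> float:
    """NDCG@k for one query.

    ranked_relevances: relevance labels of the retrieved documents, in ranked order.
    all_relevances: labels of every judged document for the query; if omitted,
    the ideal ranking is built from the retrieved documents only (an assumption
    of this sketch).
    """
    pool = all_relevances if all_relevances is not None else ranked_relevances
    ideal_dcg = dcg_at_k(sorted(pool, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents: define the score as 0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


if __name__ == "__main__":
    # Relevant documents retrieved at ranks 1 and 3, an irrelevant one at rank 2.
    print(round(ndcg_at_k([1, 0, 1]), 4))
```

A perfect ranking scores 1.0, and placing relevant documents lower in the top 10 reduces the score logarithmically with rank.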