---
license: mit
language:
- en
- fr
- de
- es
- ru
base_model:
- OrdalieTech/Solon-embeddings-large-0.1
---
## News
11/12/2024: Release of Algolia/Algolia-large-multilang-generic-v2410, Algolia's multilingual embedding model.

## Models
Algolia-large-multilang-generic-v2410 is the first addition to Algolia's suite of multilingual embedding models built for retrieval performance and efficiency in e-commerce search.
Algolia v2410 models are state-of-the-art for their size and use cases and are now available under an MIT license.

Note that generic models are trained on public and synthetic e-commerce datasets only.

### Quality Benchmarks
|Model|MTEB EN rank|Public e-comm rank| Algolia private e-comm rank|
|------------|------------|------------|------------|
|Algolia-large-multilang-generic-v2410|21|12|5|

Note that our benchmarks cover the retrieval task only and include open-source models of approximately 500M parameters or fewer, as well as commercially available embedding models.

## Usage

### Using Sentence Transformers
```python
# Import dependencies and load the model
import pandas as pd
from scipy.spatial.distance import cosine
from sentence_transformers import SentenceTransformer

modelname = "algolia/algolia-large-multilang-generic-v2410"
model = SentenceTransformer(modelname)

# Define embedding and compute_similarity
def get_embedding(text):
    # encode() takes a list of texts; return the single resulting vector
    embedding = model.encode([text])
    return embedding[0]

def compute_similarity(query, documents):
    query_emb = get_embedding(query)
    doc_embeddings = [get_embedding(doc) for doc in documents]
    # scipy's cosine() returns a distance, so 1 - distance is the cosine similarity
    similarities = [1 - cosine(query_emb, doc_emb) for doc_emb in doc_embeddings]
    ranked_docs = sorted(zip(documents, similarities), key=lambda x: x[1], reverse=True)
    # Format output
    return [{"document": doc, "similarity_score": round(sim, 4)} for doc, sim in ranked_docs]

# Define inputs (the query carries a "query: " prefix; documents are passed as-is)
query = "query: " + "running shoes"
documents = ["adidas sneakers, great for outdoor running",
             "nike soccer boots indoor, it can be used on turf",
             "new balance light weight, good for jogging",
             "hiking boots, good for bushwalking"
            ]

# Output the results, ranked from most to least similar
result_df = pd.DataFrame(compute_similarity(query, documents))
print(query)
print(result_df)
```
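
The ranking step in the snippet above relies on cosine similarity, which `1 - cosine(...)` recovers from SciPy's cosine distance. To make that step concrete without downloading the model, here is a minimal dependency-free sketch. The helper names `cosine_similarity` and `rank_documents` and the toy 3-dimensional vectors are made up for illustration; they are not part of the model or of any library, and real embeddings from the model have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumes neither vector is all zeros; a zero vector would divide by zero.
    return dot / (norm_a * norm_b)

def rank_documents(query_emb, doc_embs, documents):
    """Return (document, score) pairs sorted from most to least similar."""
    scored = [(doc, cosine_similarity(query_emb, emb))
              for doc, emb in zip(documents, doc_embs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for real embeddings (hypothetical values).
toy_query = [1.0, 0.0, 0.0]
toy_docs = ["doc_a", "doc_b", "doc_c"]
toy_embs = [[0.0, 1.0, 0.0],   # orthogonal to the query
            [1.0, 0.0, 0.0],   # same direction as the query
            [1.0, 1.0, 0.0]]   # 45 degrees from the query

ranking = rank_documents(toy_query, toy_embs, toy_docs)
for doc, score in ranking:
    print(doc, round(score, 4))
```

In this toy case the document pointing in the same direction as the query ranks first and the orthogonal one ranks last, which is the same ordering logic `compute_similarity` applies to the model's embeddings.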

## Contact
Feel free to open an issue or pull request if you have any questions or suggestions about this project.
You can also email Rasit Abay (rasit.abay@algolia.com).

## License
Algolia-large-multilang-generic-v2410 is licensed under the [MIT License](https://mit-license.org/). The released models can be used for commercial purposes free of charge.