---
license: mit
language:
- en
- fr
- de
- es
- ru
base_model:
- OrdalieTech/Solon-embeddings-large-0.1
---

## News

11/12/2024: Release of Algolia/Algolia-large-multilang-generic-v2410, Algolia's multilingual embedding model.

## Models

Algolia-large-multilang-generic-v2410 is the first addition to Algolia's suite of multilingual embedding models built for retrieval performance and efficiency in e-commerce search.
Algolia v2410 models are state-of-the-art for their size and use cases and are now available under an MIT license.

Note that generic models are trained on public and synthetic e-commerce datasets only.

### Quality Benchmarks

|Model|MTEB EN rank|Public e-comm rank|Algolia private e-comm rank|
|------------|------------|------------|------------|
|Algolia-large-multilang-generic-v2410|21|12|5|

Note that our benchmarks cover the retrieval task only. The comparison set includes open-source models of approximately 500M parameters or fewer, as well as commercially available embedding models.

## Usage

### Using Sentence Transformers

```python
import pandas as pd
from scipy.spatial.distance import cosine
from sentence_transformers import SentenceTransformer

# Load the model
modelname = "algolia/algolia-large-multilang-generic-v2410"
model = SentenceTransformer(modelname)

# Define embedding and compute_similarity
def get_embedding(text):
    # encode() takes a list of strings; return the single resulting vector
    embedding = model.encode([text])
    return embedding[0]

def compute_similarity(query, documents):
    query_emb = get_embedding(query)
    doc_embeddings = [get_embedding(doc) for doc in documents]

    # Calculate cosine similarity (scipy's cosine() is a distance, so subtract from 1)
    similarities = [1 - cosine(query_emb, doc_emb) for doc_emb in doc_embeddings]

    # Rank documents from most to least similar
    ranked_docs = sorted(zip(documents, similarities), key=lambda x: x[1], reverse=True)

    # Format output
    return [{"document": doc, "similarity_score": round(float(sim), 4)} for doc, sim in ranked_docs]

# Define inputs (the query carries a "query: " prefix; documents are passed as-is)
query = "query: " + "running shoes"
documents = [
    "adidas sneakers, great for outdoor running",
    "nike soccer boots indoor, it can be used on turf",
    "new balance light weight, good for jogging",
    "hiking boots, good for bushwalking",
]

# Output the results
result_df = pd.DataFrame(compute_similarity(query, documents))
print(query)
print(result_df.head())
```

## Contact

Feel free to open an issue or pull request if you have any questions or suggestions about this project.
You can also email Rasit Abay (rasit.abay@algolia.com).

## License

Algolia-large-multilang-generic-v2410 is licensed under the [MIT License](https://mit-license.org/). The released models can be used for commercial purposes free of charge.
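
## Appendix: Cosine Similarity Ranking

For readers who want to see what the `similarity_score` in the Usage section represents, the following is a minimal sketch of cosine-similarity ranking in plain Python, with no model download required. The toy 3-dimensional vectors and the helper names `cosine_similarity` and `rank_documents` are illustrative assumptions and are not part of the released model or its API; real embeddings would come from `model.encode`.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_emb, doc_embs, documents):
    """Sort documents by descending cosine similarity to the query."""
    scored = [
        (doc, cosine_similarity(query_emb, emb))
        for doc, emb in zip(documents, doc_embs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for real embeddings (hypothetical values).
query_emb = [1.0, 0.0, 0.0]
doc_embs = [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [2.0, 0.0, 0.0]]
documents = ["doc A", "doc B", "doc C"]

ranking = rank_documents(query_emb, doc_embs, documents)
for doc, score in ranking:
    print(doc, round(score, 4))
```

Cosine similarity ignores vector length, so "doc C" ranks first even though its vector is twice as long as the query's. The expression `1 - scipy.spatial.distance.cosine(a, b)` computes the same quantity, since SciPy's `cosine` returns a distance rather than a similarity.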