Add Semble reference, MTEB link, and Semble in additional resources
README.md
@@ -24,7 +24,7 @@ datasets:
 
 ## Overview
 
-**potion-code-16M** is a fast static code embedding model optimized for code retrieval tasks. It is distilled from [nomic-ai/CodeRankEmbed](https://huggingface.co/nomic-ai/CodeRankEmbed) and trained on the [CornStack](https://huggingface.co/datasets/nomic-ai/cornstack-python-v1) code corpus using [Tokenlearn](https://github.com/MinishLab/tokenlearn) and contrastive fine-tuning.
+**potion-code-16M** is a fast static code embedding model optimized for code retrieval tasks. It powers [Semble](https://github.com/MinishLab/semble), a code search library for agents. It is distilled from [nomic-ai/CodeRankEmbed](https://huggingface.co/nomic-ai/CodeRankEmbed) and trained on the [CornStack](https://huggingface.co/datasets/nomic-ai/cornstack-python-v1) code corpus using [Tokenlearn](https://github.com/MinishLab/tokenlearn) and contrastive fine-tuning.
 
 It uses static embeddings, allowing text and code embeddings to be computed orders of magnitude faster than transformer-based models on both GPU and CPU.
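The overview paragraph in this hunk describes encoding text and code with a static model. A minimal usage sketch, assuming the model is published under the `minishlab/potion-code-16M` Hub id (`StaticModel` is the Model2Vec API linked in the card):

```python
import numpy as np
from model2vec import StaticModel

# Load the static embedding model from the Hugging Face Hub.
# The Hub id is an assumption based on the model name in the card.
model = StaticModel.from_pretrained("minishlab/potion-code-16M")

# Encode a natural-language query and a code snippet. A static model
# embeds via token-embedding lookup plus pooling (no transformer forward
# pass), which is why encoding is fast even on CPU.
query = "how do I read a file line by line in python"
snippet = "def read_lines(path):\n    with open(path) as f:\n        return f.readlines()"
query_emb, snippet_emb = model.encode([query, snippet])

# Cosine similarity for retrieval-style scoring.
score = np.dot(query_emb, snippet_emb) / (
    np.linalg.norm(query_emb) * np.linalg.norm(snippet_emb)
)
print(f"similarity: {score:.3f}")
```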
@@ -60,7 +60,7 @@ potion-code-16M is created using the following pipeline:
 
 ## Results
 
-Results on the [CoIR benchmark](https://github.com/CoIR-team/coir) (NDCG@10, `mteb>=2.10`):
+Results on the [CoIR benchmark](https://github.com/CoIR-team/coir), run via [MTEB](https://github.com/embeddings-benchmark/mteb) (NDCG@10, `mteb>=2.10`):
 
 | Model | Params | AVG | AppsRetrieval | COIRCodeSearchNet | CodeFeedbackMT | CodeFeedbackST | CodeSearchNetCC | CodeTransContest | CodeTransDL | CosQA | StackOverflow | Text2SQL |
 |---|---|---|---|---|---|---|---|---|---|---|---|---|
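To make the pinned `mteb>=2.10` requirement concrete, here is a hedged sketch of running the benchmark. The `get_benchmark("CoIR")` and `MTEB(...).run(...)` calls follow the pre-2.x mteb interface and the Hub id is assumed, so the exact entry points in `mteb>=2.10` may differ:

```python
import mteb
from model2vec import StaticModel

# Hub id assumed from the model name in the card.
model = StaticModel.from_pretrained("minishlab/potion-code-16M")

# Assumptions: "CoIR" is a registered mteb benchmark name, and mteb
# accepts StaticModel.encode() directly; depending on the mteb version,
# a thin encoder wrapper may be needed if run() passes extra kwargs.
benchmark = mteb.get_benchmark("CoIR")
evaluation = mteb.MTEB(tasks=benchmark)
results = evaluation.run(model)  # per-task scores include NDCG@10
```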
@@ -86,6 +86,7 @@ CoIR covers a broad range of code retrieval scenarios. For the use case of findi
 
 ## Additional Resources
 
+- [Semble repository](https://github.com/MinishLab/semble)
 - [Model2Vec repository](https://github.com/MinishLab/model2vec)
 - [Tokenlearn repository](https://github.com/MinishLab/tokenlearn)
 - [CornStack dataset](https://huggingface.co/datasets/nomic-ai/cornstack-python-v1)
|