Commit 8684819 (unverified) by Pringled
Parent: 75dfec7

Add Semble reference, MTEB link, and Semble in additional resources

Files changed (1): README.md (+3 −2)
README.md CHANGED

@@ -24,7 +24,7 @@ datasets:
 
 ## Overview
 
-**potion-code-16M** is a fast static code embedding model optimized for code retrieval tasks. It is distilled from [nomic-ai/CodeRankEmbed](https://huggingface.co/nomic-ai/CodeRankEmbed) and trained on the [CornStack](https://huggingface.co/datasets/nomic-ai/cornstack-python-v1) code corpus using [Tokenlearn](https://github.com/MinishLab/tokenlearn) and contrastive fine-tuning.
+**potion-code-16M** is a fast static code embedding model optimized for code retrieval tasks. It powers [Semble](https://github.com/MinishLab/semble), a code search library for agents. It is distilled from [nomic-ai/CodeRankEmbed](https://huggingface.co/nomic-ai/CodeRankEmbed) and trained on the [CornStack](https://huggingface.co/datasets/nomic-ai/cornstack-python-v1) code corpus using [Tokenlearn](https://github.com/MinishLab/tokenlearn) and contrastive fine-tuning.
 
 It uses static embeddings, allowing text and code embeddings to be computed orders of magnitude faster than transformer-based models on both GPU and CPU.
 
@@ -60,7 +60,7 @@ potion-code-16M is created using the following pipeline:
 
 ## Results
 
-Results on the [CoIR benchmark](https://github.com/CoIR-team/coir) (NDCG@10, `mteb>=2.10`):
+Results on the [CoIR benchmark](https://github.com/CoIR-team/coir) on [MTEB](https://github.com/embeddings-benchmark/mteb) (NDCG@10, `mteb>=2.10`):
 
 | Model | Params | AVG | AppsRetrieval | COIRCodeSearchNet | CodeFeedbackMT | CodeFeedbackST | CodeSearchNetCC | CodeTransContest | CodeTransDL | CosQA | StackOverflow | Text2SQL |
 |---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -86,6 +86,7 @@ CoIR covers a broad range of code retrieval scenarios. For the use case of findi
 
 ## Additional Resources
 
+- [Semble repository](https://github.com/MinishLab/semble)
 - [Model2Vec repository](https://github.com/MinishLab/model2vec)
 - [Tokenlearn repository](https://github.com/MinishLab/tokenlearn)
 - [CornStack dataset](https://huggingface.co/datasets/nomic-ai/cornstack-python-v1)