mpnet-code-search

This is a finetuned sentence-transformers model. It was trained on natural language-programming language (NL-PL) pairs to improve performance on code search and retrieval tasks.

Usage (Sentence-Transformers)

This model can be loaded with sentence-transformers. Install the library first:

pip install -U sentence-transformers

Then you can use the model like this:

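from sentence_transformers import SentenceTransformer

# One natural-language description and one code snippet to embed
sentences = ["Print hello world to stdout", "print('hello world')"]

# Downloads the model from the Hugging Face Hub on first use
model = SentenceTransformer('sweepai/mpnet-code-search')
# Returns one embedding vector per input sentence
embeddings = model.encode(sentences)
print(embeddings)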

Evaluation Results

Mean reciprocal rank (MRR) on the CoSQA and AdvTest datasets:

  • Base model
  • Finetuned model
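For reference, MRR averages the reciprocal of the rank at which the correct result appears for each query. The following is a minimal sketch of that calculation; the function name and the toy rankings are illustrative assumptions, not the evaluation code or data used for CoSQA or AdvTest.

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """Compute MRR.

    ranked_results: one ranked list of candidate ids per query.
    relevant: the single correct candidate id for each query.
    """
    total = 0.0
    for candidates, target in zip(ranked_results, relevant):
        # Reciprocal rank is 1 / position of the correct item,
        # and contributes 0 if the item was not retrieved at all.
        for position, candidate in enumerate(candidates, start=1):
            if candidate == target:
                total += 1.0 / position
                break
    return total / len(relevant)


# Toy data: the correct snippet "a" is ranked 1st, 2nd, and 3rd.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
targets = ["a", "a", "a"]
mrr = mean_reciprocal_rank(rankings, targets)  # (1 + 1/2 + 1/3) / 3
```

A higher MRR means the correct snippet tends to appear closer to the top of the results.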

Background

This project aims to improve the performance of the fine-tuned SBERT MPNet model for coding applications.

We developed this model to use in our own app, Sweep, an AI-powered junior developer.

Intended Uses

Our model is intended for code search applications, where users enter natural-language queries and retrieve the corresponding code chunks.
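A rough sketch of how such a search could work: embed the query and each code chunk, then rank chunks by cosine similarity to the query. The helper names (`cosine_similarity`, `rank_snippets`, `search`) are hypothetical and not part of this model or of sentence-transformers; only `SentenceTransformer` and `encode` come from the library.

```python
import math


def cosine_similarity(a, b):
    # Assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_snippets(query_embedding, snippet_embeddings):
    """Return snippet indices ordered from most to least similar."""
    scores = [cosine_similarity(query_embedding, e) for e in snippet_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def search(query, snippets, model_name="sweepai/mpnet-code-search"):
    """Embed the query and snippets, then return snippets best match first."""
    # Imported lazily so the ranking helpers above work without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    query_embedding = model.encode([query])[0]
    snippet_embeddings = model.encode(snippets)
    order = rank_snippets(query_embedding, snippet_embeddings)
    return [snippets[i] for i in order]
```

For example, `search("Print hello world to stdout", ["print('hello world')", "import os"])` returns the two snippets ordered by similarity to the query. For large repositories, a vector index would likely replace the linear scan shown here.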

Chunking (Open-Source)

We developed our own chunking algorithm to improve the quality of the code snippets extracted from a repository. This tree-based algorithm is described in our blog post.
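To illustrate the general idea of syntax-aware chunking, here is a simplified sketch. It is not the algorithm from the blog post: it assumes Python source, uses the standard-library `ast` module as a stand-in parser, and only cuts at top-level statement boundaries. The function name `chunk_source` and the `max_chars` parameter are hypothetical.

```python
import ast


def chunk_source(source, max_chars=200):
    """Greedily pack consecutive top-level statements into chunks.

    A chunk is closed before it would exceed max_chars, so functions and
    classes are never split mid-body. A single statement longer than
    max_chars is kept whole in its own chunk.
    """
    lines = source.splitlines(keepends=True)
    # Cut points fall only at the end of a top-level statement.
    ends = [node.end_lineno for node in ast.parse(source).body]
    # The final segment also absorbs trailing comments and blank lines.
    ends = (ends[:-1] + [len(lines)]) if lines else []

    chunks, current, start = [], "", 0
    for end in ends:
        # Each segment includes any comments or decorators above the statement.
        segment = "".join(lines[start:end])
        start = end
        if current and len(current) + len(segment) > max_chars:
            chunks.append(current)
            current = ""
        current += segment
    if current:
        chunks.append(current)
    return chunks


sample = "import os\n\n# helper\ndef f():\n    return 1\n\n\nclass A:\n    x = 2\n"
chunks = chunk_source(sample, max_chars=30)
```

Concatenating the chunks reproduces the original source, and each chunk remains parseable on its own. A fuller tree-based approach would also descend into oversized nodes instead of keeping them whole.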

Demo

We created an interactive demo for our new chunking algorithm.


Training Procedure

Base Model

We use the pretrained sentence-transformers/all-mpnet-base-v2 model. Please refer to its model card for a more detailed overview of the training data.

Finetuning

We finetune the model using a contrastive objective.
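The exact loss is not specified here. One common contrastive setup, shown below as an assumption rather than the training code actually used, treats each query's paired code snippet as the positive and every other snippet in the batch as a negative (similar in spirit to `MultipleNegativesRankingLoss` in sentence-transformers). The function name and the `scale` value are illustrative.

```python
import math


def in_batch_contrastive_loss(query_vecs, code_vecs, scale=20.0):
    """Mean cross-entropy over scaled cosine similarities.

    For query i, code_vecs[i] is the positive; all other code vectors
    in the batch act as negatives.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm

    losses = []
    for i, query in enumerate(query_vecs):
        logits = [scale * cosine(query, code) for code in code_vecs]
        # Numerically stable log-sum-exp over all candidates in the batch.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
        # Negative log-probability assigned to the true pair.
        losses.append(log_norm - logits[i])
    return sum(losses) / len(losses)


queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_contrastive_loss(queries, [[1.0, 0.0], [0.0, 1.0]])
swapped = in_batch_contrastive_loss(queries, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is near zero when each query is closest to its own snippet (`aligned`) and large when the pairs are mismatched (`swapped`), which is the signal that pulls matching text and code together during training.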

Hyperparameters

We trained on 8x A5000 GPUs.

Training Data

| Dataset | Number of training tuples |
|---|---|
| CoSQA | 20,000 |
| AdvTest | 250,000 |
| Total | 270,000 |
