Model Card for se-bert

Provides a generic software engineering language model (LM) from the paper "Enhancing Automated Software Traceability by Transfer Learning from Open-World Data".

Model Details

This language model was trained on the Git Corpus and Git Links data from 2016 to 2021. The data contains four types of records: comments, issues, pull requests, and commits.

Uses

This model is intended to provide starting weights for fine-tuning on software engineering tasks, including:

  • requirements classification
  • traceability link prediction
  • retrieval / search
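The card does not describe how the model is applied to these tasks. As one hedged illustration for traceability link prediction and retrieval, the sketch below ranks candidate artifacts (for example, commits) against a query artifact (for example, an issue) by cosine similarity of their embedding vectors. It assumes each artifact has already been encoded into a fixed-length vector by the model (for instance, by pooling its final hidden states); that encoding step, the artifact names, and the toy vectors are hypothetical, and the method used in the cited paper may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(
    query_vec: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Score every candidate against the query and return (name, score) pairs, best first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


if __name__ == "__main__":
    # Hypothetical 3-d vectors standing in for real model embeddings.
    issue = [1.0, 0.0, 1.0]
    commits = {
        "commit_a": [1.0, 0.1, 0.9],
        "commit_b": [0.0, 1.0, 0.0],
        "commit_c": [0.5, 0.5, 0.5],
    }
    for name, score in rank_candidates(issue, commits):
        print(f"{name}: {score:.3f}")
```

Ranking by similarity keeps the sketch independent of any particular fine-tuning setup; a fine-tuned pairwise classifier would replace `cosine` with a learned score but keep the same ranking step.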

Training, Evaluation, and Results

Please see the cited paper for complete details on the training method.

Technical Specifications

Model Architecture and Objective

Masked language model (MLM) trained on the software engineering corpus described above.
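The card states only that the objective is masked language modeling. The sketch below shows the standard BERT masking recipe as an assumption: about 15% of non-special tokens become prediction targets, and of those, 80% are replaced by the mask token, 10% by a random token, and 10% are left unchanged. The exact masking rate and procedure used for se-bert are in the cited paper and may differ; the token ids, `mask_id`, and `vocab_size` in the usage example are hypothetical. In practice, a library helper such as Hugging Face's `DataCollatorForLanguageModeling` performs this step.

```python
import random
from typing import FrozenSet, List, Optional, Tuple

IGNORE_INDEX = -100  # label value conventionally skipped by a cross-entropy loss


def mask_tokens(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    special_ids: FrozenSet[int] = frozenset(),
    mlm_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """BERT-style masking (assumed recipe, not confirmed for se-bert).

    Returns (inputs, labels): labels hold the original token at each chosen
    position and IGNORE_INDEX everywhere else.
    """
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never mask special tokens; otherwise select with probability mlm_prob.
        if tok in special_ids or rng.random() >= mlm_prob:
            continue
        labels[i] = tok  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id  # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


if __name__ == "__main__":
    ids = [101, 2023, 3231, 2003, 1037, 5604, 102]  # hypothetical token ids
    inputs, labels = mask_tokens(
        ids, mask_id=103, vocab_size=30522,
        special_ids=frozenset({101, 102}), rng=random.Random(0),
    )
    print(inputs)
    print(labels)
```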

Hardware

1 GPU with CUDA 10.2 or 11.1

Software

Python >= 3.7, PyTorch 1.1.0

Citation

BibTeX:

@misc{lin2022enhancing,
  title={Enhancing Automated Software Traceability by Transfer Learning from Open-World Data},
  author={Jinfeng Lin and Amrit Poudel and Wenhao Yu and Qingkai Zeng and Meng Jiang and Jane Cleland-Huang},
  year={2022},
  eprint={2207.01084},
  archivePrefix={arXiv},
  primaryClass={cs.SE}
}

Model Card Authors

Jinfeng Lin, Amrit Poudel, Wenhao Yu, Qingkai Zeng, Jane Cleland-Huang

Model Card Contact

Alberto Rodriguez (arodri39@nd.edu)

Model size: 110M parameters (Safetensors, tensor type F32)