arXiv:2408.16100

LLMSecCode: Evaluating Large Language Models for Secure Coding

Published on Aug 28, 2024
Abstract

The rapid deployment of Large Language Models (LLMs) requires careful consideration of their effect on cybersecurity. Our work aims to improve the selection process of LLMs that are suitable for facilitating Secure Coding (SC). This raises challenging research questions, such as (RQ1) Which functionality can streamline the LLM evaluation? (RQ2) What should the evaluation measure? (RQ3) How can we attest that the evaluation process is impartial? To address these questions, we introduce LLMSecCode, an open-source evaluation framework designed to assess LLM SC capabilities objectively. We validate the LLMSecCode implementation through experiments: when varying parameters and prompts, we find a 10% and 9% difference in performance, respectively. We also compare some of our results against those reported by reliable external sources, observing a 5% difference. We strive to ensure the ease of use of our open-source framework and encourage further development by external actors. With LLMSecCode, we hope to encourage the standardization and benchmarking of LLMs' capabilities in security-oriented code and tasks.
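The abstract mentions that evaluation results shift when parameters and prompts are varied (by 10% and 9%, respectively). The sketch below only illustrates what that kind of sweep can look like in Python; it is not LLMSecCode's actual API. The names sweep_secure_coding_eval, generate, is_secure_and_correct, and the toy checker in the demo are assumptions introduced here for illustration.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SweepResult:
    prompt_style: str
    temperature: float
    pass_rate: float


def sweep_secure_coding_eval(
    generate: Callable[[str, float], str],         # hypothetical model wrapper: (prompt, temperature) -> generated code
    tasks: List[str],                              # secure-coding task descriptions
    is_secure_and_correct: Callable[[str], bool],  # hypothetical checker for the generated code
    prompt_styles: Dict[str, str],                 # named prompt templates, each containing a "{task}" placeholder
    temperatures: List[float],                     # sampling parameters to vary
) -> List[SweepResult]:
    """Run every (prompt style, temperature) combination over all tasks and report pass rates."""
    results = []
    for (style_name, template), temp in itertools.product(prompt_styles.items(), temperatures):
        passed = sum(
            1
            for task in tasks
            if is_secure_and_correct(generate(template.format(task=task), temp))
        )
        results.append(SweepResult(style_name, temp, passed / len(tasks)))
    return results


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end; a real harness would call an LLM
    # and a security check (e.g., unit tests plus static analysis of the output).
    demo_tasks = ["Write a function that hashes a password before storing it."]
    demo_styles = {
        "plain": "{task}",
        "security_primed": "You are a security-conscious developer. {task}",
    }
    fake_generate = lambda prompt, temp: "def store(pw): return hash(pw)"
    fake_checker = lambda code: "hash" in code
    for r in sweep_secure_coding_eval(fake_generate, demo_tasks, fake_checker, demo_styles, [0.2, 0.8]):
        print(r)
```

Comparing pass rates across the grid of configurations is one way to quantify the prompt and parameter sensitivity the abstract reports, and to check results against external benchmarks under matched settings.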
