# Model Card for LargeCodeModelGPTBigCode
## Model Overview

LargeCodeModelGPTBigCode is a model for code test generation and analysis. It is based on GPTBigCode and is tailored specifically to generating tests for code. The model was trained on a small, manually labeled dataset of code and can be used for various code-analysis and testing tasks.
Features:
- Code test generation.
- Python code analysis and generation.
- Uses a pre-trained GPT2 model integrated with Hugging Face.
## How it Works
The model is loaded from an external repository, such as Hugging Face, and is initialized via the `LargeCodeModelGPTBigCode` class. Several parameters can be specified at initialization to configure the model (an illustrative, fully parameterized call follows the minimal initialization below):
- `gpt2_name`: The model identifier (link) on Hugging Face
- `prompt_string`: An additional prompt wrapper that helps the model understand the task
- `params_inference`: Inference parameters (used in `self.gpt2.generate(**inputs, **inference_params)`)
- `max_length`: The maximum number of tokens in the sequence
- `device`: The device to run the model on
- `saved_model_path`: Path to the fine-tuned model
- `num_lines`: The number of generated lines to keep (a guard against "non-terminating" generation)
- `flag_hugging_face`: Flag to enable loading from Hugging Face (default: `False`)
- `flag_pretrained`: Flag to initialize the model with pre-trained weights
For proper usage, download `inference_gptbigcode.py`, or run `git clone https://huggingface.co/4ervonec19/SimpleTestGenerator` instead. This file can also be used to tune the inference parameters.
## Model Initialization
```python
from inference_gptbigcode import LargeCodeModelGPTBigCode

gpt2bigcode = "4ervonec19/SimpleTestGenerator"
CodeModel = LargeCodeModelGPTBigCode(gpt2_name=gpt2bigcode,
                                     flag_pretrained=True,
                                     flag_hugging_face=True)
```
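The remaining constructor parameters can be supplied explicitly as well. The sketch below is illustrative only: the specific values (`device`, `max_length`, `num_lines`, and the `params_inference` keys) are assumptions, and the authoritative defaults live in `inference_gptbigcode.py`.

```python
from inference_gptbigcode import LargeCodeModelGPTBigCode

# Illustrative values only; the real defaults are defined in inference_gptbigcode.py.
CodeModel = LargeCodeModelGPTBigCode(
    gpt2_name="4ervonec19/SimpleTestGenerator",
    flag_pretrained=True,
    flag_hugging_face=True,
    device="cuda",              # or "cpu" if no GPU is available
    max_length=512,             # assumed token budget
    num_lines=10,               # assumed cap on kept generated lines
    params_inference={          # forwarded to self.gpt2.generate(**inputs, **inference_params)
        "do_sample": False,     # standard Hugging Face generate() kwargs
        "num_beams": 1,
    },
)
```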
## Inference Example

Here is an example of inference in which the model generates tests for a given code snippet:
```python
code_example = '''def equals_zero(a):
    if a == 0:
        return True
    return False'''

tests_generated = CodeModel.input_inference(code_text=code_example)

# Result
print(tests_generated['generated_output'])
```
Output:

The result is a dict containing the input function and the generated tests, for example:
```python
{'input_function': ('def equals_zero(a):\n    if a == 0:\n        return True\n    return False',),
 'generated_output': 'def test_equals_zero():\n    assert equals_zero(0) is True\n    assert equals_zero(1) is False\n    assert equals_zero(0) is True\n    assert equals_zero(1.5) is False'}
```
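Because the generated output is a pytest-style test function, one way to exercise it is to write the original function and the generated tests into a module and run pytest on it. This is a usage sketch, not part of the model's API; it assumes pytest is installed and that `code_example` and `tests_generated` from the example above are in scope.

```python
import subprocess

# Combine the source function and the generated tests into one test module.
with open("test_generated.py", "w") as f:
    f.write(code_example + "\n\n" + tests_generated['generated_output'] + "\n")

# Run pytest on the file; generated tests are not guaranteed to pass and should be reviewed.
subprocess.run(["python", "-m", "pytest", "test_generated.py", "-q"])
```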
## Model Details

- Architecture: GPT2
- Pretraining: Yes; the model builds on a pre-trained GPT2 checkpoint for test and code generation.
- Framework: PyTorch / Hugging Face
- License: MIT (or another, depending on the underlying model's license)
## Limitations
- The model may not always generate correct or optimal tests, especially for complex or non-standard code fragments.
- Some understanding of code structure may be required for optimal results.
- The quality of generated tests depends on the quality of the input code and its context.