Model Card for LargeCodeModelGPTBigCode

Model Overview

LargeCodeModelGPTBigCode is a model for generating and analyzing code tests. It is based on GPTBigCode and is specifically tailored to generating tests for code. The model was trained on a small, manually labeled dataset of code and can be used for a variety of code analysis and testing tasks.

Features:

  • Code test generation.
  • Python code analysis and generation.
  • Builds on a pre-trained GPT-2-style model integrated with the Hugging Face ecosystem.

How it Works

The model is loaded from an external repository, such as Hugging Face, and is initialized using the class LargeCodeModelGPTBigCode. Several parameters can be specified during initialization to configure the model, such as:

  • gpt2_name: The Hugging Face model identifier (or URL) to load the model from
  • prompt_string: A prompt template wrapped around the input to help the model understand the task
  • params_inference: Inference parameters, forwarded as keyword arguments in self.gpt2.generate(**inputs, **inference_params)
  • max_length: The maximum number of tokens in the sequence
  • device: The device to run the model on
  • saved_model_path: Path to the fine-tuned model
  • num_lines: Number of output lines to keep (generation may not terminate on its own, so the output is truncated)
  • flag_hugging_face: Flag to enable usage with Hugging Face (default: False)
  • flag_pretrained: Flag to initialize the model with pre-trained weights

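To illustrate how params_inference is used, the sketch below builds a keyword dictionary and shows how it merges with the tokenized inputs when forwarded to generate. The specific sampling values are illustrative assumptions, not the model's tuned defaults, and fake_generate is a stand-in for the real self.gpt2.generate call:

```python
# Hypothetical inference configuration; the values below are illustrative
# assumptions, not tuned defaults shipped with the model.
params_inference = {
    "do_sample": True,    # sample instead of greedy decoding
    "temperature": 0.7,   # softens the next-token distribution
    "top_p": 0.95,        # nucleus sampling threshold
}

# Stand-in for self.gpt2.generate(**inputs, **params_inference),
# showing how the two keyword dicts merge into one call.
def fake_generate(**kwargs):
    return kwargs

inputs = {"input_ids": [0, 1, 2]}
merged = fake_generate(**inputs, **params_inference)
print(sorted(merged))  # input_ids plus the three sampling knobs
```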
For proper model usage, download inference_gptbigcode.py, or run git clone https://huggingface.co/4ervonec19/SimpleTestGenerator to fetch the full repository. This file can also be edited to tune the inference parameters.

Model Initialization

from inference_gptbigcode import LargeCodeModelGPTBigCode

gpt2bigcode = "4ervonec19/SimpleTestGenerator"

CodeModel = LargeCodeModelGPTBigCode(gpt2_name=gpt2bigcode, 
                                    flag_pretrained=True, 
                                    flag_hugging_face=True)

Inference Example

Here’s an example of inference where the model is used to generate tests based on a given code snippet:

code_example = '''def equals_zero(a):
    if a == 0:
      return True
    return False'''

tests_generated = CodeModel.input_inference(code_text=code_example)

# Result
print(tests_generated['generated_output'])

Output:

The result is a dict containing the input function and the generated tests, for example:

{'input_function': ('def equals_zero(a):\n    if a == 0:\n      return True\n    return False',),
 'generated_output': 'def test_equals_zero():\n    assert equals_zero(0) is True\n    assert equals_zero(1) is False\n    assert equals_zero(0) is True\n    assert equals_zero(1.5) is False'}
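Because the generated tests are plain Python source, one lightweight way to sanity-check them is to exec both the input function and the generated test function in a shared namespace and call the test. This is a sketch of one possible workflow, not part of the model's API:

```python
# Hedged sketch: execute the input function and the generated tests together.
code_example = '''def equals_zero(a):
    if a == 0:
        return True
    return False'''

generated_tests = '''def test_equals_zero():
    assert equals_zero(0) is True
    assert equals_zero(1) is False
    assert equals_zero(1.5) is False'''

namespace = {}
exec(code_example, namespace)      # defines equals_zero
exec(generated_tests, namespace)   # defines test_equals_zero
namespace["test_equals_zero"]()    # raises AssertionError if a test fails
print("generated tests passed")
```

In practice the generated output may contain syntax errors or non-terminating repetition, so wrapping the exec calls in a try/except (or running the tests under pytest in a sandbox) is advisable.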

Model Details

  • Architecture: GPT2
  • Pretraining: Yes; the model starts from pre-trained GPT-2 weights and is fine-tuned for test and code generation.
  • Framework: PyTorch/HuggingFace
  • License: MIT (or another, depending on the model's license)

Limitations

  • The model may not always generate correct or optimal tests, especially for complex or non-standard code fragments.
  • Some understanding of code structure may be required for optimal results.
  • The quality of generated tests depends on the quality of the input code and its context.