tags:
  - autotrain
  - summarization
language:
  - en
widget:
  - text: |
      def preprocess(text: str) -> str:
          text = str(text)
          text = text.replace('\\n', ' ')
          tokenized_text = text.split(' ')
          preprocessed_text = " ".join([token for token in tokenized_text if token])

          return preprocessed_text
datasets:
  - sagard21/autotrain-data-code-explainer
co2_eq_emissions:
  emissions: 5.393079045128973
license: mit
pipeline_tag: summarization

Model Trained Using AutoTrain

Problem type: Summarization
Model ID: 2745581349
CO2 Emissions (in grams): 5.3931

Model Description

This model is an attempt to simplify code understanding by generating line-by-line explanations of source code. It was fine-tuned from the Salesforce/codet5-large model and is currently trained on only a small subset of Python snippets.

Model Usage

from transformers import AutoTokenizer, T5ForConditionalGeneration, SummarizationPipeline

model_id = "sagard21/python-code-explainer"

# Build a summarization pipeline from the fine-tuned CodeT5 checkpoint.
# The pipeline already decodes its output with skip_special_tokens=True.
summarizer = SummarizationPipeline(
    model=T5ForConditionalGeneration.from_pretrained(model_id),
    tokenizer=AutoTokenizer.from_pretrained(model_id),
)

# A raw string (r"""...""") keeps escape sequences such as "\n" in the snippet
# as written instead of turning them into real line breaks.
raw_code = r"""
def preprocess(text: str) -> str:
  text = str(text)
  text = text.replace("\n", " ")
  tokenized_text = text.split(" ")
  preprocessed_text = " ".join([token for token in tokenized_text if token])

  return preprocessed_text
"""
print(summarizer([raw_code]))
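The code you want explained is passed to the pipeline as an ordinary Python string, so escape sequences inside the snippet are interpreted by Python before the model ever sees them. The sketch below does not load the model; it only uses the standard-library `ast` module to show that a `\n` written inside a normal triple-quoted string becomes a real line break and leaves the snippet unparseable, while a raw string keeps it intact. `is_valid_python` is a hypothetical helper written for this illustration.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python code, False otherwise."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# A one-line snippet that itself contains the escape sequence \n.
plain = """text = text.replace("\n", " ")"""  # \n becomes a real newline
raw = r"""text = text.replace("\n", " ")"""   # \n stays as backslash + "n"

print(is_valid_python(plain))  # False: the newline splits the string literal
print(is_valid_python(raw))    # True: the snippet reaches the model as written
```

In other words, the non-raw version hands the model a slightly different (and broken) program than the one you meant to explain.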

Example Output

[
  {
    "summary_text": "Create a function preprocess that will take the text as an argument and return the preprocessed text.\n1. In this case, the text will be converted to a string.\n2. At first, we will replace all \"\\n\" with \" \" and then split the text by \" \".\n3. Then we will call the tokenize function on the text and tokenize the text using the split() method.\n4. Next step is to create a list of all the tokens in the string and join them together.\n5. Then the function will return the string preprocessed_text.\n"
  }
]
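The generated explanation comes back as a single string: an overview sentence followed by numbered steps separated by newlines. If you want to post-process it, for example to render the steps as a list, a minimal sketch like the following can split it apart. It assumes the layout seen in the example output above, and the model is not guaranteed to always produce numbered steps; `split_explanation` is a hypothetical helper, not part of transformers.

```python
import re
from typing import List, Tuple

# Matches a stripped line such as "1. In this case, ..." (number, dot, text).
STEP_PATTERN = re.compile(r"(\d+)\.\s+(.+)")


def split_explanation(summary_text: str) -> Tuple[str, List[str]]:
    """Split a generated explanation into its overview and numbered steps.

    Assumption: one overview line, then lines of the form "<n>. <text>".
    Non-matching lines are appended to the overview (before any step has
    started) or to the previous step (afterwards).
    """
    overview_parts: List[str] = []
    steps: List[str] = []
    for line in summary_text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = STEP_PATTERN.match(line)
        if match:
            steps.append(match.group(2))
        elif steps:
            steps[-1] += " " + line
        else:
            overview_parts.append(line)
    return " ".join(overview_parts), steps


# The "summary_text" value from the example output, as a Python string.
summary_text = (
    "Create a function preprocess that will take the text as an argument and return the preprocessed text.\n"
    "1. In this case, the text will be converted to a string.\n"
    '2. At first, we will replace all "\\n" with " " and then split the text by " ".\n'
    "3. Then we will call the tokenize function on the text and tokenize the text using the split() method.\n"
    "4. Next step is to create a list of all the tokens in the string and join them together.\n"
    "5. Then the function will return the string preprocessed_text.\n"
)

overview, steps = split_explanation(summary_text)
print(overview)
for number, step in enumerate(steps, start=1):
    print(f"  {number}. {step}")
```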

Validation Metrics

Loss: 2.156
Rouge1: 29.375
Rouge2: 18.128
RougeL: 25.445
RougeLsum: 28.084
Gen Len: 19.000
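ROUGE-1 measures unigram overlap between a generated explanation and its reference, while ROUGE-L measures their longest common subsequence (LCS). Both are usually reported as F-measures, and the values above appear to be scaled to 0-100. The following is a simplified sketch for intuition only: it lowercases and splits on whitespace with no stemming, so it will not reproduce the numbers from the `rouge_score` package, and it omits RougeLsum, which splits the text on newlines and computes a summary-level LCS. All function names are illustrative.

```python
from collections import Counter
from typing import List


def _tokens(text: str) -> List[str]:
    # Simplification: lowercase + whitespace split (no stemming, no punctuation handling).
    return text.lower().split()


def _f1(overlap: int, pred_len: int, ref_len: int) -> float:
    # Harmonic mean of precision (overlap / prediction length)
    # and recall (overlap / reference length).
    if overlap == 0 or pred_len == 0 or ref_len == 0:
        return 0.0
    precision = overlap / pred_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 with clipped counts, on a 0-1 scale."""
    pred, ref = _tokens(prediction), _tokens(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    return _f1(overlap, len(pred), len(ref))


def _lcs_length(a: List[str], b: List[str]) -> int:
    # Standard dynamic-programming LCS, keeping only the previous row.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rougeL_f1(prediction: str, reference: str) -> float:
    """LCS-based F1, on a 0-1 scale."""
    pred, ref = _tokens(prediction), _tokens(reference)
    return _f1(_lcs_length(pred, ref), len(pred), len(ref))


# Multiply by 100 to match the scale used in the metrics above.
print(round(100 * rouge1_f1("the cat sat on the mat", "the cat is on the mat"), 3))  # 83.333
print(round(100 * rougeL_f1("a b c", "c b a"), 3))  # 33.333: same words, different order
```

The second example shows why ROUGE-L is typically lower than ROUGE-1: it rewards words that appear in the same order as the reference, not merely the same words.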