sagard21 committed
Commit
efd03f1
1 Parent(s): 633ef02

Update README.md

Update model description

Files changed (1)
  1. README.md +45 -10
README.md CHANGED

@@ -3,13 +3,15 @@ tags:
 - autotrain
 - summarization
 language:
-- unk
+- en
 widget:
-- text: "I love AutoTrain 🤗"
+- text: I love AutoTrain 🤗
 datasets:
 - sagard21/autotrain-data-code-explainer
 co2_eq_emissions:
   emissions: 5.393079045128973
+license: mit
+pipeline_tag: summarization
 ---
 
 # Model Trained Using AutoTrain
@@ -18,6 +20,47 @@ co2_eq_emissions:
 - Model ID: 2745581349
 - CO2 Emissions (in grams): 5.3931
 
+# Model Description
+
+This model is an attempt to simplify code understanding by generating a line-by-line explanation of source code. It was fine-tuned from the Salesforce/codet5-large model and is currently trained on a small subset of Python snippets.
+
+# Model Usage
+
+```py
+from transformers import AutoTokenizer, T5ForConditionalGeneration, SummarizationPipeline
+import torch
+
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+
+pipeline = SummarizationPipeline(
+    model=T5ForConditionalGeneration.from_pretrained("sagard21/python-code-explainer"),
+    tokenizer=AutoTokenizer.from_pretrained("sagard21/python-code-explainer", skip_special_tokens=True),
+    device=device
+)
+
+raw_code = """
+def preprocess(text: str) -> str:
+    text = str(text)
+    text = text.replace("\n", " ")
+    tokenized_text = text.split(" ")
+    preprocessed_text = " ".join([token for token in tokenized_text if token])
+
+    return preprocessed_text
+"""
+pipeline([raw_code])
+
+```
+
+### Expected JSON Output
+
+```
+[
+  {
+    "summary_text": "Create a function preprocess that will take the text as an argument and return the preprocessed text.\n1. In this case, the text will be converted to a string.\n2. At first, we will replace all \"\\n\" with \" \" and then split the text by \" \".\n3. Then we will call the tokenize function on the text and tokenize the text using the split() method.\n4. Next step is to create a list of all the tokens in the string and join them together.\n5. Then the function will return the string preprocessed_text.\n"
+  }
+]
+```
+
 ## Validation Metrics
 
 - Loss: 2.156
@@ -26,11 +69,3 @@ co2_eq_emissions:
 - RougeL: 25.445
 - RougeLsum: 28.084
 - Gen Len: 19.000
-
-## Usage
-
-You can use cURL to access this model:
-
-```
-$ curl -X POST -H "Authorization: Bearer YOUR_HUGGINGFACE_API_KEY" -H "Content-Type: application/json" -d '{"inputs": "I love AutoTrain"}' https://api-inference.huggingface.co/sagard21/autotrain-code-explainer-2745581349
-```
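
The cURL call to the hosted Inference API that this commit removes has a straightforward local equivalent through the high-level `pipeline()` factory. Below is a minimal sketch, not part of the commit itself; it assumes only the `transformers` package and the `sagard21/python-code-explainer` model ID used in the updated usage section:

```py
# Minimal sketch: load the model through the generic pipeline() factory.
# Assumes only the transformers package and the model ID from the README above.
from transformers import pipeline

explainer = pipeline("summarization", model="sagard21/python-code-explainer")

snippet = '''
def add(a: int, b: int) -> int:
    return a + b
'''

# The pipeline returns a list of dicts with a "summary_text" key,
# matching the expected JSON output shown in the README.
print(explainer(snippet)[0]["summary_text"])
```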
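
On the validation metrics: Rouge1, Rouge2, RougeL and RougeLsum are standard ROUGE scores reported on a 0-100 scale, and Gen Len is the average length of the generated summaries. The following is a minimal sketch of how comparable scores can be computed with the Hugging Face `evaluate` library; the example texts are placeholders, and the actual AutoTrain evaluation script is not part of this commit:

```py
# Minimal sketch with the Hugging Face evaluate library (also requires the
# rouge_score package). The prediction/reference pair below is a placeholder,
# not the model's real validation data.
import evaluate

rouge = evaluate.load("rouge")

predictions = ["Create a function preprocess that returns the cleaned text."]
references = ["Define a preprocess function that cleans the text and returns it."]

# Recent versions of evaluate return rouge1/rouge2/rougeL/rougeLsum as
# fractions in [0, 1]; scale by 100 to compare with the values listed above.
scores = rouge.compute(predictions=predictions, references=references)
print({name: round(value * 100, 3) for name, value in scores.items()})
```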