d-matrix/gpt2

d-matrix committed on Feb 13

Commit 93cadd1 • 1 Parent(s): 55072b7

Create README.md

draft of model card

Files changed (1)

README.md +80 -0

README.md ADDED

	@@ -0,0 +1,80 @@

+---
+license: apache-2.0
+datasets:
+- wikitext
+- ptb_text_only
+language:
+- en
+metrics:
+- perplexity
+pipeline_tag: text-generation
+model-index:
+- name: distilgpt2
+  results:
+  - task:
+      type: text-generation
+    dataset:
+      name: penn_treebank
+      type: ptb_text_only
+    metrics:
+    - name: perplexity@BASELINE
+      type: dmx-perplexity
+      value: 63.45857238769531
+    - name: perplexity@FALLBACK
+      type: dmx-perplexity
+      value: 64.36720275878906
+  - task:
+      type: text-generation
+    dataset:
+      name: wikitext2
+      type: wikitext-2-raw-v1
+    metrics:
+    - name: perplexity@BASELINE
+      type: dmx-perplexity
+      value: 46.05925369262695
+    - name: perplexity@FALLBACK
+      type: dmx-perplexity
+      value: 46.570838928222656
+---
+This is a quantized version of [DistilGPT2](https://huggingface.co/distilbert/distilgpt2). We provide two quantization configurations:
+- BASELINE: All layers are kept in their original format, so the model is equivalent to the original.
+- FALLBACK: Linear and Conv1D layers are quantized to BFP16, and Layer Norm, GELU, and Softmax use approximation functions.
+### Usage Example
+Prerequisites:
+- Install dmx-mltools: `pip install dmx-mltools`
+- Clone this repo and `cd` into the cloned directory.
+```python
+>>> import os
+>>> import torch
+>>> from mltools import dmx
+>>> from transformers import pipeline, AutoModelForCausalLM
+>>> import evaluate
+>>> from datasets import load_dataset
+>>> # Get the model (the token is read from the Dmatrix_HF_Token environment variable)
+>>> my_hf_token = os.environ.get("Dmatrix_HF_Token")
+>>> device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
+>>> pipe = pipeline(
+...     "text-generation",
+...     model="d-matrix/distilgpt2",
+...     device=device,
+...     use_auth_token=my_hf_token,
+... )
+>>> pipe.model = dmx.Model(pipe.model, monkey_patched=False, hf=True, input_names=["input_ids", "labels"])
+>>> # Configure quantization formats
+>>> pipe.model.transform('FALLBACK.yaml')
+>>> # Evaluate perplexity on the Penn Treebank test split
+>>> perplexity = evaluate.load("d-matrix/dmx_perplexity", module_type="metric")
+>>> input_texts = load_dataset("ptb_text_only", "penn_treebank", split="test")["sentence"]
+>>> pipe.model.eval()
+>>> results = perplexity.compute(model=pipe.model.body, references=input_texts)
+>>> print(results)
+{'loss': 4.164604187011719, 'perplexity': 64.36720275878906}
+```
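
The numbers in the model card can be cross-checked without loading the model. The sketch below is not part of the commit; it only reuses the values reported above and assumes that the `loss` field is the mean per-token negative log-likelihood in nats, so that perplexity is `exp(loss)`. The helper name `relative_increase_pct` is hypothetical.

```python
import math

# Perplexity values copied from the model-index metadata in the README above.
reported = {
    "penn_treebank": {"BASELINE": 63.45857238769531, "FALLBACK": 64.36720275878906},
    "wikitext2": {"BASELINE": 46.05925369262695, "FALLBACK": 46.570838928222656},
}

# Loss printed by the usage example (FALLBACK configuration, Penn Treebank).
fallback_ptb_loss = 4.164604187011719

# Assumption: loss is the mean negative log-likelihood in nats,
# so perplexity is its exponential.
perplexity_from_loss = math.exp(fallback_ptb_loss)


def relative_increase_pct(baseline: float, quantized: float) -> float:
    """Hypothetical helper: percentage increase of quantized over baseline."""
    return 100.0 * (quantized - baseline) / baseline


# Perplexity increase caused by the FALLBACK configuration, per dataset.
increase_pct = {
    name: relative_increase_pct(values["BASELINE"], values["FALLBACK"])
    for name, values in reported.items()
}

print(f"perplexity recomputed from loss: {perplexity_from_loss:.4f}")
for name, pct in increase_pct.items():
    print(f"{name}: FALLBACK perplexity is {pct:.2f}% above BASELINE")
```

Under that assumption the recomputed perplexity agrees with the printed `perplexity` value, and the FALLBACK configuration raises perplexity by roughly 1.4% on Penn Treebank and roughly 1.1% on WikiText-2 relative to BASELINE.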