iansotnek committed on
Commit c651dd6
1 Parent(s): a75f83d

update to use instruct_pipeline

Files changed (1)
  1. README.md +25 -66
README.md CHANGED
@@ -50,85 +50,44 @@ Just as with any other LLM, we advise users of this technology to exercise good

 ## Usage

- The code below shows how to use `dlite-v2-355m` in the way in which it was trained. While the model can be used "out of the box" with the
- `transformers` library, using the function defined below to create a response from the model will achieve better results.
-
- ### Load Model and Tokenizer from this Repository Using the `transformers` Package

 ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
- import numpy as np
- import re
-
- model_id = 'aisquared/dlite-v2-355m'
-
- tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side = 'left')
- model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code = True, device_map = 'auto')
 ```

-
- ### Create the Prompt Format and Other Variables

 ```python
- PROMPT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
-
- ### Instruction:
- {instruction}
-
- ### Response:
- """
-
- END_KEY = '### End'
- RESPONSE_KEY = '### Response:\n'
 ```

-
- ### Create a Function to Retrieve a Response

 ```python
- def create_response(
-         instruction,
-         model,
-         tokenizer,
-         do_sample = True,
-         max_new_tokens = 256,
-         top_p = 0.92,
-         top_k = 0,
-         **kwargs
- ):
-     """
-     Create a response from the model by using a formatted prompt
-     """
-     input_ids = tokenizer(
-         PROMPT.format(instruction=instruction), return_tensors="pt"
-     ).input_ids
-
-     gen_tokens = model.generate(
-         input_ids,
-         pad_token_id=tokenizer.pad_token_id,
-         do_sample=do_sample,
-         max_new_tokens=max_new_tokens,
-         top_p=top_p,
-         top_k=top_k,
-         **kwargs,
-     )
-     decoded = tokenizer.batch_decode(gen_tokens)[0]
-
-     # The response appears after "### Response:". The model has been trained to append "### End" at the end.
-     m = re.search(r"#+\s*Response:\s*(.+?)#+\s*End", decoded, flags=re.DOTALL)
-
-     response = None
-     if m:
-         response = m.group(1).strip()
-     else:
-         # The model might not generate the "### End" sequence before reaching the max tokens. In this case, return
-         # everything after "### Response:".
-         m = re.search(r"#+\s*Response:\s*(.+)", decoded, flags=re.DOTALL)
-         if m:
-             response = m.group(1).strip()
-         else:
-             pass
-     return response
 ```

 ### Model Performance Metrics


 ## Usage

+ To use the model with the `transformers` library on a machine with GPUs, first make sure you have the `transformers` and `accelerate` libraries installed.
+ From your terminal, run:

 ```bash
+ pip install "accelerate>=0.16.0,<1" "transformers[torch]>=4.28.1,<5" "torch>=1.13.1,<2"
 ```

+ The instruction-following pipeline can be loaded using the `pipeline` function as shown below. This loads a custom `InstructionTextGenerationPipeline`
+ found in the model repo [here](https://huggingface.co/aisquared/dlite-v2-355m/blob/main/instruct_pipeline.py), which is why `trust_remote_code=True` is required.
+ Including `torch_dtype=torch.bfloat16` is generally recommended when that dtype is supported: it reduces memory usage and does not appear to affect output quality.
+ It can be omitted if there is sufficient memory.

 ```python
+ from transformers import pipeline
+ import torch

+ generate_text = pipeline(model="aisquared/dlite-v2-355m", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
+ ```

+ You can then use the pipeline to answer instructions:

+ ```python
+ res = generate_text("Who was George Washington?")
+ print(res[0]["generated_text"])
 ```
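
If finer control over decoding is desired, generation keyword arguments can presumably be passed through the pipeline call. The following is a sketch only, not part of this commit: it assumes the custom pipeline forwards these arguments to `model.generate`, with values mirroring the sampling defaults of the removed `create_response` function above.

```python
# Sketch only: assumes InstructionTextGenerationPipeline forwards generation
# kwargs to model.generate. The values mirror the defaults of the removed
# create_response function (do_sample=True, max_new_tokens=256, top_p=0.92, top_k=0).
res = generate_text(
    "Who was George Washington?",
    do_sample=True,
    max_new_tokens=256,
    top_p=0.92,
    top_k=0,
)
print(res[0]["generated_text"])
```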

+ Alternatively, if you prefer not to use `trust_remote_code=True`, you can download [instruct_pipeline.py](https://huggingface.co/aisquared/dlite-v2-355m/blob/main/instruct_pipeline.py),
+ store it alongside your notebook, and construct the pipeline yourself from the loaded model and tokenizer:

 ```python
+ from instruct_pipeline import InstructionTextGenerationPipeline
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ tokenizer = AutoTokenizer.from_pretrained("aisquared/dlite-v2-355m", padding_side="left")
+ model = AutoModelForCausalLM.from_pretrained("aisquared/dlite-v2-355m", device_map="auto", torch_dtype=torch.bfloat16)
+
+ generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)
 ```
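
A pipeline constructed this way should be callable just like the `trust_remote_code=True` version above; a minimal sketch, assuming the same list-of-dicts output format:

```python
# Sketch only: assumes the locally constructed pipeline returns the same
# output format as the remote-code pipeline shown earlier.
res = generate_text("Who was George Washington?")
print(res[0]["generated_text"])
```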

 ### Model Performance Metrics