instruction-pretrain committed
Commit 3d05718 • Parent(s): da43c31
Update README.md
README.md CHANGED
@@ -210,16 +210,15 @@ We simply discard the system prompts.

**To put it all together, the text before tokenization looks like this:**

-`general_instruction_response_text = "<|begin_of_text|>{question} {response}<|end_of_text|>"`
-
-or
-
-`instruction_augmented_text = "<|begin_of_text|>{instruction augmented text}<|end_of_text|>"`
+```python
+general_instruction_response_text = "<|begin_of_text|>{question} {response}<|end_of_text|>"

+instruction_augmented_text = "<|begin_of_text|>{instruction augmented text}<|end_of_text|>"
+```
Then, for tokenization, you don't need to add BOS and EOS token ids. The tokenization code looks like this:
-`text_ids = tokenizer(text, add_special_tokens=False, **kwargs).input_ids`
-
-
+```python
+text_ids = tokenizer(text, add_special_tokens=False, **kwargs).input_ids
+```

## Citation
If you find our work helpful, please cite us:
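
For readers skimming the diff, here is a minimal sketch of what the first updated template expands to once its placeholders are filled in; the question/response pair below is made up purely for illustration:

```python
# Llama-3-style markers, written literally into the pre-training text.
BOS, EOS = "<|begin_of_text|>", "<|end_of_text|>"

# Hypothetical instruction-response pair (illustrative only, not from the repo).
question = "What is the capital of France?"
response = "The capital of France is Paris."

general_instruction_response_text = f"{BOS}{question} {response}{EOS}"
print(general_instruction_response_text)
# <|begin_of_text|>What is the capital of France? The capital of France is Paris.<|end_of_text|>
```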
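
Likewise, a hedged sketch of the tokenization step, assuming a Hugging Face tokenizer whose special tokens match the markers above (the checkpoint name is a placeholder, not something this commit specifies):

```python
from transformers import AutoTokenizer

# Placeholder checkpoint; any tokenizer using <|begin_of_text|>/<|end_of_text|> works.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

text = "<|begin_of_text|>What is the capital of France? The capital of France is Paris.<|end_of_text|>"

# add_special_tokens=False: BOS/EOS are already spelled out in the text,
# so letting the tokenizer add them again would duplicate the BOS id.
text_ids = tokenizer(text, add_special_tokens=False).input_ids

assert text_ids[0] == tokenizer.bos_token_id   # BOS comes from the text itself
assert text_ids[-1] == tokenizer.eos_token_id  # likewise for EOS
```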