Update README.md
README.md
CHANGED
@@ -38,11 +38,11 @@ There is a larger variant called [SteamSHP-XL](https://huggingface.co/stanfordnl
 The input text should be of the format:
 
 ```
-POST: { the context, such as the 'history' column in SHP }
+POST: { the context, such as the 'history' column in SHP (not containing any newlines \n) }
 
-RESPONSE A: { first possible continuation }
+RESPONSE A: { first possible continuation (not containing any newlines \n) }
 
-RESPONSE B: { second possible continuation }
+RESPONSE B: { second possible continuation (not containing any newlines \n) }
 
 Which response is better? RESPONSE
 ```
@@ -74,9 +74,9 @@ When trying to cram an example into 512 tokens, we recommend truncating the cont
 If you want to use SteamSHP-Large as a reward model -- to get a score for a single response -- then you need to structure the input such that RESPONSE A is what you want to score and RESPONSE B is just an empty input:
 
 ```
-POST: { the context, such as the 'history' column in SHP }
+POST: { the context, such as the 'history' column in SHP (not containing any newlines \n) }
 
-RESPONSE A: { continuation }
+RESPONSE A: { continuation (not containing any newlines \n) }
 
 RESPONSE B: .
 