- The exact answer is always important and is only a few tokens long. Hence, we never mask the labels or input tokens for the answer value.
- Rarely, we ignore the rationale labels entirely, such that the model is only pushed to learn what leads to the best answer.
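The masking rules above can be sketched as a small label-masking helper. This is a minimal illustration, not the repo's actual code: the function name, span representation, and the drop probability are all assumptions for the example. The key properties match the text: answer labels are never masked, and occasionally the rationale labels are ignored entirely.

```python
import random

IGNORE_INDEX = -100  # conventional "ignore this position" label id for cross-entropy losses


def mask_rationale_labels(labels, rationale_spans, p_drop_rationale=0.1):
    """Sketch of ReMask-CoT-style label masking (hypothetical helper).

    labels: per-token label ids for the sequence.
    rationale_spans: list of (start, end) index ranges covering rationale tokens.
    p_drop_rationale: illustrative probability of ignoring rationale labels entirely.

    Answer tokens lie outside rationale_spans, so their labels are never touched.
    """
    labels = list(labels)
    # Rarely, drop supervision on the whole rationale so only the answer is learned.
    if random.random() < p_drop_rationale:
        for start, end in rationale_spans:
            for i in range(start, end):
                labels[i] = IGNORE_INDEX
    return labels
```

For example, with `p_drop_rationale=1.0` and a rationale covering the first three tokens, only the trailing answer labels survive; with `p_drop_rationale=0.0` the labels pass through unchanged.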
## Results
I trained StableLM-3B-4e1t repeatedly on [TinyCoT](https://huggingface.co/datasets/euclaise/TinyCoT), along with 1000 examples from [reddit-instruct-curated](https://huggingface.co/datasets/euclaise/reddit-instruct-curated) and 1000 examples from [oasst2-curated](https://huggingface.co/datasets/sablo/oasst2_curated).
I trained once with ReMask (ReMask-CoT for the CoT examples), once with Masked Thought (with partial label-masking), and once with plain SFT.
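For contrast with the span-level ReMask scheme, the partial label-masking used in the Masked Thought baseline can be sketched as independent per-token dropping. Again a hedged illustration, not the actual training code: the helper name and the masking rate are assumptions for the example.

```python
import random

IGNORE_INDEX = -100  # conventional "ignore this position" label id for cross-entropy losses


def mask_random_labels(labels, p_mask=0.4):
    """Sketch of Masked Thought-style partial label-masking (hypothetical helper).

    Each label is independently dropped from the loss with probability p_mask
    (p_mask=0.4 is an illustrative value, not taken from the source).
    """
    return [IGNORE_INDEX if random.random() < p_mask else lab for lab in labels]
```

Unlike the ReMask-CoT sketch, this variant does not distinguish rationale tokens from answer tokens; every label is a candidate for masking.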