Update README.md
README.md
@@ -37,6 +37,16 @@ We found that [the "value retrieval" variant of NIAH](https://github.com/Arize-a

This model does achieve 100% all-green results on the value retrieval variant, but its results on the original version are less than satisfactory.

### Reproduce

[360k/niah](https://github.com/Qihoo360/360zhinao/blob/main/360k/niah/) generates the raw results.

The score for value retrieval NIAH is calculated on the fly while the raw results are being generated, whereas the scores for the original and Chinese NIAH are calculated separately in [360k/plot](https://github.com/Qihoo360/360zhinao/blob/main/360k/plot/).

For the original version, a sample scores 100% if the regular expression `sandwich.+?dolores.+?sunny` matches the model output; otherwise, its score is based on edit distance.
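As a rough illustration of the rule above, here is a minimal sketch that scores a single answer to the original NIAH. Only the regex comes from this README. Everything else is an assumption and may differ from the actual code in 360k/plot: case-insensitive, DOTALL matching; a plain Levenshtein distance; the normalization `1 - distance / length of the longer string`; and the hypothetical names `score_original` and `reference_needle`.

```python
import re

# Regex quoted in the README. Case-insensitive, DOTALL matching is an assumption.
NEEDLE_PATTERN = re.compile(r"sandwich.+?dolores.+?sunny", re.IGNORECASE | re.DOTALL)


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def score_original(output: str, reference_needle: str) -> float:
    """Hypothetical scorer: 100 on a regex hit, else an edit-distance similarity."""
    if NEEDLE_PATTERN.search(output):
        return 100.0
    longest = max(len(output), len(reference_needle))
    if longest == 0:
        return 100.0
    # Assumed normalization: 1 - distance / length of the longer string.
    return 100.0 * (1.0 - levenshtein(output, reference_needle) / longest)
```

For example, `score_original("Eat a sandwich and sit in Dolores Park on a sunny day.", ref)` returns `100.0` for any `ref`, because the regex matches and the edit-distance fallback is never reached.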

For the Chinese version, a sample scores 100% if `刘秀` appears in the model output; otherwise, its score is based on edit distance. This check may not be ideal for the English-biased llama3 models.
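A matching sketch for the Chinese version, under the same assumptions as before: a plain Levenshtein distance, normalization by the longer string, and hypothetical names (`score_chinese`, `reference_needle`, `key`). Only the `刘秀` substring check is taken from this README.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def score_chinese(output: str, reference_needle: str, key: str = "刘秀") -> float:
    """Hypothetical scorer: 100 if the key name appears, else an edit-distance similarity."""
    if key in output:
        return 100.0
    longest = max(len(output), len(reference_needle))
    if longest == 0:
        return 100.0
    # Assumed normalization: 1 - distance / length of the longer string.
    return 100.0 * (1.0 - levenshtein(output, reference_needle) / longest)
```

Python strings are indexed by code point, so each Chinese character counts as one edit unit here. A byte-level distance over UTF-8 could instead penalize a single character mismatch up to three times.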
## Usage