LaierTwoLabs
committed on
Commit 53e2350
1 Parent(s): b9164c1
Update README.md
README.md
CHANGED
@@ -204,16 +204,13 @@ original source of training data here :
 ## Evaluation
 
 
-<!-- This section describes the evaluation protocols and provides the results.
+<!-- This section describes the evaluation protocols and provides the results. -->
+
 Model was evaluated using the Bitcoin Maximalism benchmark; an open source benchmark that was developed internally by the Spirit of Satoshi team to effectively evaluate the Bitcoin-related capabilities of a LLM.
 Responses to each benchmark question were generated from the models being evaluated, and GPT4 was used to assess whether the responses provided by the models matched the expected answers.
-If so, the model received a point towards the overall topic score.
-Scores were summed on a per-topic basis for model comparison and charted for easy comparison between models to gauge performance.
-
-
 
 
-#### Testing Data
+#### Benchmark Testing Data
 
 
 <!-- This should link to a Dataset Card if possible. -->
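For reference, the scoring rule stated in the lines this commit removes (one point per response the judge marks as matching, summed per topic) can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark's actual code: the function name `score_by_topic`, the `(topic, matched)` input shape, and the example topics are all hypothetical, and the judge's verdicts are taken as given rather than obtained from GPT4.

```python
from collections import defaultdict


def score_by_topic(results):
    """Sum one point per matching response, grouped by topic.

    `results` is an iterable of (topic, matched) pairs, where `matched` is
    the judge's yes/no verdict for a single benchmark question.
    (Hypothetical input shape, assumed for illustration.)
    """
    scores = defaultdict(int)
    for topic, matched in results:
        # A match earns one point; a miss still registers the topic at 0.
        scores[topic] += 1 if matched else 0
    return dict(scores)


# Made-up verdicts for illustration only (not real benchmark data).
example = [
    ("mining", True),
    ("mining", False),
    ("wallets", True),
    ("mining", True),
]
print(score_by_topic(example))
```

Per-topic totals computed this way for each model can then be compared side by side or charted, as the removed text describes.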