DavidGF committed on
Commit 30ed549 (1 parent: d4afd2e)

Update README.md

Files changed (1)
  1. README.md +13 -0
README.md CHANGED
@@ -91,6 +91,19 @@ Our results, with `result < 0.1, %:` being well below 0.9, indicate that our dat
 
  *All benchmarks were performed with a sliding window of 4096. New Benchmarks with Sliding Window null coming soon
 
+ **German RAG LLM Evaluation**
+ corrected result after FIX: https://github.com/huggingface/lighteval/pull/171
+ ```
+ |                         Task                         |Version|Metric|Value|   |Stderr|
+ |------------------------------------------------------|------:|------|----:|---|-----:|
+ |all                                                   |       |acc   |0.975|±  |0.0045|
+ |community:german_rag_eval:_average:0                  |       |acc   |0.975|±  |0.0045|
+ |community:german_rag_eval:choose_context_by_question:0|      0|acc   |0.953|±  |0.0067|
+ |community:german_rag_eval:choose_question_by_context:0|      0|acc   |0.998|±  |0.0014|
+ |community:german_rag_eval:context_question_match:0    |      0|acc   |0.975|±  |0.0049|
+ |community:german_rag_eval:question_answer_match:0     |      0|acc   |0.974|±  |0.0050|
+ ```
+
  ## Disclaimer
  We must inform users that despite our best efforts in data cleansing, the possibility of uncensored content slipping through cannot be entirely ruled out.
  However, we cannot guarantee consistently appropriate behavior. Therefore, if you encounter any issues or come across inappropriate content, we kindly request that you inform us through the contact information provided.
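
The `_average` row in the added table can be cross-checked against the four task rows. The sketch below assumes the aggregate is an unweighted mean of the per-task accuracies and standard errors; the commit does not state how lighteval computes it, so treat this as a plausibility check rather than a description of the tool. The dictionary keys are shortened task names and the variable names are illustrative.

```python
from statistics import mean

# Per-task (accuracy, stderr) pairs copied from the table added in this commit.
# Keys are shortened from "community:german_rag_eval:<task>:0".
results = {
    "choose_context_by_question": (0.953, 0.0067),
    "choose_question_by_context": (0.998, 0.0014),
    "context_question_match": (0.975, 0.0049),
    "question_answer_match": (0.974, 0.0050),
}

# Assumption: the aggregate row is a plain (unweighted) mean over tasks.
avg_acc = mean(acc for acc, _ in results.values())
avg_stderr = mean(err for _, err in results.values())

print(f"mean acc    = {avg_acc:.3f}")
print(f"mean stderr = {avg_stderr:.4f}")
```

Under that assumption, both means agree with the reported `_average` row (0.975 ± 0.0045) at the precision shown in the table.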