llmware / bling-1b-0.1

Text Generation

text-generation-inference


doberst committed on Nov 4, 2023

Commit e028a91 (1 parent: 492f901)

Update README.md

Files changed (1)

README.md +7 -3


@@ -51,15 +51,16 @@ without the need for a lot of complex instruction verbiage - provide a text pass
 ### Benchmark Tests
-Evaluated against the benchmark test:   [RAG-Instruct-Benchmark-Tester][https://www.huggingface.co/llmware/rag_instruct_benchmark_tester]
+Evaluated against the benchmark test:   [RAG-Instruct-Benchmark-Tester](https://www.huggingface.co/llmware/rag_instruct_benchmark_tester)
 Average of 2 Test Runs with 1 point for correct answer, 0.5 point for partial correct or blank / NF, 0.0 points for incorrect, and -1 points for hallucinations.
---Score:  73.25 correct out of 100
+--**Accuracy Score**:  **73.25** correct out of 100
 --Not Found Classification:  17.5%
 --Boolean:  29%
 --Math/Logic:  0%
 --Complex Questions (1-5):  1 (Low)
 --Summarization Quality (1-5):  1 (Coherent, extractive)
+--Hallucinations:  No hallucinations observed.
 For test run results, please see the files ("core_rag_test" and "answer_sheet" in the repo).
@@ -70,7 +71,10 @@ For test run results, please see the files ("core_rag_test" and "answer_sheet" i
 Any model can provide inaccurate or incomplete information, and should be used in conjunction with appropriate safeguards and fact-checking mechanisms.
-This model can be used effective for quick testing and will be generally accurate in relatively simple extractive Q&A and basic summarization.
+This model can be used effective for quick "on laptop" testing and will be generally accurate in relatively simple extractive Q&A and basic summarization.
+For higher performing models, please see the larger models in the BLING series, starting at 1.3B-1.4B up to 3B.
+Note:  this was the smallest model that we were able to train to consistently recognize Q&A and RAG instructions.
 ## How to Get Started with the Model
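The Benchmark Tests section in the diff describes its scoring rule only in prose (1 point correct, 0.5 partial or blank / NF, 0 incorrect, -1 hallucination, averaged over 2 runs). The sketch below shows that arithmetic under stated assumptions: the outcome labels, the `POINTS` table and the functions `run_score` / `benchmark_score` are hypothetical names for illustration, not code from the RAG-Instruct-Benchmark-Tester repo, and the sample runs are made-up 4-question examples, not real test data.

```python
from statistics import mean

# Hypothetical outcome labels mapped to the points stated in the README:
# 1 for correct, 0.5 for partially correct or blank / "Not Found" (NF),
# 0 for incorrect, and -1 for a hallucination.
POINTS = {
    "correct": 1.0,
    "partial": 0.5,
    "not_found": 0.5,
    "incorrect": 0.0,
    "hallucination": -1.0,
}


def run_score(outcomes: list[str]) -> float:
    """Total points for one test run (one outcome label per question)."""
    return sum(POINTS[o] for o in outcomes)


def benchmark_score(runs: list[list[str]]) -> float:
    """Average the per-run totals across all test runs."""
    return mean(run_score(r) for r in runs)


if __name__ == "__main__":
    # Made-up 4-question runs, for illustration only.
    run_1 = ["correct", "partial", "incorrect", "not_found"]      # 2.0 points
    run_2 = ["correct", "correct", "hallucination", "not_found"]  # 1.5 points
    print(benchmark_score([run_1, run_2]))  # 1.75
```

Because every per-run total is a multiple of 0.5, the average of two runs is always a multiple of 0.25, which is consistent with a reported score such as 73.25 out of 100.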