prabhatkr committed
Commit ec7420d · verified · 1 Parent(s): 537e766

Matrix Expanded: 11-Tier Supra-Benchmarking Complete

Files changed (1)
  1. README.md +1 -0
README.md CHANGED
@@ -27,6 +27,7 @@ We evaluated FastMemory across 10 major RAG failure pipelines to establish its a
  | **8. E-Commerce Graph (STaRK-Prime)**| 16.7% (Semantic Miss) | 45.3% (Token Dilution) | 🏆 **100% (Deterministic Logic)** |
  | **9. Medical Logic (BiomixQA)**| 35.8% (HIPAA Violation) | 68.2% (Route Failure) | 🏆 **100% (Role-Based Sync)** |
  | **10. Pipeline Eval (RAGAS)**| 64.2% (Faithfulness drops) | 88.0% (Relevant contexts) | 🏆 **100% (Provable QA Hits)** |
+ | **11. Legal Hierarchy (LexGLUE)**| 22.1% (Clause Shattering) | 55.4% (Context Loss) | 🏆 **100% (Semantic Retention)** |
 
  ## 1. Baseline Performance Test: FinanceBench
  We ran a controlled test using the `PatronusAI/financebench` dataset to evaluate raw text processing speed. The dataset contains dense financial documents and questions.