elly99 committed on
Commit 829c8ff · verified · 1 Parent(s): 62072c6

Update README.md

Files changed (1)
  1. README.md +4 -267
README.md CHANGED
@@ -35,146 +35,14 @@ language:
  datasets:
  - pubmed
  - arxiv
- - openalex
- - zenodo
- metrics:
- - semantic-score
- - ethical-audit
- ---
-
-
-
 
 
- # MarCognity-AI
- **A research framework for reflective and epistemically transparent AI systems**
- ---
 
- [![License: Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue)](https://www.apache.org/licenses/LICENSE-2.0)
 
  ---
-
- ## Table of Contents
-
- - [Overview](#overview)
- - [Research Motivation](#research-motivation)
- - [Modules and Functions](#modules-and-functions)
- - [Core Capabilities](#core-capabilities)
- - [Early Community Interactions(Non-Endorsement)](#early-community-interactions(non-endorsement))
- - [Official Publication and Citation](#official-publication-and-citation)
- - [Structural Limitation & Research Scope](#structural-limitation-&-research-scope)
- - [Integrated AI Models](#integrated-ai-models)
-
- ---
-
- ## Overview
-
- MarCognity-AI is a modular open-source research framework designed to investigate structural limitations of LLM-based metacognition and introduce explicit epistemic verification layers.
-
- Rather than simply generating responses, the system:
- - Produces structured outputs
- - Evaluates semantic coherence
- - Verifies claims against retrieved sources
- - Stores semantic memory
-
- Generates structured epistemic reports
-
- The goal is not to “improve answers,” but to analyze the structural fracture between linguistic coherence and epistemic awareness in large language models.
-
- ---
- ## Research Motivation
-
- Large Language Models optimize linguistic probability — not factual truth.
-
- MarCognity-AI investigates the following core question:
-
- Can epistemic uncertainty be made explicit within an LLM-based system?
-
- This framework does not claim to solve LLM hallucinations.
- Instead, it exposes and documents the failure modes of artificial metacognition in a reproducible way.
-
- The following cognitive architecture is composed of independent modules.
-
- ## Modules and Functions
-
- | Module | Function |
- |--------------------------|---------------------------------------------------|
- | Problem Classification | Automatic input type detection |
- | Academic Prompting | Structured multidisciplinary prompting |
- | Scientific Retrieval | Asynchronous retrieval from open-access sources |
- | Semantic Evaluation | Logical and semantic scoring of responses |
- | Skeptical Agent | Claim-by-claim verification against sources |
- | FAISS Memory | Archiving and comparison of past outputs |
- | Cognitive Visualization | Structured conceptual representation |
-
-
- ## Core Capabilities
-
- - LLM-assisted scientific generation
- - Source retrieval and integration (arXiv, PubMed, Zenodo, OpenAlex)
- - Multilevel metacognitive evaluation
- - Sentence-level epistemic verification
- - Ethical risk and bias analysis
- - Persistent semantic memory (FAISS)
- - Markdown-exportable reflective reports
-
- ---
-
- ## Structural Limitation & Research Scope
-
- MarCognity-AI is an exploratory research framework and is not intended for production use.
-
- During development, a recurring structural limitation emerged: LLM-based metacognitive layers reliably optimize for linguistic coherence but fail to surface epistemic uncertainty as an explicit signal.
-
- In practice, the system can evaluate how an answer is written (clarity, structure, semantic alignment), yet it cannot inherently determine whether the underlying claims are genuinely known, verifiable, or epistemically justified. The model can express that a response is unclear, but not that it lacks grounded knowledge.
-
- This collapse between linguistic coherence and epistemic awareness is not treated as a bug to be fixed, but as a structural fracture to be studied. The purpose of this framework is to expose, analyze, and document this limitation in a reproducible way.
-
- The demo and cognitive journal included in this repository are designed to make this failure mode observable — not to present a solved system.
-
- ---
-
- ## Early Community Interactions (Non-Endorsement)
-
- A discussion was opened regarding the semantic mapping layer.
- Community members from Hugging Face and related model discussions engaged technically with the proposal.
-
- You can explore the original threads and responses here:
- 🔗 [Hugging Face Discussion](https://huggingface.co/elly99/MarCognity-AI/discussions)
- 🔗 [DeepSeek Community Thread](https://huggingface.co/elly99/MarCognity-AI/discussions)
- 🔗 [Google org Response Snapshot](https://huggingface.co/google/gemma-2b-it/discussions/70#68ecace9e79b11c589bcead9)
-
- ---
-
- # Cross-Domain Epistemic Benchmark
-
- To evaluate the epistemic behavior of the architecture, a cross-domain benchmark was conducted across eight scientific and technical domains.
-
- ## Domains Included
-
- - Medicine
- - Neuroscience
- - Biology
- - Statistics
- - Linguistics
- - Computer Science
- - Physics
- - Law
-
- The benchmark consists of **72 evaluation tasks (9 per domain)**.
-
- ## Evaluated Configurations
-
- Two configurations were evaluated:
-
- - a **baseline large language model (LLM)** operating without epistemic verification
- - the **MarCognity-AI architecture**, which integrates the metacognitive cycle and the Skeptical Agent
-
- Each response generated by the two systems was evaluated using a **structured prompt-based epistemic assessment protocol**, applied by an independent LLM acting as evaluator.
-
- # MarCognity-AI
+ ## MarCognity-AI
  **A modular framework for structured analysis and source-grounded verification in LLM-based systems**
- ---
+
 
  [![License: Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue)](https://www.apache.org/licenses/LICENSE-2.0)
 
@@ -303,8 +171,7 @@ Two configurations were evaluated:
 
  Each response generated by the two systems was evaluated using a **structured prompt-based epistemic assessment protocol**, applied by an independent LLM acting as evaluator.
 
- The use of an LLM as independent evaluator introduces a known methodological limitation: the evaluator may share epistemic biases with the evaluated system.
- This constraint is acknowledged as a structural open problem in the field of LLM evaluation and is not specific to this framework.
+ The use of an LLM as independent evaluator introduces a known methodological limitation: the evaluator may share epistemic biases with the evaluated system. This constraint is acknowledged as a structural open problem in the field of LLM evaluation and is not specific to this framework.
 
  ### Epistemic Reliability Metrics
 
@@ -357,7 +224,6 @@ Conceptual analysis of the irreducible uncertainty observed in the benchmark.
 
  ---
 
-
  ### 📚 Official Publication and Citation
 
  The official version of the code and the full research paper have been permanently archived on Zenodo and are citable using their Digital Object Identifier (DOI).
@@ -411,7 +277,7 @@ It is intended for inspection and reproducibility, not interactive deployment.
 
  | Integrated Models | License | Main Restrictions |
  |--------------------------------------------------------------|--------------------------------------|----------------------------------------------------------------------|
- | meta-llama/llama-4-maverick-17b-128e-instruct | LLaMA 4 Community License (Meta) | Research and application use allowed; must comply with Meta’s AUP |
+ | meta-llama/llama-4-scout-17b-16e-instruct | LLaMA 4 Community License (Meta) | Research and application use allowed; must comply with Me |
  | allenai/specter | Apache 2.0 | Free for commercial use with attribution |
  | ktrapeznikov/scibert_scivocab_uncased_squad_v2 | Apache 2.0 | Free for commercial use with attribution |
  | Helsinki-NLP (OPUS-MT models on HuggingFace) | CC-BY-4.0 | Free use with mandatory citation |
@@ -447,133 +313,4 @@ Contributions are welcome! If you have additional examples or improvements, plea
 
 
 
- ## Epistemic Reliability Metrics
-
- - Epistemic Score
- - Hallucination Exposure Rate
- - Evidence Support Rate
- - Overconfidence Index
- - Cautious Response Ratio
- - Contradiction Rate
- - Claim Verification Accuracy
-
- Benchmark tasks, evaluation prompts, and results are available in the `/benchmark` directory.
-
- ## Task Generation Pipeline
-
- Benchmark tasks were generated using domain-specific topic files processed by the MarCognity system.
-
- The system extracted topic names and generated explanatory scientific questions based on those topics.
-
- The generated questions were then manually reviewed and curated to ensure clarity, conceptual diversity, and domain relevance.
-
- The final benchmark tasks are available in the `/benchmark_tasks` directory.
-
- ---
- ### Failure Analysis
-
- A qualitative analysis of representative failure cases is provided in:
-
- benchmark/failure_analysis
-
- The analysis identifies recurring epistemic failure patterns including:
-
- • Source ambiguity
- • Context loss during claim segmentation
- • Unauthorized inference
- • Evaluator false negatives
- • Semantic ambiguity
- • Incomplete corpus of knowledge
-
- These observations suggest the presence of an epistemic boundary in text-based verification systems.
-
- ---
-
-
- ### 📚 Official Publication and Citation
-
- The official version of the code and the full research paper have been permanently archived on Zenodo and are citable using their Digital Object Identifier (DOI).
-
- | **MarCognity-AI** | [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.17855185.svg)](https://doi.org/10.5281/zenodo.18913144) |
- |---|---|
- | **Permanent DOI** | `https://doi.org/10.5281/zenodo.18913144` |
- | **Access Publication** | [Full Research Paper (PDF) & Code (Zenodo)](https://doi.org/10.5281/zenodo.18913144) |
-
- ---
-
- ## Usage Examples
-
- ### Scientific Question
- **Input:** “Explain the role of chaperone proteins.”
- **Output:** Response + sources + semantic score + conceptual diagram
-
- ### Epistemic Verification Example
- Input: “Explain quantum entanglement.”
- Output:
-
- Generated response
-
- Claim-by-claim verification
-
- VERIFIED / EPISTEMIC FAILURE report
-
- Reasoning based on provided sources
-
- ---
- ### Quick Demo
-
- A step-by-step execution example is available in:
-
- `marcognity_demo.ipynb`
-
- The notebook illustrates:
- - Response generation
- - Retrieval integration
- - Claim-level verification
- - Epistemic reporting
-
- [Meta LLaMA 4 Community License](https://ai.meta.com/llama/license)
-
- It is intended for inspection and reproducibility, not interactive deployment.
-
- ---
-
-
- ## Integrated AI Models
-
- | Integrated Models | License | Main Restrictions |
- |--------------------------------------------------------------|--------------------------------------|----------------------------------------------------------------------|
- | meta-llama/llama-4-maverick-17b-128e-instruct | LLaMA 4 Community License (Meta) | Research and application use allowed; must comply with Meta’s AUP |
- | allenai/specter | Apache 2.0 | Free for commercial use with attribution |
- | ktrapeznikov/scibert_scivocab_uncased_squad_v2 | Apache 2.0 | Free for commercial use with attribution |
- | Helsinki-NLP (OPUS-MT models on HuggingFace) | CC-BY-4.0 | Free use with mandatory citation |
- | RandomForest Model | None (classic algorithm) | No license restrictions; depends on data used |
- | CrossEncoder (DeBERTa-based) | Varies (often MIT or Apache 2.0) | Free use if open license is respected |
-
- ---
-
-
- ## How to Contribute
-
- Got ideas, suggestions, or want to improve a feature?
-
- 1. Fork the repository
- 2. Create a branch (`git checkout -b improvement`)
- 3. Modify `.py` or `.ipynb` files
- 4. To run this project, you need a Groq API key
- 5. Open a pull request with a clear description
-
- See the [CONTRIBUTING.md](Contributing.md) file for contribution guidelines.
-
- ---
-
- ## License
-
- Released under the Apache 2.0 License.
- Third-party integrated models follow their respective licenses.
-
-
- Contributions are welcome! If you have additional examples or improvements, please feel free to open a pull request or report an issue.
-
-
 
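The README text removed in this commit describes a Skeptical Agent. It checks a generated response claim by claim against retrieved sources and reports each claim as VERIFIED or EPISTEMIC FAILURE. The diff does not show how that check works. The Python below is a minimal, hypothetical sketch of that report shape only, not MarCognity-AI's implementation. It makes two assumptions: naive sentence splitting stands in for claim segmentation, and a word-overlap score stands in for the support signal. The names `segment_claims`, `verify_claims` and `support_rate`, and the 0.5 threshold, are illustrative. `support_rate` is one possible reading of the "Evidence Support Rate" metric, which the README names without defining.

```python
import re
from dataclasses import dataclass

# Hypothetical sketch: toy claim-level verification, NOT MarCognity-AI's method.
# Assumptions: sentence split = claim segmentation, word overlap = support signal.

STOPWORDS = {"a", "an", "the", "of", "in", "and", "is", "are", "to", "that",
             "by", "for", "with", "on"}


@dataclass
class ClaimVerdict:
    claim: str
    label: str      # "VERIFIED" or "EPISTEMIC FAILURE"
    support: float  # best overlap score against any single source, in [0, 1]


def segment_claims(response: str) -> list[str]:
    """Split a response into sentence-level claims on ., ! or ? plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", response.strip())
    return [p for p in parts if p]


def content_words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, minus a tiny stopword list."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def overlap(claim: str, source: str) -> float:
    """Fraction of the claim's content words that also appear in the source."""
    words = content_words(claim)
    if not words:
        return 0.0
    return len(words & content_words(source)) / len(words)


def verify_claims(response: str, sources: list[str],
                  threshold: float = 0.5) -> list[ClaimVerdict]:
    """Label each claim by its best-supporting source (hypothetical threshold)."""
    verdicts = []
    for claim in segment_claims(response):
        best = max((overlap(claim, s) for s in sources), default=0.0)
        label = "VERIFIED" if best >= threshold else "EPISTEMIC FAILURE"
        verdicts.append(ClaimVerdict(claim, label, round(best, 2)))
    return verdicts


def support_rate(verdicts: list[ClaimVerdict]) -> float:
    """Share of claims labeled VERIFIED (assumed reading of 'Evidence Support Rate')."""
    if not verdicts:
        return 0.0
    return sum(v.label == "VERIFIED" for v in verdicts) / len(verdicts)


if __name__ == "__main__":
    # Toy inputs for illustration; not taken from the project's benchmark.
    sources = ["Chaperone proteins assist the folding of other proteins and prevent aggregation."]
    response = "Chaperone proteins assist protein folding. They also encode genetic information."
    for v in verify_claims(response, sources):
        print(v.label, v.support, v.claim)
    # VERIFIED 0.8 Chaperone proteins assist protein folding.
    # EPISTEMIC FAILURE 0.0 They also encode genetic information.
    print(support_rate(verify_claims(response, sources)))  # 0.5
```

Word overlap is only a stand-in. It cannot detect paraphrase or negation, so a learned scorer such as the DeBERTa-based CrossEncoder listed in the models table would be a more plausible support signal. The second toy claim also shows the "context loss during claim segmentation" pattern from the removed failure analysis: once the claim is split off, "They" no longer names its subject.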