diff --git a/.gitattributes b/.gitattributes
index 1dd84a09ea175a680c0cd852ec0e787bcaaed364..a6344aac8c09253b3b630fb776ae94478aa0275b 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -33,5 +33,3 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
-Genome[[:space:]]Logic[[:space:]]Modeling[[:space:]]Project[[:space:]](GLMP)[[:space:]]-[[:space:]]a[[:space:]]Hugging[[:space:]]Face[[:space:]]Space[[:space:]]by[[:space:]]garywelz.pdf filter=lfs diff=lfs merge=lfs -text
-Programming[[:space:]]Framework[[:space:]]for[[:space:]]Systematic[[:space:]]Analysis[[:space:]]-[[:space:]]a[[:space:]]Hugging[[:space:]]Face[[:space:]]Space[[:space:]]by[[:space:]]garywelz.pdf filter=lfs diff=lfs merge=lfs -text
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000000000000000000000000000000000000..769132da964b9ffafb1eb2d72a22759340d5588f
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,3 @@
+
+# HF rejects raw PDFs in git push; host PDFs elsewhere or use Xet
+*.pdf
diff --git a/ARXIV_MATH_AREAS_TODO.md b/ARXIV_MATH_AREAS_TODO.md
new file mode 100644
index 0000000000000000000000000000000000000000..0fb962d1585ce4819e32d17759f13e0ba3d96fe5
--- /dev/null
+++ b/ARXIV_MATH_AREAS_TODO.md
@@ -0,0 +1,122 @@
+# Mathematics Database — arXiv Subject Areas To-Do
+
+A prioritized list of arXiv mathematics subject areas to add for a more complete collection, aligned with [arXiv math taxonomy](https://arxiv.org/category_taxonomy).
+
+---
+
+## Current Coverage (What We Have)
+
+| Domain | Subcategories | arXiv codes mapped | Gaps (charts still needed) |
+|--------|---------------|---------------------|------|
+| **Algebra** | abstract_algebra, linear_algebra, category_theory | math.GR, math.RA, math.CT, math.AC, math.AG, math.QA | Commutative algebra, Algebraic geometry, Representation theory, Quantum algebra |
+| **Analysis** | calculus_analysis | math.CA, math.CV, math.DS, math.FA, math.AP, math.NA, math.SP | Complex analysis, Functional analysis, PDEs, Numerical analysis, Spectral theory |
+| **Geometry & Topology** | geometry_topology | math.GT, math.AT, math.DG, math.GN, math.MG, math.SG | Metric geometry, Symplectic geometry (light) |
+| **Number Theory** | number_theory | math.NT | ✓ Good |
+| **Discrete & Logic** | discrete_mathematics, foundations | math.CO, math.LO | ✓ Good |
+| **Applied & Other** | bioinformatics, statistics_probability | math.GM, math.ST | Statistics/Probability empty |
+
+---
+
+## To-Do List: Subject Areas to Add (Near Term)
+
+### Priority 1 — High Impact, Partially Covered or Empty
+
+| # | arXiv Code | Subject Area | Notes | Suggested Subcategory |
+|---|------------|--------------|-------|------------------------|
+| 1 | math.ST | **Statistics & Probability Theory** | 0 charts currently; foundational for applied math | `statistics_probability` (exists, populate) |
+| 2 | math.PR | **Probability** | CLT, stochastic processes, SDEs; distinct from statistics | merge into `statistics_probability` or add `probability` |
+| 3 | math.CV | **Complex Variables** | Holomorphic functions, residues, conformal maps; partially in calculus_analysis | add `complex_analysis` or extend calculus_analysis |
+| 4 | math.FA | **Functional Analysis** | Banach spaces, Hilbert spaces, distributions | add to calculus_analysis or new `functional_analysis` |
+| 5 | math.NA | **Numerical Analysis** | Newton-Raphson, bisection exist; add quadrature, linear solvers, ODE solvers | extend calculus_analysis or add `numerical_analysis` |
+| 6 | math.AG | **Algebraic Geometry** | Varieties, schemes, moduli; major area | add `algebraic_geometry` or extend abstract_algebra |
+| 7 | math.RT | **Representation Theory** | Representations of groups, Lie algebras | add `representation_theory` or extend abstract_algebra |
+
+### Priority 2 — Core Pure Math Gaps
+
+| # | arXiv Code | Subject Area | Notes | Suggested Subcategory |
+|---|------------|--------------|-------|------------------------|
+| 8 | math.AC | **Commutative Algebra** | Rings, ideals, Noetherian; differs from Ring Theory (noncommutative focus) | add `commutative_algebra` |
+| 9 | math.AP | **Analysis of PDEs** | Existence, uniqueness, qualitative dynamics | add `partial_differential_equations` or extend analysis |
+| 10 | math.DG | **Differential Geometry** | Curves, surfaces, Riemannian; some in geometry_topology | ensure distinct charts for differential geometry |
+| 11 | math.SP | **Spectral Theory** | Schrödinger operators, spectral analysis | add to analysis or `spectral_theory` |
+| 12 | math.SG | **Symplectic Geometry** | Hamiltonian systems, symplectic manifolds | extend geometry_topology |
+| 13 | math.MG | **Metric Geometry** | Euclidean, hyperbolic, discrete geometry | extend geometry_topology |
+
+### Priority 3 — Advanced / Specialized
+
+| # | arXiv Code | Subject Area | Notes | Suggested Subcategory |
+|---|------------|--------------|-------|------------------------|
+| 14 | math.OA | **Operator Algebras** | C*-algebras, von Neumann algebras | add `operator_algebras` |
+| 15 | math.KT | **K-Theory and Homology** | Algebraic/topological K-theory | add `k_theory` or extend algebraic topology |
+| 16 | math.QA | **Quantum Algebra** | Quantum groups, operads | extend abstract_algebra |
+| 17 | math.OC | **Optimization and Control** | Linear programming, optimal control | add `optimization` |
+| 18 | math.IT | **Information Theory** | Coding, entropy, channel capacity | add `information_theory` |
+| 19 | math.MP | **Mathematical Physics** | Rigorous formulations of physical theories | add `mathematical_physics` |
+| 20 | math.HO | **History and Overview** | Biographies, education, philosophy | optional `history_overview` |
+
+### Priority 4 — Already in Expansion Plan
+
+These are in [MATHEMATICS_DATABASE_EXPANSION_PLAN.md](./MATHEMATICS_DATABASE_EXPANSION_PLAN.md):
+
+- **Complex Analysis** (math.CV) — 4 charts planned
+- **Landmark Theorems** — FLT, Poincaré, Riemann
+- **Formal Verification** — Lean, Coq
+- **AI Mathematics** — AlphaProof, AlphaGeometry
+
+---
+
+## Suggested Implementation Order
+
+### Phase A (1–2 weeks): Fill Empty & High-Impact
+1. **Statistics & Probability** — Kolmogorov axioms, Bayes, CLT (3–5 charts)
+2. **Complex Analysis** — Cauchy, residues, conformal maps (4 charts per expansion plan)
+3. **Functional Analysis** — Banach/Hilbert spaces basics (2–3 charts)
+
+### Phase B (2–4 weeks): Algebra & Geometry Gaps
+4. **Algebraic Geometry** — Varieties, schemes intro (2–3 charts)
+5. **Representation Theory** — Group representations, characters (2–3 charts)
+6. **Numerical Analysis** — Quadrature, solvers, ODE methods (3–4 charts)
+
+### Phase C (4–6 weeks): PDEs, Operator Theory, Applied
+7. **PDEs** — Heat, wave, Laplace; existence/uniqueness (2–3 charts)
+8. **Operator Algebras** — C*-algebras intro (1–2 charts)
+9. **Optimization** — Linear programming, simplex (2 charts)
+10. **Mathematical Physics** — Lagrangian/Hamiltonian mechanics (2 charts)
+
+---
+
+## Metadata Updates Required
+
+When adding new subcategories:
+
+1. Add to `metadata.json` → `subcategoryCounts`
+2. Add to `metadata.json` → `subcategoryToArxiv`
+3. Add to `metadata.json` → `domainHierarchy` (assign to algebra, analysis, geometry_topology, or applied)
+4. Run `build-graph-data.js` to update Whole of Mathematics
+5. Update upload script if new process directories are created
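
Taken together, the three `metadata.json` steps above amount to entries like the following sketch. The exact file shape is an assumption; the field names are the ones listed above, and the values are illustrative for a hypothetical `complex_analysis` subcategory:

```json
{
  "subcategoryCounts": {
    "complex_analysis": 4
  },
  "subcategoryToArxiv": {
    "complex_analysis": ["math.CV"]
  },
  "domainHierarchy": {
    "analysis": ["calculus_analysis", "complex_analysis"]
  }
}
```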
+
+---
+
+## Summary: arXiv Math Codes Not Yet Represented
+
+| Code | Area | Priority |
+|------|------|----------|
+| math.ST | Statistics Theory | 1 |
+| math.PR | Probability | 1 |
+| math.CV | Complex Variables | 1 |
+| math.FA | Functional Analysis | 1 |
+| math.NA | Numerical Analysis | 1 |
+| math.AG | Algebraic Geometry | 1 |
+| math.RT | Representation Theory | 1 |
+| math.AC | Commutative Algebra | 2 |
+| math.AP | Analysis of PDEs | 2 |
+| math.SP | Spectral Theory | 2 |
+| math.OA | Operator Algebras | 3 |
+| math.KT | K-Theory | 3 |
+| math.QA | Quantum Algebra | 3 |
+| math.OC | Optimization & Control | 3 |
+| math.IT | Information Theory | 3 |
+| math.MP | Mathematical Physics | 3 |
+| math.HO | History & Overview | 4 |
+
+**Well covered:** math.NT, math.CO, math.LO, math.GR, math.RA, math.CT, math.CA, math.GT, math.AT, math.DS (via complex dynamics)
diff --git a/ATTRIBUTION_SCHEMA.md b/ATTRIBUTION_SCHEMA.md
new file mode 100644
index 0000000000000000000000000000000000000000..45f48a991452cea990e413953a059cf857a643ed
--- /dev/null
+++ b/ATTRIBUTION_SCHEMA.md
@@ -0,0 +1,39 @@
+# Mathematics Database — Attribution Schema
+
+Charts in the Mathematics Processes Database may include optional attribution metadata for academic transparency and citation.
+
+## Schema
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `primary` | string | Primary author(s) or source (e.g., "Kurt Gödel", "Claude Shannon") |
+| `contributors` | string[] | Additional contributors (optional) |
+| `publication` | string | Title of publication or paper |
+| `year` | string | Year of publication |
+| `doi` | string | DOI URL (e.g., "https://doi.org/...") |
+| `url` | string | External URL (Wikipedia, arXiv, etc.) |
+
+## Implementation
+
+Attribution is embedded in chart HTML via a "Cite" badge in the header-meta area. Hovering over the badge reveals a popover with the full attribution details. Charts using this schema include:
+
+- Gödel Incompleteness Theorems
+- Schemes & Sheaves (Grothendieck)
+- Group Representations (Frobenius, Maschke)
+- Riemannian Geometry
+- ZFC Axioms
+- Shannon Entropy
+- C*-Algebras (Gelfand–Naimark)
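
For reference, the badge could be wired up roughly as below. This is a hypothetical sketch reusing the Gödel attribution from the example JSON in this document; the actual chart markup, class names, and popover behavior may differ:

```html
<!-- Hypothetical "Cite" badge markup; real charts may use different classes/structure -->
<span class="cite-badge" tabindex="0">
  Cite
  <span class="cite-popover" role="tooltip">
    <strong>Kurt Gödel</strong> (1931).
    <em>Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I.</em>
    <a href="https://doi.org/10.1007/BF01700692">doi:10.1007/BF01700692</a>
  </span>
</span>
```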
+
+## Example JSON
+
+```json
+{
+ "primary": "Kurt Gödel",
+ "contributors": [],
+ "publication": "Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I",
+ "year": "1931",
+ "doi": "https://doi.org/10.1007/BF01700692",
+ "url": "https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorems"
+}
+```
diff --git a/GENERIC_PROCESSES_TO_UPDATE.md b/GENERIC_PROCESSES_TO_UPDATE.md
new file mode 100644
index 0000000000000000000000000000000000000000..77234e8b120c12ef1eb7dc27863675788f61a9a2
--- /dev/null
+++ b/GENERIC_PROCESSES_TO_UPDATE.md
@@ -0,0 +1,90 @@
+# Generic Processes Needing Real Content
+
+These processes use the generic template ("This X process visualization demonstrates... The flowchart shows...") and need real content in its place. **Use different approaches for different process types.**
+
+## Strategy by Process Type
+
+### 1. Algorithm flowcharts (like Binary Search)
+**Examples:** Binary Search (done), Cryptographic Algorithms, Numerical Methods
+
+**Approach:** Process-like flowcharts with:
+- Inputs (sorted array, search key)
+- Steps (initialize interval, compute middle, compare)
+- Decision diamonds (interval empty? key == A[mid]? key < A[mid]?)
+- Outputs (found index, not found)
+- Chart title: **"Algorithm Flowchart"** (do not use "GLMP 6-Color Scheme" in the title)
+
+**Reference:** [Binary Search](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/mathematics-processes-database/processes/discrete_mathematics/discrete_mathematics-binary-search.html) – O(log n) complexity
+
+**Candidates:** Add specific algorithms – e.g. RSA, Newton-Raphson, Sieve of Eratosthenes, Dijkstra – each as its own process flowchart.
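
The flowchart elements listed above (inputs, interval updates, decision diamonds, outputs) map one-to-one onto code. A minimal Python sketch of the same logic, for illustration only, not the chart's own implementation:

```python
def binary_search(a, key):
    """Return the index of key in sorted list a, or -1 if absent (O(log n))."""
    lo, hi = 0, len(a) - 1           # initialize search interval
    while lo <= hi:                  # decision: interval non-empty?
        mid = (lo + hi) // 2         # compute middle
        if a[mid] == key:            # decision: key == A[mid]?
            return mid               # output: found index
        elif key < a[mid]:           # decision: key < A[mid]?
            hi = mid - 1             # continue in left half
        else:
            lo = mid + 1             # continue in right half
    return -1                        # output: not found
```

Each `if`/`elif` branch corresponds to one decision diamond in the flowchart, and the two `return` statements are its output nodes.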
+
+### 2. Axiom-theorem dependency graphs (like Euclid, Peano, Propositional Logic, Aristotle)
+**Examples:** Euclid Book I (done), Peano Arithmetic (done), Propositional Logic (done), Aristotle Syllogistic (done)
+
+**Approach:** Real mathematical development:
+- Axioms / definitions at the base
+- Theorems with explicit dependencies (arrows = "depends on")
+- Split into subgraphs for clarity (like Euclid Book I's 5 views)
+
+**Reference:** [Euclid Book I](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/mathematics-processes-database/processes/geometry_topology/geometry_topology-euclid-elements-book-i.html)
+
+**Candidates:**
+- **Group Theory** – done (43 nodes, 69 edges across 3 subcharts; Euclid-style layered dependencies)
+- **Ring Theory** – ring axioms → integral domain, polynomial rings
+- **Field Theory** – field axioms → extensions, algebraic closure
+- **Limit / Derivative / Integral** – ε-δ, limit laws, FTC, etc.
+- **Modular Arithmetic** – congruence, Fermat's little theorem, etc.
+- **Topology** – open sets, continuity, compactness
+- **Differential Geometry** – manifold, metric, curvature
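
One lightweight way to drive such charts is to store each theorem with its prerequisites and derive Euclid-style layers from the longest dependency chain. A minimal Python sketch with hypothetical group-theory node names; the real node data would come from the database, not this snippet:

```python
from collections import defaultdict

# Illustrative axiom -> theorem dependency data (node: list of prerequisites).
deps = {
    "group_axioms": [],
    "uniqueness_of_identity": ["group_axioms"],
    "uniqueness_of_inverses": ["group_axioms"],
    "lagrange_theorem": ["group_axioms", "uniqueness_of_inverses"],
}

def layers(deps):
    """Assign each node a layer = length of its longest prerequisite chain."""
    memo = {}
    def depth(node):
        if node not in memo:
            memo[node] = 1 + max((depth(p) for p in deps[node]), default=-1)
        return memo[node]
    out = defaultdict(list)
    for node in deps:
        out[depth(node)].append(node)
    return dict(out)
```

Here `layers(deps)` puts axioms at layer 0 and each theorem one layer above its deepest prerequisite, which matches the layered "arrows = depends on" presentation described above.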
+
+### 3. Axiomatic combinatorics (like Euclid Book I for counting)
+**Example:** Combinatorics (done)
+
+**Approach:** Axiomatic theory of combinatorics – definitions (factorial, sum/product principles) and theorems (permutations, combinations, binomial, pigeonhole, inclusion-exclusion) with dependency graph. Can be expanded to be more comprehensive like Euclid Book I.
+
+**Reference:** [Combinatorics](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/mathematics-processes-database/processes/geometry_topology/geometry_topology-combinatorics.html)
+
+---
+
+## Updated (with real content)
+- **Combinatorics** – Axiomatic counting theory (14 nodes, 15 edges)
+- **Binary Search** – Algorithm flowchart (already had real content)
+- **Sieve of Eratosthenes** – Prime Number Generation (10 nodes, 14 edges) ✓ Batch 1
+- **Newton-Raphson Method** – Numerical Methods (9 nodes, 11 edges) ✓ Batch 1
+- **Bisection Method** – Limit Calculation (8 nodes, 10 edges) ✓ Batch 2
+- **Extended Euclidean Algorithm** – Modular Arithmetic (6 nodes, 6 edges) ✓ Batch 2
+- **Dijkstra's Algorithm** – Graph Theory Algorithms (7 nodes, 8 edges) ✓ Batch 2
+- **RSA Algorithm** – Cryptographic Algorithms (7 nodes, 7 edges) ✓ Batch 3
+- **Simpson's Rule** – Integral Calculation (6 nodes, 5 edges) ✓ Batch 3
+- **Kruskal's Algorithm** – new (9 nodes, 12 edges) ✓ Batch 3
+- **AES Algorithm** – new (8 nodes, 8 edges) ✓ Batch 4
+- **Merge Sort** – new (7 nodes, 7 edges) ✓ Batch 4
+- **Prim's Algorithm** – new (9 nodes, 12 edges) ✓ Batch 4
+- **Quicksort** – new (6 nodes, 6 edges) ✓ Batch 5
+- **Breadth-First Search** – new (7 nodes, 8 edges) ✓ Batch 5
+- **Binary Search Tree Insert** – new (8 nodes, 9 edges) ✓ Batch 5
+- **Group Theory** – Axiom-theorem dependency graph (21 nodes, 29 edges across 3 subcharts) ✓
+
+## Need Updates (by type)
+
+### Algorithm flowcharts to create
+- DFS, Heap sort, etc.
+- Graph Theory Algorithms → Dijkstra, Kruskal, etc.
+
+### Axiom-theorem graphs to create (placeholders removed)
+- Field Theory, Ring Theory
+- Derivative, Integral, Limit Calculation
+- Modular Arithmetic, Diophantine Equations
+- Topology, Differential Geometry, Euclidean Geometry
+- Logic & Set Theory (or point to Propositional Logic)
+- Statistical Analysis (probability axioms → theorems)
+
+### Removed (generic placeholders deleted ✓)
+- Field Theory, Ring Theory, Derivative Calculation, Statistical Analysis, Logic & Set Theory
+- Differential Geometry, Euclidean Geometry, Topology, Diophantine Equations
+- Integral Calculation, Limit Calculation, Modular Arithmetic, Cryptographic Algorithms, Graph Theory Algorithms
+- Run `delete-generic-charts-from-gcs.sh` to remove them from GCS, then `upload-mathematics-database-to-gcs.sh` to push the updated metadata
+
+### Duplicates (resolved ✓)
+- statistics_probability-aristotles-syllogism → removed (canonical: discrete_mathematics-aristotle-syllogistic)
+- statistics_probability-euclids-geometry → removed (canonical: geometry_topology-euclid-elements-*)
diff --git a/GLMP_Foundation.html b/GLMP_Foundation.html
deleted file mode 100644
index 739837f68ed7d6bf20e219f48dfb4e7ebd61b843..0000000000000000000000000000000000000000
--- a/GLMP_Foundation.html
+++ /dev/null
@@ -1,969 +0,0 @@
-
-
-
-
-
- Is the Genome Like a Computer Program?
-
-
-
-
-
Is the Genome Like a Computer Program?
-
Author: Gary Welz
-
Date: April 12, 2025
-
-
-
-
Abstract
-
This article revisits the metaphor of the genome as a computer program, a concept first proposed publicly by the author in 1995. Drawing on historical discussions in computational biology, including previously unpublished exchanges from the bionet.genome.chromosome newsgroup, we explore how the genome functions not merely as a passive database of genes but as an active, logic-driven computational system. The genome executes massively parallel processes—driven by environmental inputs, chemical conditions, and internal state—using a computational architecture fundamentally different from conventional computing. From early visual metaphors in Mendelian genetics to contemporary logic circuits in synthetic biology, this paper traces the historical development of computational models that express genomic logic, while critically examining both the utility and limitations of the program metaphor. We conclude that the genome represents a unique computational paradigm that could inform the development of novel computing architectures and artificial intelligence systems.
-
-
-
1. Introduction
-
Target Audience: This article is written for researchers and enthusiasts in computational biology, synthetic biology, artificial intelligence, and related fields. While some background in biology or computer science is helpful, we provide explanations and analogies to make the concepts accessible to interdisciplinary audiences.
-
-
Biological processes have often been described through metaphor: the cell as a factory, DNA as a blueprint, and most provocatively—the genome as a computer program. Unlike static descriptions, this metaphor opens the door to seeing life itself as computation: a dynamic process with inputs, logic conditions, iterative loops, subroutines, and termination conditions.
-
-
In 1995, the author explored this idea in an essay published in The X Advisor, proposing that gene regulation could be modeled as a logic program. That same year, in discussions on the bionet.genome.chromosome newsgroup, computational biologists including Robert Robbins of Johns Hopkins University developed this metaphor further, exploring profound differences between genomic and conventional computation. This article revisits and expands that vision through both historical analysis and modern advances in biology and AI.
-
-
As we will explore, the genome-as-program metaphor provides valuable insights but also requires us to stretch conventional computational thinking into new paradigms—ones that might ultimately inform the future of computing itself.
-
-
2. Historical Context
-
-
2.1 Early Visualizations of Biological Logic
-
The visualization of biological logic began with Gregor Mendel in the 19th century. Though his work predates formal computational thinking, Mendel's charts—showing ratios of inherited traits—used symbolic logic to track biological outcomes. Later, chromosome theory and operon models introduced control diagrams that represented genetic regulatory mechanisms.
-
-
2.1.1 Mendel's Punnett Square and Computational Logic
-
The Punnett square, named after British geneticist Reginald Punnett (1875-1967), represents one of the earliest systematic approaches to modeling genetic inheritance as a computational process. Punnett, a collaborator of William Bateson (1861-1926) who coined the term "genetics" and was a key figure in establishing genetics as a scientific discipline, developed this visualization method to predict the outcomes of genetic crosses. The square format provides a systematic way to compute all possible combinations of parental alleles, making it one of the first "genetic algorithms" in computational biology.
-
-
The Punnett square in Figure 1 demonstrates a monohybrid cross between two heterozygous parents (Aa × Aa). Each cell in the 2×2 grid represents a possible genotype outcome, with the probability of each outcome determined by the rules of Mendelian inheritance. This systematic enumeration of possibilities mirrors the truth table approach used in digital logic design, where all possible input combinations are explicitly listed to determine output states.
-
-
The computational logic underlying the Punnett square can be expressed through Boolean operations. Consider a simple genetic system where allele A is dominant and allele a is recessive. The phenotypic expression follows these logical rules:
-
-
Dominance Logic (OR operation):
- Phenotype = A OR A = Dominant trait
- This follows the logical rule: if either allele is A, the dominant phenotype is expressed.
-
-
Recessive Logic (AND operation):
- Phenotype = a AND a = Recessive trait
- This follows the logical rule: only if both alleles are a is the recessive phenotype expressed.
-
-
The Punnett square can be extended to more complex genetic systems. For example, a dihybrid cross (AaBb × AaBb) creates a 4×4 grid with 16 possible combinations, demonstrating how genetic complexity scales exponentially with the number of genes involved. This combinatorial explosion is a fundamental characteristic of genetic computation that distinguishes it from simple linear processes.
-
-
The logical structure of Mendelian inheritance can be formalized using truth tables, similar to those used in digital circuit design:
-
-
Truth Table for Dominant/Recessive Inheritance:
-
-
Allele 1
Allele 2
Genotype
Phenotype
Logic
-
A
A
AA
Dominant
1 OR 1 = 1
-
A
a
Aa
Dominant
1 OR 0 = 1
-
a
A
aA
Dominant
0 OR 1 = 1
-
a
a
aa
Recessive
0 AND 0 = 0
-
-
-
This truth table approach reveals that genetic inheritance operates through fundamental logical operations: OR for dominance (presence of dominant allele) and AND for recessiveness (absence of dominant alleles). These same logical operations form the basis of digital computation, establishing a direct parallel between genetic and computational logic.
-
-
The Punnett square method demonstrates several key principles of genetic computation: (1) systematic enumeration of possibilities, (2) probabilistic outcomes based on combinatorial rules, (3) hierarchical organization of genetic information, and (4) the ability to predict complex outcomes from simple rules. These principles would later be formalized in computational genetics and serve as the foundation for modern genetic algorithms and evolutionary computation.
-
-
-
-
Figure 1: Mendel's Punnett Square (1866)
-
- Punnett square showing a monohybrid cross (Aa × Aa) with the resulting 3:1 phenotypic ratio.
- Each cell represents a possible genotype outcome demonstrating Mendelian inheritance patterns.
- Source: Wikipedia Commons.
-
-
-
-
2.2 The Development of Computational Metaphors
-
The transition from Mendelian genetics to molecular biology in the mid-20th century marked a crucial evolution in computational thinking about biological systems. This period saw the emergence of sophisticated models that explicitly treated genetic regulation as a computational process, moving beyond simple inheritance patterns to complex regulatory networks.
-
-
2.2.1 The Lac Operon: A Biological Logic Circuit
-
In the 1960s, François Jacob and Jacques Monod's lac operon model introduced a logic gate–like system for regulating gene expression, paving the way for computational thinking in molecular biology. This revolutionary model showed how gene expression could be controlled through what resembled conditional logic, establishing the foundation for understanding genetic regulation as a computational process.
-
-
Jacob and Monod's work on the lac operon in Escherichia coli revealed a sophisticated regulatory system that operates through logical principles. The operon consists of three structural genes (lacZ, lacY, lacA) that are coordinately regulated by a single promoter and operator region. The system responds to two environmental inputs: the presence of lactose (the substrate) and the absence of glucose (the preferred energy source).
-
-
The computational logic of the lac operon can be expressed as a Boolean function:
-
Lac Operon Logic:
- Expression = (Lactose present) AND (Glucose absent)
- This logical function determines whether the operon is transcribed and the enzymes are produced.
-
-
The regulatory mechanism involves two key proteins: the lac repressor (encoded by lacI) and the catabolite activator protein (CAP). The lac repressor acts as a NOT gate—it binds to the operator and prevents transcription unless lactose is present. CAP acts as an AND gate—it enhances transcription only when glucose is absent. Together, these regulatory proteins implement a complex logical circuit that integrates multiple environmental signals.
-
-
The lac operon model demonstrated several key principles of biological computation: (1) the use of regulatory proteins as logic gates, (2) the integration of multiple inputs through logical operations, (3) the ability to respond to environmental conditions through conditional logic, and (4) the coordination of multiple genes through shared regulatory elements. These principles would later be formalized in computational models of gene regulatory networks and serve as the foundation for synthetic biology.
-
-
Jacob and Monod's work earned them the Nobel Prize in Physiology or Medicine in 1965, recognizing the profound implications of their discovery for understanding how genetic information is processed and regulated. Their model established the conceptual framework for viewing genetic regulation as a computational process, influencing generations of researchers in molecular biology and computational biology.
-
-
-
-
Figure 2: Jacob & Monod's Lac Operon Model (1961)
-
- Schematic representation of the lac operon regulatory system showing the interaction between
- regulatory proteins (lac repressor and CAP) and DNA elements (operator and promoter).
- The diagram illustrates the logical circuit structure of genetic regulation. Source: Jacob & Monod (1961).
-
-
-
-
2.3 The 1995 Bionet.Genome.Chromosome Discussions
-
In April 1995, during the early days of the internet and computational biology, a significant exchange on the bionet.genome.chromosome newsgroup explored the genome-as-program metaphor in depth. This discussion occurred at a pivotal moment when the Human Genome Project was gaining momentum and computational approaches to biology were emerging as a new paradigm. The author initiated this discussion by asking whether "an organism's genome can be regarded as a computer program" and whether its structure could be represented as "a flowchart with genes as objects connected by logical terms."
-
-
Robert Robbins of Johns Hopkins University responded with a comprehensive analysis that both supported and complicated the metaphor. While acknowledging the digital nature of the genetic code, Robbins highlighted that the genome functions more like "a mass storage device" with properties not shared by electronic counterparts, and that genomic programs operate with unprecedented levels of parallelism—"in excess of 10^18 parallel processes" in the human body. These discussions represented one of the earliest sophisticated analyses of the computational nature of genomic function and laid the groundwork for modern computational biology approaches.
-
-
2.4 The Author's 1995 Essay and Flowchart Model
-
In 1995, the author's speculative essay proposed treating gene expression as an executing program with logical flow. To demonstrate this concept, the author created one of the first computational flowcharts representing gene regulation—a diagram of the lac operon's β-galactosidase expression system that explicitly modeled genetic regulation using programming logic constructs (see Figure 1).
- The author's original 1995 computational flowchart representing the lac operon as a decision-tree program.
- Decision diamonds show conditional logic, rectangles show biological processes, and feedback loops
- show regulatory mechanisms. This was among the first attempts to model genetic regulation using
- computational constructs.
-
-
-
-
This original flowchart depicted the lac operon as a decision tree with conditional branches, feedback loops, and termination conditions—showing how the presence or absence of lactose and glucose created logical pathways leading to different outcomes for β-galactosidase production. The diagram used programming-style logic gates (decision diamonds for yes/no conditions, process rectangles for actions) to represent biological regulatory mechanisms, making explicit the parallel between genetic circuits and computer logic circuits.
-
-
The article was featured on a bioinformatics resource list curated by Professor Inge Jonassen at the University of Bergen, where it appeared alongside foundational references like PubMed, In Silico Biology, and DNA Computers.
-
-
2.4.1 Flowchart Examples in Computational Biology
-
The use of flowcharts to represent biological processes has become increasingly sophisticated in modern computational biology. Contemporary flowcharts often integrate multiple data types, computational algorithms, and biological processes into unified visual representations. These modern flowcharts serve as computational roadmaps, guiding researchers through complex analytical pipelines and decision-making processes.
-
-
Modern biological flowcharts typically include several key elements: (1) data input nodes representing experimental or computational data sources, (2) processing nodes showing analytical algorithms or computational methods, (3) decision points representing conditional logic based on statistical thresholds or biological criteria, (4) output nodes displaying results or predictions, and (5) feedback loops showing iterative refinement processes. This structure mirrors the computational architecture of modern bioinformatics pipelines.
-
-
The flowchart in Figure 3.1 demonstrates a fascinating example of how biological metaphors have been adopted in computer science. This figure, from a network security paper (Al-Haija et al., 2014), shows a genetic algorithm flowchart that uses biological terminology—"thrive," "extinct," "mutate"—to describe computational processes for intrusion detection. This illustrates the profound influence of biological thinking on computational approaches, even in domains far removed from biology itself.
-
-
The use of biological metaphors in this network security application is particularly revealing. The algorithm treats potential security threats as a "population" that can "thrive" (successful attacks), "go extinct" (failed attacks), or "mutate" (evolve new attack strategies). This demonstrates how the genome-as-program metaphor has influenced computational thinking across multiple disciplines, creating a shared language between biological and computational systems.
-
-
This example shows that the computational principles underlying biological systems—population dynamics, selection pressure, adaptation, and evolution—have become fundamental tools in computer science. The fact that network security researchers chose biological terminology to describe their algorithms underscores the intuitive appeal and explanatory power of biological metaphors in computational contexts.
-
-
-
-
Figure 3.1: Modern Genetic Algorithm Flowchart
-
- Contemporary flowchart showing the integration of genetic algorithms with artificial neural networks
- for computational biology applications. This example demonstrates modern computational approaches
- to biological problem-solving. Source: Al-Haija et al. (2014) - Used Genetic Algorithm for Support
- Artificial Neural Network in Intrusion Detection System.
-
-
-
-
2.5 Modern Visualization Systems
-
Since then, influential graphical systems have emerged for representing genomic data and processes: Martin Krzywinski's Circos (2009), Höhna's probabilistic phylogenetic networks (2014), Koutrouli's network visualizations (2020), and O'Donoghue's reviews (2018). These systems have grappled with the challenge of representing the multi-dimensional and massively parallel nature of genomic processes.
-
-
Martin Krzywinski's Circos visualization system represents a breakthrough in genomic data representation, using circular layouts to display complex multi-dimensional relationships between genomic regions. This innovative approach addresses the fundamental challenge of representing massive amounts of genomic data in an intuitive format, allowing researchers to identify patterns and relationships that would be impossible to see in linear representations. The circular layout enables the display of multiple data types simultaneously, making it an essential tool for modern comparative genomics and evolutionary studies. The Circos plot shows how different chromosomes (represented as segments around the circle) are connected by syntenic links (curved ribbons), revealing evolutionary relationships and structural variations that provide insights into genome evolution and organization.
-
-
-
-
Figure 4: Circos Genome Visualization (2009)
-
Circular layout showing chromosomes with syntenic links for comparative genomics. Source: Krzywinski et al. (2009).
-
-
-
Höhna et al.'s probabilistic phylogenetic networks represent a significant advancement in phylogenetic analysis, incorporating uncertainty and probabilistic relationships into evolutionary tree representations. This sophisticated approach acknowledges that biological processes are inherently stochastic and that our understanding of evolutionary relationships contains uncertainty. The model demonstrates how modern computational approaches can handle the inherent uncertainty in biological data, using probabilistic frameworks to represent evolutionary relationships rather than deterministic trees. This probabilistic approach has become essential for modern evolutionary biology and demonstrates how computational thinking has evolved to handle biological complexity, providing more realistic and nuanced representations of evolutionary processes.
Figure 5: Probabilistic Phylogenetic Networks (2014)
-
Evolutionary relationships with uncertainty bands showing probabilistic phylogenetic analysis. Source: Höhna et al. (2014).
-
-
-
Koutrouli et al.'s biological network visualization demonstrates how modern computational biology uses graph theory to model complex biological systems. This sophisticated network representation shows genes as nodes and their interactions as edges, revealing the intricate web of regulatory relationships that govern cellular processes. This network-based approach represents a fundamental shift from linear, sequential thinking to systems-level understanding of biological complexity. The graph structure allows researchers to identify hubs, modules, and emergent properties that would be invisible in traditional linear representations, acknowledging that biological systems are inherently networked and that understanding requires analysis of the entire system rather than individual components.
-
-
-
-
Figure 6: Biological Network Visualization (2020)
-
Gene interaction networks and regulatory relationships using graph theory. Source: Koutrouli et al. (2020).
-
-
-
O'Donoghue et al.'s multi-dimensional biomedical data visualization represents a crucial advancement in handling the massive datasets generated by modern genomics. The heatmap format allows researchers to visualize complex multi-dimensional data in an intuitive color-coded format, where each cell represents the expression level of a gene under specific conditions. This approach enables the identification of expression patterns, clustering of genes with similar expression profiles, and the discovery of regulatory relationships across multiple conditions. The visualization demonstrates how computational methods can transform raw numerical data into meaningful biological insights, revealing patterns that would be impossible to detect through manual analysis. This approach has become essential for modern genomics, transcriptomics, and systems biology, enabling researchers to handle the complexity and scale of contemporary biological datasets.
-
-
-
-
Figure 7: Biomedical Data Visualization (2018)
-
Gene expression patterns using heatmap-based data representation. Source: O'Donoghue et al. (2018).
-
-
-
3. The Genome as a Mass Storage Device
-
Before we can understand genomic "programs," we must first understand the unique storage medium they operate on. As Robbins noted in 1995, the genome functions like a specialized mass storage device with properties unlike any electronic counterpart:
-
-
3.1 Associative Addressing vs. Physical Addressing
-
Unlike computer hard drives that store files at specific locations (like "sector 1, track 2"), the genome uses a smarter system called associative addressing. Think of it like a library where you find books by their content rather than their shelf position. As Robbins described it, "All addressing is associative, with multiple read heads scanning the device in parallel, looking for specific START LOADING HERE signals." This means the genome doesn't use absolute positions but rather characteristic patterns recognized by cellular machinery.
-
-
3.2 Linked-List Architecture
-
The genome resembles "a mass-storage device based on a linked-list architecture, rather than a physical platter." Information is encountered sequentially as cellular machinery moves along the DNA strand, with "pointers" in the form of regulatory sequences directing the machinery to relevant sections.
-
-
3.3 Redundant Organization with Variations
-
With diploid organisms possessing two sets of chromosomes, the genome exhibits built-in redundancy. However, as G. Dellaire noted in the 1995 discussions, mechanisms like imprinting and allelic silencing create a situation where "you only actually have one 'program' running" from certain loci, raising questions about "gene dosage" without clear parallels in conventional computing.
-
-
3.4 Multi-Level Encoding
-
Dellaire also highlighted that "the actual structure of genome and not just the linear sequence may 'encode' sets of instructions for the 'reading and accessing' of this genetic code." This insight presaged modern understanding of epigenetics, chromatin structure, and the "histone code" as additional layers of information storage and processing.
-
-
4. The Genome as a Logic-Driven Program
-
Despite the differences in storage medium, the genome operates with recognizable computational logic structures:
-
-
4.1 Core Computational Elements
-
The genome employs structures analogous to:
-
- Bootloader: zygotic genome activation initiates development
- Conditional logic: expression dependent on chemical signals
- Loops: circadian cycles, metabolism, cell cycles
- Subroutines: growth, repair, reproduction
- Shutdown: apoptosis and programmed cell death
-
-
These resemble constructs such as IF-THEN, WHILE, SWITCH-CASE, and HALT in conventional computation.
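To make the analogy concrete, here is a toy sketch (with invented signal names and thresholds — not a model of any real genome) that renders the constructs above as ordinary control flow:

```python
# Toy "genomic program" rendering the constructs above as control flow.
# All names (signals, thresholds) are invented for illustration.

def run_genomic_program(signals, max_cycles=10):
    log = ["bootloader"]                     # zygotic genome activation
    cycles = 0
    while signals["nutrients"] > 0 and cycles < max_cycles:   # loop (WHILE)
        if signals["damage"] > 0.5:          # conditional logic (IF-THEN)
            log.append("repair")             # subroutine
        else:
            log.append("grow")               # subroutine
        signals["nutrients"] -= 1
        cycles += 1
    log.append("apoptosis")                  # programmed shutdown (HALT)
    return log

trace = run_genomic_program({"nutrients": 2, "damage": 0.8})
```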
-
-
4.2 Chemical Reactions as Computational Operations
-
At the molecular level, chemical reactions function as the basic operational units of genomic computation. These reactions operate through principles that can be understood as computational processes, though they differ fundamentally from digital computation in their analog, probabilistic nature.
-
-
Enzyme-Substrate Interactions as Logic Gates: Enzymes function as molecular logic gates, where the presence of specific substrates triggers catalytic reactions. Simple enzymes follow Michaelis-Menten kinetics, which produce saturating (hyperbolic) response curves; cooperative enzymes produce sigmoidal responses that resemble threshold logic functions. The enzyme's specificity for its substrate acts as a recognition mechanism, similar to how a logic gate responds only to specific input combinations.
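A small numerical sketch of this contrast (the rate constants are illustrative, not measured values): Michaelis-Menten saturation versus a cooperative Hill response, where the latter switches far more sharply around its half-saturation point:

```python
# Saturating (Michaelis-Menten) vs cooperative (Hill) responses.
# vmax, km, k, and n are illustrative values, not measured constants.

def michaelis_menten(s, vmax=1.0, km=1.0):
    return vmax * s / (km + s)              # hyperbolic saturation

def hill(s, vmax=1.0, k=1.0, n=4):
    return vmax * s**n / (k**n + s**n)      # sigmoidal, threshold-like

# contrast between a "low" and "high" input around the half-saturation point
low, high = 0.5, 2.0
mm_contrast = michaelis_menten(high) / michaelis_menten(low)
hill_contrast = hill(high) / hill(low)      # much sharper switching
```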
-
-
Concentration Thresholds as Decision Points: Biological systems use concentration gradients and threshold mechanisms to make decisions. For example, the lac operon's response to lactose depends on the concentration of allolactose exceeding a critical threshold. These thresholds create binary-like decision points in otherwise continuous systems, enabling discrete logic-like behavior from analog chemical processes.
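The decision-point idea can be sketched as a steep response pushed through a binary cutoff; the threshold value and exponent below are illustrative, not measured lac operon constants:

```python
# Binary-like decision from a continuous concentration via a steep response.
# threshold and n are illustrative, not measured lac operon parameters.

def operon_active(inducer, threshold=0.3, n=8):
    p_on = inducer**n / (threshold**n + inducer**n)   # steep Hill response
    return p_on > 0.5                                  # decision point

decisions = [operon_active(c) for c in (0.1, 0.2, 0.4, 0.8)]
```

Below the threshold the system reads as "off", above it as "on", even though the underlying chemistry is continuous.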
-
-
Feedback Loops as Iterative Processing: Biochemical feedback mechanisms implement iterative computational processes. Positive feedback creates amplification cascades (similar to computational scaling), while negative feedback provides stability and regulation. These loops can create oscillatory behavior, bistable switches, and other complex dynamics that resemble computational algorithms for pattern generation and control.
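As a minimal sketch of negative feedback as iterative processing, the loop below (illustrative rate constants) integrates a product that represses its own production and settles to a stable steady state:

```python
# Negative autoregulation: the product represses its own production.
# Euler integration with illustrative rate constants.

def simulate(x0=0.0, k_prod=1.0, k_deg=0.5, k_rep=1.0, dt=0.01, steps=5000):
    x = x0
    for _ in range(steps):
        production = k_prod / (1.0 + k_rep * x)   # repression by the product
        x += dt * (production - k_deg * x)        # synthesis minus decay
    return x

steady = simulate()   # settles where production balances degradation
```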
-
-
Signal Amplification as Computational Scaling: Biological systems use cascading reactions to amplify weak signals, similar to how computational systems use amplifiers and buffers. The phosphorylation cascade in signal transduction pathways, for example, can amplify a single extracellular signal into thousands of intracellular responses, demonstrating how biological systems achieve computational scaling through chemical mechanisms.
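The arithmetic of cascade amplification is simple to sketch; the three-stage gains below are illustrative, not measured kinase numbers:

```python
# Each cascade stage activates many copies of the next-stage molecule.
# The per-stage gains are illustrative.

def cascade_output(input_molecules, gains):
    signal = input_molecules
    for gain in gains:          # one multiplicative amplification per stage
        signal *= gain
    return signal

amplified = cascade_output(1, [10, 20, 30])   # one ligand, three stages
```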
-
-
Stochastic Processes as Probabilistic Computation: Unlike deterministic digital computation, biological reactions are inherently stochastic. This probabilistic nature creates computational properties not found in conventional computing, including noise tolerance, adaptive responses, and emergent behaviors that arise from the statistical properties of molecular interactions.
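This stochastic character is routinely modeled with Gillespie-style simulation. The sketch below (illustrative rates) runs a birth-death process for one molecular species; different seeds give different trajectories from identical rules:

```python
# Gillespie-style stochastic simulation of a birth-death process.
# k_birth and k_death are illustrative rates.

import random

def gillespie_birth_death(k_birth=10.0, k_death=1.0, t_end=50.0, seed=0):
    rng = random.Random(seed)
    t, n = 0.0, 0
    while t < t_end:
        birth, death = k_birth, k_death * n     # reaction propensities
        total = birth + death
        t += rng.expovariate(total)             # waiting time to next event
        if rng.random() * total < birth:
            n += 1                               # birth event
        else:
            n -= 1                               # death event
    return n

# identical rules, different random seeds, different outcomes
counts = [gillespie_birth_death(seed=s) for s in range(5)]
```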
5. Massive Parallelism
-
Perhaps the most profound difference between genomic and conventional computation lies in the scale and nature of parallelism involved.
-
-
5.1 Unprecedented Scale of Parallel Processing
-
As Robbins calculated in 1995, "The expression of the human genome involves the simultaneous expression and (potential) interaction of something probably in excess of 10^18 parallel processes." This number derives from approximately 10^13 cells in the human body, each running 10^5-10^6 processes in parallel, with potential interactions between any processes in any cells.
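The estimate is straightforward to reproduce as arithmetic:

```python
# Robbins' estimate: ~10^13 cells, each running ~10^5 to 10^6 processes.
cells = 10**13
total_low = cells * 10**5            # lower bound on parallel processes
total_high = cells * 10**6           # upper bound
supercomputer_cores = 10**7          # rough upper figure cited in the text
orders_of_magnitude = len(str(total_low // supercomputer_cores)) - 1
```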
-
-
This scale of parallelism is fundamentally different from any human-engineered computing system. To put this in perspective, the world's most powerful supercomputers operate with approximately 10^6-10^7 processing cores, while the human body operates with 10^18 parallel processes. This represents a difference of 11-12 orders of magnitude, making biological computation the most massively parallel system known to exist.
-
-
The implications of this scale are profound. Each cell in the human body is simultaneously executing thousands of biochemical reactions, processing environmental signals, maintaining homeostasis, and coordinating with neighboring cells. These processes are not merely concurrent but truly parallel, with each reaction occurring independently and simultaneously. The coordination between these processes emerges from the physical and chemical properties of the system rather than from centralized control mechanisms.
-
-
This massive parallelism enables biological systems to achieve computational capabilities that are impossible with sequential or even moderately parallel systems. For example, the immune system can simultaneously monitor for thousands of different pathogens, the nervous system can process multiple sensory inputs in real-time, and the metabolic system can maintain homeostasis across multiple organ systems simultaneously. These capabilities arise not from sophisticated algorithms but from the sheer scale of parallel processing available in biological systems.
-
-
5.2 True Parallelism vs. Time-Sharing
-
Unlike computer "parallel processing" that often involves time-sharing a smaller number of processors, genomic parallelism involves true simultaneous execution: "each single cell has millions of programs executing in a truly parallel (i.e., independent execution, no time sharing) mode."
-
-
This distinction between true parallelism and time-sharing is crucial for understanding biological computation. In conventional computing, "parallel" systems typically use time-sharing, where a limited number of processors rapidly switch between different tasks, creating the illusion of simultaneous execution. Even modern multi-core processors use sophisticated scheduling algorithms to manage task allocation and context switching.
-
-
In contrast, biological systems achieve true parallelism through physical separation and chemical independence. Each molecule in a cell can react independently and simultaneously with other molecules, without requiring any scheduling or coordination mechanism. This independence arises from the fundamental properties of chemical reactions—each reaction occurs based on local conditions and molecular interactions, not on system-wide scheduling decisions.
-
-
This true parallelism has profound implications for system design and behavior. In time-shared systems, bottlenecks can occur when multiple processes compete for limited resources. In biological systems, such bottlenecks are rare because each process operates independently with its own local resources. This independence also means that biological systems are inherently fault-tolerant—the failure of one process does not necessarily affect others, and the system can continue operating even with significant component failures.
-
-
The absence of centralized control in biological systems is both a strength and a challenge. On one hand, it eliminates single points of failure and enables robust, adaptive behavior. On the other hand, it makes biological systems difficult to understand and predict, as their behavior emerges from the collective interactions of countless independent processes rather than from explicit algorithms or control structures.
-
-
5.3 The Developmental Bootloader
-
Development begins with a specialized "bootloader" sequence that activates the zygotic genome after fertilization. This process transitions from maternal to zygotic control, initiates cascades of gene expression in precise sequence, establishes the initial conditions for all subsequent development, and creates a developmental trajectory with remarkable robustness.
-
-
The zygotic genome activation (ZGA) represents one of the most critical computational events in development. During early development, the embryo relies on maternal RNA and proteins deposited in the egg, but at a specific developmental stage, the zygotic genome "boots up" and begins transcribing its own genes. This transition is analogous to a computer bootloader that initializes the operating system, establishing the basic computational environment for all subsequent operations.
-
-
The bootloader process involves several computational elements that mirror those found in computer systems. First, there is a precise timing mechanism that determines when ZGA occurs—this timing is critical and must be coordinated with other developmental events. Second, there is a hierarchical activation sequence, where certain genes (often called "pioneer" genes) must be activated first to establish the conditions for subsequent gene expression. Third, there are feedback mechanisms that ensure the bootloader process is robust and can recover from errors or perturbations.
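The hierarchical activation idea can be sketched as dependency-ordered execution; the gene names and dependencies below are invented for illustration:

```python
# Dependency-ordered gene activation: a gene fires once its prerequisites
# have. Gene names and dependencies are invented for illustration.

def activation_order(dependencies):
    activated, order = set(), []
    while len(activated) < len(dependencies):
        progressed = False
        for gene, prereqs in dependencies.items():
            if gene not in activated and prereqs <= activated:
                activated.add(gene)
                order.append(gene)
                progressed = True
        if not progressed:                      # no gene can fire: deadlock
            raise ValueError("circular dependency; bootloader cannot proceed")
    return order

zga = {
    "pioneer_A": set(),
    "pioneer_B": set(),
    "early_gene": {"pioneer_A"},
    "late_gene": {"pioneer_A", "pioneer_B", "early_gene"},
}
order = activation_order(zga)
```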
-
-
This bootloader analogy extends beyond the initial activation. Throughout development, there are multiple "reboot" events where cells transition between different developmental states. For example, during cellular differentiation, cells undergo transcriptional reprogramming that resembles a system reboot, where the cell's computational state is reset and a new program begins executing. These transitions are often triggered by specific signals or environmental conditions, similar to how computer systems can be configured to boot different operating systems based on user input or system state.
-
-
The robustness of the developmental bootloader is remarkable. Despite variations in environmental conditions, genetic background, and random molecular noise, development proceeds with remarkable consistency. This robustness suggests that the bootloader process has evolved sophisticated error-checking and recovery mechanisms, similar to those found in reliable computer systems. The ability to maintain developmental integrity despite perturbations is essential for the survival and reproduction of organisms, making the bootloader one of the most critical computational systems in biology.
-
-
5.4 Emergent Properties from Massive Parallelism
-
This unprecedented parallelism enables emergent properties not found in sequential computing: robust error correction through redundant processes, self-organization without central control, pattern formation through reaction-diffusion dynamics, and adaptation to changing conditions without explicit programming.
-
-
Robust Error Correction Through Redundancy: Biological systems achieve remarkable reliability through massive redundancy rather than through precise error-free operation. Each cell contains multiple copies of critical genes, and many cellular processes have backup mechanisms that can compensate for failures. This redundancy is made possible by the massive parallelism of biological systems—if one process fails, others can take over without affecting overall system function. This approach to error correction is fundamentally different from conventional computing, where reliability is typically achieved through precise design and error detection rather than through redundancy.
-
-
Self-Organization Without Central Control: The massive parallelism of biological systems enables self-organization, where complex patterns and behaviors emerge from the collective interactions of many simple components. This self-organization occurs without any central controller or coordinator—each component follows simple local rules, and the overall system behavior emerges from their collective interactions. Examples include the formation of cellular patterns during development, the synchronization of circadian rhythms across multiple cells, and the coordination of immune responses across the body. This emergent behavior is a direct consequence of the massive parallelism and local interactions that characterize biological systems.
-
-
Pattern Formation Through Reaction-Diffusion Dynamics: The parallel nature of biological systems enables complex pattern formation through reaction-diffusion mechanisms. These patterns emerge from the interaction between chemical reactions (which create and destroy molecules) and diffusion (which spreads molecules through space). The classic example is Alan Turing's model of animal coat patterns, where simple chemical reactions occurring in parallel across a developing embryo create complex spatial patterns. These patterns emerge spontaneously from the parallel execution of simple chemical rules, demonstrating how massive parallelism can create complex, organized structures without explicit programming.
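A minimal numerical skeleton of this mechanism, in the Gray-Scott style (illustrative parameters, one spatial dimension, not fitted to any biological system): every grid point applies the same local reaction and diffusion rules in parallel at each step.

```python
# 1D two-species reaction-diffusion step (Gray-Scott-style, illustrative
# parameters; not tuned to any biological system).

def step(u, v, du=0.16, dv=0.08, f=0.035, k=0.065, dt=0.5):
    n = len(u)
    lap = lambda a, i: a[(i - 1) % n] - 2 * a[i] + a[(i + 1) % n]
    nu, nv = u[:], v[:]
    for i in range(n):
        uvv = u[i] * v[i] * v[i]          # local autocatalytic reaction
        nu[i] = u[i] + dt * (du * lap(u, i) - uvv + f * (1 - u[i]))
        nv[i] = v[i] + dt * (dv * lap(v, i) + uvv - (f + k) * v[i])
    return nu, nv

n = 64
u, v = [1.0] * n, [0.0] * n
for i in range(28, 36):                   # local seed perturbation
    u[i], v[i] = 0.5, 0.25
for _ in range(2000):                     # parallel local updates everywhere
    u, v = step(u, v)
```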
-
-
Adaptation Without Explicit Programming: Biological systems can adapt to changing conditions without any explicit programming for those conditions. This adaptation occurs through the parallel operation of many different processes, each responding to local conditions. When environmental conditions change, some processes may be enhanced while others are suppressed, leading to an overall adaptation of the system. This adaptive behavior emerges from the collective response of many parallel processes rather than from explicit algorithms for adaptation. The ability to adapt to novel conditions without explicit programming is one of the most remarkable properties of biological systems and is a direct consequence of their massive parallelism.
-
-
Collective Intelligence Through Distributed Processing: The massive parallelism of biological systems enables forms of collective intelligence that are impossible in sequential systems. For example, the immune system can simultaneously monitor for thousands of different pathogens, learn from encounters with new pathogens, and mount appropriate responses. This collective intelligence emerges from the parallel operation of many different cell types, each contributing specialized knowledge and capabilities to the overall system. The intelligence of the system as a whole exceeds the capabilities of any individual component, demonstrating how massive parallelism can create emergent computational capabilities.
-
-
6. The Cell as a Virtual Machine
-
One of Robbins' most profound insights was that genomic programs execute on virtual machines defined by other genomic programs.
-
-
6.1 Self-Defining Execution Environment
-
"Genome programs execute on a virtual machine that is defined by some of the genomic programs that are executing. Thus, in trying to understand the genome, we are trying to reverse engineer binaries for an unknown CPU, in fact for a virtual CPU whose properties are encoded in the binaries we are trying to reverse engineer."
-
-
This insight reveals one of the most profound challenges in understanding biological computation. Unlike conventional computing, where the hardware (CPU, memory, etc.) is designed independently of the software that runs on it, in biological systems the "hardware" and "software" are co-evolved and mutually dependent. The cellular machinery that interprets the genome (the virtual machine) is itself encoded in the genome, creating a circular dependency that makes biological systems fundamentally different from engineered computing systems.
-
-
This self-defining nature has several important implications. First, it means that biological systems are inherently self-modifying—the programs can change the machine that executes them. This capability enables biological systems to adapt and evolve in ways that are impossible for conventional computers. For example, during development, cells can change their transcriptional machinery, modify their chromatin structure, and alter their metabolic networks, effectively reprogramming the virtual machine on which they run.
-
-
Second, this self-defining nature creates a fundamental challenge for reverse engineering. In conventional computing, we can understand a program by understanding the hardware it runs on. In biological systems, we must simultaneously understand both the program (the genome) and the machine (the cellular machinery), even though each depends on the other. This circular dependency makes biological systems much more difficult to understand and model than conventional computing systems.
-
-
Third, this self-defining nature enables biological systems to achieve levels of integration and optimization that are impossible in conventional computing. Because the hardware and software co-evolved, they are perfectly matched to each other, enabling biological systems to achieve remarkable efficiency and robustness. This integration also means that biological systems can adapt to new challenges by modifying both their programs and their execution environment simultaneously.
-
-
6.2 Probabilistic Op Codes
-
Unlike the deterministic operations of conventional computers, "genomic op codes are probabilistic, rather than deterministic. That is, when control hits a particular op code, there is a certain probability that a certain action will occur."
-
-
Think of it like rolling dice instead of flipping a light switch. Every biochemical reaction, every gene expression event, and every cellular process has an inherent element of randomness. This randomness is not a defect but a fundamental feature that enables unique capabilities.
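A toy sketch of a probabilistic "op code" (the outcome labels and probabilities are invented for illustration): the same instruction, executed repeatedly, yields a distribution of actions rather than one fixed result.

```python
# The same "op code" can yield different actions with fixed probabilities.
# Outcome labels and probabilities are invented for illustration.

import random

OP_TABLE = {
    "promoter_bound": [("transcribe", 0.7), ("stall", 0.2), ("abort", 0.1)],
}

def execute(op, rng):
    r, cumulative = rng.random(), 0.0
    for action, prob in OP_TABLE[op]:
        cumulative += prob
        if r < cumulative:
            return action
    return OP_TABLE[op][-1][0]            # guard against float round-off

rng = random.Random(42)
outcomes = [execute("promoter_bound", rng) for _ in range(1000)]
share = outcomes.count("transcribe") / 1000
```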
-
-
The probabilistic nature arises from molecular chaos—molecules bouncing around randomly, transcription factors binding and unbinding, and constantly changing cellular conditions. This creates uncertainty about when and how biological operations will occur.
-
-
This probabilistic nature has profound implications. Biological systems must be robust to noise and uncertainty, and they can exploit randomness to achieve behaviors impossible in deterministic systems. For example, probabilistic gene expression enables cells to explore different states and adapt to changing conditions.
-
-
However, this also creates challenges for prediction. Unlike computers where the same inputs always produce the same outputs, biological systems can produce different outcomes even under identical conditions. This makes them harder to model but also more robust and adaptable.
-
-
6.3 The Genome as an AI Agent
-
This self-modifying, probabilistic system bears more resemblance to modern AI architectures than to conventional computing: Like neural networks, it operates with weighted probabilities; like reinforcement learning systems, it optimizes toward outcomes; like agent-based systems, it balances multiple objectives; unlike current AI, it developed through natural selection rather than design.
-
-
Neural Network Parallels: Biological systems operate through networks of interacting components that process information in parallel, similar to artificial neural networks. In both cases, the behavior of the system emerges from the collective activity of many simple processing units. However, biological networks are more sophisticated than artificial neural networks in several ways. They can modify their own structure and connectivity, they operate with multiple types of signals (chemical, electrical, mechanical), and they can change their computational properties based on context and experience.
-
-
Reinforcement Learning Analogies: Biological systems learn through trial and error, optimizing their behavior based on feedback from the environment. This learning process resembles reinforcement learning, where an agent learns to maximize rewards by exploring different actions and observing their consequences. However, biological reinforcement learning is more sophisticated than artificial versions, as it can modify not only its behavior but also its own learning mechanisms and objectives. This meta-learning capability enables biological systems to adapt their learning strategies to different environments and challenges.
-
-
Multi-Objective Optimization: Biological systems must balance multiple competing objectives simultaneously, such as growth, reproduction, survival, and energy efficiency. This multi-objective optimization is similar to the challenges faced by AI agents in complex environments. However, biological systems have evolved sophisticated mechanisms for balancing these objectives, including hierarchical control systems, priority-based decision making, and adaptive trade-offs that change based on environmental conditions.
-
-
Emergent Intelligence: The intelligence of biological systems emerges from the collective behavior of many simple components, rather than from a centralized control system. This emergent intelligence is similar to the behavior of swarm intelligence systems and multi-agent AI systems. However, biological systems achieve levels of coordination and cooperation that far exceed current artificial multi-agent systems, demonstrating how evolution can discover sophisticated solutions to complex coordination problems.
-
-
Adaptive Architecture: Unlike artificial AI systems, which have fixed architectures designed by humans, biological systems can modify their own computational architecture in response to experience and environmental conditions. This adaptive architecture enables biological systems to optimize their computational capabilities for specific tasks and environments, creating specialized processing systems that are perfectly suited to their particular challenges.
-
-
7. Case Studies in Genomic Programming
-
Different organisms demonstrate different "programming paradigms" at the genomic level:
-
-
7.1 Viruses: Minimal Programs
-
- Program: Infect → Reproduce → Die
- Trigger: Contact with host cell
- Computational simplicity: Limited conditionals, linear execution
- Optimization: Maximum efficiency in minimal code
-
-
Viruses represent the most minimal form of biological computation, with genomes that are optimized for maximum efficiency in minimal code. The viral "program" is essentially a bootloader that hijacks the host cell's computational machinery to reproduce itself. This minimalism makes viruses excellent models for understanding the fundamental principles of biological computation, as they demonstrate how complex behaviors can emerge from simple, linear programs.
-
-
The viral life cycle follows a simple linear sequence: attachment to a host cell, entry into the cell, replication of viral components, assembly of new virus particles, and release from the cell. This linear execution is similar to a simple computer program with minimal branching and no complex control structures. However, even this simple program must handle multiple contingencies, such as different types of host cells, varying environmental conditions, and host immune responses.
-
-
The computational efficiency of viruses is remarkable. Some viruses can encode their entire program in fewer than 10,000 nucleotides, yet they can successfully infect, replicate, and spread through host populations. This efficiency is achieved through several strategies: overlapping genes that encode multiple proteins, regulatory sequences that serve multiple functions, and the exploitation of host cell machinery for most computational tasks. This minimalism demonstrates how biological systems can achieve complex outcomes through the efficient use of limited computational resources.
-
-
However, this minimalism also creates vulnerabilities. Viruses have limited ability to adapt to changing conditions, and they are highly dependent on their host cells for most computational functions. This dependence makes viruses excellent models for understanding the trade-offs between computational efficiency and robustness, as well as the relationship between program complexity and adaptability.
-
-
7.2 Unicellular Organisms: Autonomous Agents
-
- Program: Eat → Grow → Divide
- Loop structure: WHILE food_present DO grow
- Event triggers: Mitosis on threshold conditions
- State-based logic: Different metabolic states based on environmental conditions
-
-
Unicellular organisms represent a more sophisticated form of biological computation, with programs that must balance multiple objectives while operating autonomously in complex environments. Unlike viruses, which are essentially parasites that hijack host machinery, unicellular organisms must implement their own computational infrastructure while also performing the basic functions of life: metabolism, growth, reproduction, and response to environmental changes.
-
-
The computational architecture of unicellular organisms is based on state machines that can transition between different metabolic states based on environmental conditions. For example, bacteria can switch between aerobic and anaerobic metabolism, between different carbon sources, and between growth and survival modes. These state transitions are triggered by environmental signals and are implemented through complex regulatory networks that integrate multiple inputs to make decisions about cellular behavior.
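The state-machine idea can be sketched directly; the transition rules below are illustrative caricatures of the aerobic/anaerobic example, not a model of any particular bacterium:

```python
# Environment-driven metabolic states as a tiny state machine.
# The transition rules are illustrative caricatures.

def next_state(state, env):
    if env["oxygen"] and env["glucose"]:
        return "aerobic_growth"
    if env["glucose"]:
        return "anaerobic_fermentation"   # no oxygen, sugar available
    if state == "dormant":
        return "dormant"
    return "stress_survival"              # nutrients exhausted

state, history = "aerobic_growth", []
for env in ({"oxygen": True, "glucose": True},
            {"oxygen": False, "glucose": True},
            {"oxygen": False, "glucose": False}):
    state = next_state(state, env)
    history.append(state)
```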
-
-
The cell cycle represents a fundamental computational loop that drives cellular behavior. This loop includes phases for growth, DNA replication, and cell division, with checkpoints that ensure each phase is completed correctly before proceeding to the next. These checkpoints implement error detection and correction mechanisms that are essential for maintaining genomic integrity. The cell cycle demonstrates how biological systems can implement complex control structures using simple molecular mechanisms.
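A sketch of the checkpointed loop (the phase names are standard; the boolean "checks" stand in for the molecular surveillance machinery, which is far richer in reality):

```python
# The cell cycle as a loop with checkpoints: each phase must pass its check
# before the next begins; a failed check arrests the cycle.

PHASES = ["G1", "S", "G2", "M"]

def run_cell_cycle(checks):
    completed = []
    for phase in PHASES:
        if not checks.get(phase, True):            # checkpoint failed
            return completed, "arrested_at_" + phase
        completed.append(phase)
    return completed, "divided"

done, outcome = run_cell_cycle({"G1": True, "S": True, "G2": True, "M": True})
partial, arrested = run_cell_cycle({"G1": True, "S": False})
```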
-
-
Unicellular organisms also demonstrate sophisticated signal processing capabilities. They can detect and respond to multiple environmental signals simultaneously, integrating information about nutrient availability, temperature, pH, and the presence of other organisms. This signal integration enables cells to make complex decisions about their behavior, such as whether to grow, divide, form spores, or enter a dormant state. These decision-making processes resemble the control systems used in autonomous robots and other artificial agents.
-
-
The computational capabilities of unicellular organisms are particularly impressive given their simplicity. A single bacterial cell can implement complex behaviors such as chemotaxis (movement toward or away from chemicals), quorum sensing (communication with other cells), and biofilm formation (cooperative behavior with other cells). These capabilities demonstrate how biological systems can achieve sophisticated computational outcomes through the coordinated action of simple molecular components.
-
-
7.3 Multicellular Organisms: Distributed Systems
-
- Subroutines: Cellular differentiation, immune responses
- Conditional branches: Hormone levels, cell signaling
- Coordinated processes: Development, aging, reproduction
- Distributed computation: Different cells executing different aspects of the overall program
-
-
Multicellular organisms represent the most complex form of biological computation, with programs that must coordinate the behavior of thousands to trillions of cells while maintaining the integrity and functionality of the entire organism. This coordination requires sophisticated communication systems, hierarchical control structures, and distributed decision-making mechanisms that far exceed the complexity of any artificial distributed system.
The computational architecture of multicellular organisms is based on cellular differentiation, where different cells execute different programs while sharing the same genome. This differentiation is controlled by complex regulatory networks that integrate multiple signals to determine cellular fate. The process of differentiation resembles the creation of specialized subroutines in a computer program, where different components perform different functions while working together to achieve overall system goals.
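The subroutine analogy above can be made concrete. In this sketch the cell types and genes are illustrative placeholders, not a real regulatory network: every cell carries the same "genome" of subroutines, but its regulatory state selects which subset it actually executes.

```python
# Sketch of differentiation as subroutine selection: one shared genome,
# different expression programs per cell type (all names illustrative).

GENOME = {
    "make_hemoglobin": lambda: "hemoglobin",
    "fire_action_potential": lambda: "spike",
    "secrete_insulin": lambda: "insulin",
}

EXPRESSION_PROGRAMS = {   # regulatory state -> genes switched on
    "red_blood_cell": ["make_hemoglobin"],
    "neuron": ["fire_action_potential"],
    "beta_cell": ["secrete_insulin"],
}

def run_cell(cell_type):
    """Execute only the subroutines this cell type expresses."""
    return [GENOME[gene]() for gene in EXPRESSION_PROGRAMS[cell_type]]

print(run_cell("neuron"), run_cell("beta_cell"))
```

The point of the design is that differentiation changes the program counter, not the program: `GENOME` is identical for every cell type.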
Communication between cells is essential for coordinating the behavior of multicellular organisms. This communication occurs through multiple mechanisms, including direct cell-to-cell contact, secreted signaling molecules, and electrical signals in the nervous system. These communication systems enable cells to share information about their state, coordinate their activities, and respond collectively to environmental changes. The complexity of these communication networks rivals that of modern computer networks, with multiple protocols, routing mechanisms, and error correction systems.
The immune system represents one of the most sophisticated computational systems in multicellular organisms. It must simultaneously monitor for thousands of different pathogens, learn from encounters with new pathogens, and mount appropriate responses while avoiding attacks on the organism's own cells. This system operates through distributed algorithms that involve multiple cell types, each contributing specialized knowledge and capabilities to the overall immune response. The immune system demonstrates how biological systems can achieve collective intelligence through the coordinated action of many simple components.
Development represents another remarkable computational achievement of multicellular organisms. Starting from a single cell, development creates complex three-dimensional structures with precise spatial organization and functional specialization. This process involves the coordinated action of thousands of genes across millions of cells, with precise timing and spatial control. The computational complexity of development is staggering, involving the simultaneous execution of thousands of parallel processes with complex interdependencies and feedback loops.
The computational capabilities of multicellular organisms are particularly impressive given the challenges they face. They must maintain homeostasis across multiple organ systems, respond to changing environmental conditions, and coordinate complex behaviors such as movement, feeding, and reproduction. These capabilities demonstrate how biological systems can achieve sophisticated computational outcomes through the coordinated action of many simple components, creating emergent properties that exceed the capabilities of any individual component.
8. The β-Galactosidase Revolution: From 1995 to 2025
The evolution from the author's original 1995 β-galactosidase flowchart to today's sophisticated Mermaid-based visualizations represents not just a technological advancement, but a fundamental transformation in how we create and share biological knowledge. This transformation exemplifies the democratization of computational biology through the convergence of human insight, AI assistance, and modern visualization tools.
8.1 The 1995 Journey: A Month of Manual Discovery
In 1995, creating the original β-galactosidase flowchart (Figure 3) was an arduous, month-long process that required:

- Extensive Literature Review: Reading biology textbooks to understand the lac operon mechanism
- Community Collaboration: Engaging in discussions with experts on the bionet.genome.chromosome newsgroup
- Manual Design Tools: Using Inspiration, a diagramming tool, to painstakingly create the flowchart
- Iterative Refinement: Multiple revisions based on peer feedback and deeper understanding

This process, while thorough, was limited by the tools available and the manual nature of knowledge synthesis. The author, drawing on an education in mathematics and philosophy at Bedford College, London in the 1970s, and working as a web developer and journalist in the 1990s, spent countless hours transforming biological concepts into computational visualizations for a monthly column in The X Advisor, a computer industry trade publication.
8.2 The 2025 Revolution: AI-Powered Biological Modeling
Today, the same process that took a month in 1995 can be accomplished in hours or days, thanks to the combination of:

- Mermaid Markdown (.mmd) Format: Created by Knut Sveidqvist and released on GitHub in 2014, Mermaid provides a text-based syntax for describing complex diagrams
- Large Language Models (LLMs): AI systems that can extract and synthesize biological knowledge from a vast literature
- Human-AI Collaboration: The combination of human biological insight and AI processing power
- Rapid Iteration: The ability to quickly refine and improve visualizations based on feedback

8.3 A Comparative Analysis: 1995 vs 2025
Figure 3: β-Galactosidase Regulation Flowchart (1995) - The Original
The author's original 1995 computational flowchart, created with Inspiration after a month of research, reading, and community discussion. This groundbreaking visualization was among the first to model genetic regulation using computational logic constructs, establishing a foundation for computational biology visualization.
2025 Mermaid-Based β-Galactosidase Analysis - Using modern tools and AI assistance, we can now create far more sophisticated and detailed visualizations:
graph TD
    %% Environmental Inputs (Red)
    A[Lactose in Environment] --> B[Lactose Transport]
    C[Glucose in Environment] --> D[Glucose Transport]
    E[Low Energy Status] --> F[Energy Stress Signal]

    %% Structures & Objects (Yellow)
    G[Lactose Permease LacY] --> H[Lactose Inside Cell]
    I[Glucose Transporters] --> J[Glucose Inside Cell]

    %% Decision Logic
    H --> K{Is Lactose Present?}
    J --> L{Is Glucose Present?}
    F --> M{Is Energy Low?}

    %% Regulatory States (Blue)
    K -->|No| N[Lac Repressor Active]
    K -->|Yes| O[Lac Repressor Inactive]
    L -->|Yes| P[High Glucose Status]
    L -->|No| Q[High cAMP Levels]
    M -->|Yes| R[High cAMP Levels]
    M -->|No| S[Low cAMP Levels]

    %% Regulatory Actions
    N --> T[Repressor Binds Operator]
    O --> U[Repressor Released]
    T --> V[Repressor Transcription Blocked]
    U --> W[Operator Free]

    %% CAP Regulation
    Q --> X[cAMP-CAP Complex]
    R --> X
    X --> Y{CAP Bound?}
    W --> Z{Operator Free?}

    %% Transcription Control
    Y -->|Yes| AA[CAP Binds Promoter]
    Y -->|No| BB[No CAP Binding State]
    Z -->|Yes| CC[RNA Polymerase Binding]
    Z -->|No| DD[Operator Transcription Blocked]

    %% Transcription Levels
    AA --> EE[Strong Transcription]
    BB --> FF[Weak Transcription]
    CC --> EE
    DD --> GG[Transcription Blocked]

    %% mRNA Synthesis
    EE --> HH[lacZ mRNA Synthesis]
    EE --> II[lacY mRNA Synthesis]
    EE --> JJ[lacA mRNA Synthesis]

    %% Protein Translation
    HH --> KK[LacZ Translation]
    II --> LL[LacY Translation]
    JJ --> MM[LacA Translation]

    %% Enzymes (Yellow)
    KK --> NN[Beta-Galactosidase Enzyme]
    LL --> OO[Lactose Permease]
    MM --> PP[Galactoside Acetyltransferase]

    %% Chemical Processing (Green)
    NN --> QQ[Lactose Hydrolysis]
    OO --> RR[Lactose Transport]
    PP --> SS[Galactoside Modification]

    %% Products (Violet)
    QQ --> TT[Glucose + Galactose]
    RR --> UU[Lactose Uptake]
    SS --> VV[Detoxification]

    %% Metabolic Integration
    TT --> WW[Glycolysis]
    UU --> XX[Lactose Processing]
    VV --> YY[Cell Protection]

    %% System Outputs
    WW --> ZZ[Energy Production]
    XX --> AAA[Lactose Consumption]
    YY --> BBB[Cell Survival]

    %% Feedback Loops
    ZZ --> CCC[Energy Status Improved]
    AAA --> DDD[Lactose Depletion]
    BBB --> EEE[Reduced Energy Stress]

    %% System Equilibrium
    CCC --> FFF[Reduced Lactose Signal]
    DDD --> FFF
    EEE --> GGG[Maintained Homeostasis]
    FFF --> GGG
    GGG --> HHH[System Equilibrium]

    %% Color Key Legend
    LEGEND1[🔴 Triggers & Conditions]
    LEGEND2[🟡 Catalysts & Enzymes]
    LEGEND3[🟢 Chemical Processing]
    LEGEND4[🔵 Intermediates & States]
    LEGEND5[🟣 Products & Outputs]

    %% Legend Connections
    LEGEND1 -.-> LEGEND2
    LEGEND2 -.-> LEGEND3
    LEGEND3 -.-> LEGEND4
    LEGEND4 -.-> LEGEND5

    %% Styling - Programming Framework Color Scheme
    %% Red (#ff6b6b): Triggers & Inputs
    style A fill:#ff6b6b,color:#fff
    style C fill:#ff6b6b,color:#fff
    style E fill:#ff6b6b,color:#fff

    %% Yellow (#ffd43b): Structures & Objects
    style G fill:#ffd43b,color:#000
    style I fill:#ffd43b,color:#000
    style NN fill:#ffd43b,color:#000
    style OO fill:#ffd43b,color:#000
    style PP fill:#ffd43b,color:#000

    %% Green (#51cf66): Processing & Operations
    style B fill:#51cf66,color:#fff
    style D fill:#51cf66,color:#fff
    style F fill:#51cf66,color:#fff
    style T fill:#51cf66,color:#fff
    style U fill:#51cf66,color:#fff
    style AA fill:#51cf66,color:#fff
    style CC fill:#51cf66,color:#fff
    style HH fill:#51cf66,color:#fff
    style II fill:#51cf66,color:#fff
    style JJ fill:#51cf66,color:#fff
    style KK fill:#51cf66,color:#fff
    style LL fill:#51cf66,color:#fff
    style MM fill:#51cf66,color:#fff
    style QQ fill:#51cf66,color:#fff
    style RR fill:#51cf66,color:#fff
    style SS fill:#51cf66,color:#fff
    style WW fill:#51cf66,color:#fff
    style XX fill:#51cf66,color:#fff
    style YY fill:#51cf66,color:#fff
    style CCC fill:#51cf66,color:#fff
    style DDD fill:#51cf66,color:#fff
    style EEE fill:#51cf66,color:#fff

    %% Blue (#74c0fc): Intermediates & States
    style H fill:#74c0fc,color:#fff
    style J fill:#74c0fc,color:#fff
    style K fill:#74c0fc,color:#fff
    style L fill:#74c0fc,color:#fff
    style M fill:#74c0fc,color:#fff
    style N fill:#74c0fc,color:#fff
    style O fill:#74c0fc,color:#fff
    style P fill:#74c0fc,color:#fff
    style Q fill:#74c0fc,color:#fff
    style R fill:#74c0fc,color:#fff
    style S fill:#74c0fc,color:#fff
    style V fill:#74c0fc,color:#fff
    style W fill:#74c0fc,color:#fff
    style X fill:#74c0fc,color:#fff
    style Y fill:#74c0fc,color:#fff
    style Z fill:#74c0fc,color:#fff
    style BB fill:#74c0fc,color:#fff
    style DD fill:#74c0fc,color:#fff
    style EE fill:#74c0fc,color:#fff
    style FF fill:#74c0fc,color:#fff
    style GG fill:#74c0fc,color:#fff
    style FFF fill:#74c0fc,color:#fff
    style GGG fill:#74c0fc,color:#fff
    style HHH fill:#74c0fc,color:#fff

    %% Violet (#b197fc): Products & Outputs
    style TT fill:#b197fc,color:#fff
    style UU fill:#b197fc,color:#fff
    style VV fill:#b197fc,color:#fff
    style ZZ fill:#b197fc,color:#fff
    style AAA fill:#b197fc,color:#fff
    style BBB fill:#b197fc,color:#fff

    %% Legend Styling
    style LEGEND1 fill:#ff6b6b,color:#fff
    style LEGEND2 fill:#ffd43b,color:#000
    style LEGEND3 fill:#51cf66,color:#fff
    style LEGEND4 fill:#74c0fc,color:#fff
    style LEGEND5 fill:#b197fc,color:#fff

A modern computational analysis of the lac operon using Mermaid syntax and the Programming Framework methodology. This visualization shows how AI assistance and modern tools enable sophisticated biological flowcharts with detailed computational logic, color-coded analysis, and comprehensive pathway representation, achievable in hours rather than months. The chart traces environmental inputs, regulatory complexes and enzymes, intermediate states and logic gates, functional outputs, and key regulatory proteins, revealing the computational logic underlying lactose metabolism in E. coli, including CAP-cAMP regulation, protein synthesis, and dynamic feedback control.
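The chart's central decision logic reduces to a small truth table. A minimal sketch of the standard lac operon logic (not code from the original work): the repressor frees the operator only when lactose is present, and the cAMP-CAP complex boosts the promoter only when glucose is absent.

```python
# The lac operon's transcription decision as a two-input logic function.

def lac_transcription(lactose_present, glucose_present):
    if not lactose_present:
        return "blocked"   # repressor bound to the operator
    if glucose_present:
        return "weak"      # operator free, but no cAMP-CAP boost
    return "strong"        # operator free and CAP bound at the promoter

for lactose in (False, True):
    for glucose in (False, True):
        print(lactose, glucose, lac_transcription(lactose, glucose))
```

Reading the table row by row recovers exactly the "Strong Transcription", "Weak Transcription", and "Transcription Blocked" states in the flowchart.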
8.4 The Transformation: From Amateur Science to AI-Enabled Innovation
This comparison reveals a profound transformation in scientific practice:
1995 Characteristics:

- Manual research and synthesis
- Limited by available tools and time
- Individual effort over extended periods
- Simple flowchart representation
- Linear, sequential visualization

2025 Capabilities:

- AI-assisted knowledge extraction and synthesis
- Rapid iteration and refinement
- Complex, multi-layered visualizations
- Programming Framework color coding
- Scalable to hundreds of biological processes

The Remarkable Achievement: What once required a month of dedicated work can now be accomplished in days, with far greater detail and sophistication. Yet this transformation was only possible through the convergence of human biological understanding rooted in solid educational foundations, innovative visualization tools (Mermaid), and AI assistance (LLMs).
8.5 The Democratization of Computational Biology
This evolution represents more than just technological progress—it represents the democratization of computational biology. In 1995, creating biological flowcharts required specialized knowledge, significant time investment, and access to academic communities. Today, the combination of educational background, AI assistance, and modern tools enables rapid creation of sophisticated biological visualizations.
The author's journey from manually creating single flowcharts to generating hundreds of detailed biological process diagrams exemplifies how AI can amplify human expertise rather than replace it. The mathematical and philosophical training from Bedford College, combined with decades of experience in journalism and web development, provided the analytical framework necessary to guide AI systems in creating meaningful visualizations. Now at 72 and retired, the author continues the amateur science tradition with vastly improved tools.
Rarely Used for Biological Applications: While Mermaid has been implemented in numerous documentation platforms since its 2014 release, its application to biological process modeling—particularly the systematic extraction of .mmd files from scientific literature by humans and AI working together—represents a novel and innovative use case. This approach transforms static biological knowledge into dynamic, visual computational models.
8.6 The Innovation: Genuine Contribution to Biology
This work represents a genuine innovation in biological visualization and computational thinking. By systematically applying the Programming Framework methodology to biological processes, we have created:
- A New Visualization Paradigm: Treating biological processes as computational programs with logic gates, decision points, and feedback loops
- Scalable Methodology: Demonstrating how hundreds of biological processes can be systematically analyzed and visualized
- Cross-Kingdom Analysis: Comparing computational architectures across different organisms (e.g., yeast versus bacteria)
- Educational Innovation: Making complex biological concepts accessible through computational metaphors

This innovation bridges the gap between computational thinking and biological understanding, creating new possibilities for research, education, and synthetic biology applications. The transformation from 1995 to 2025 demonstrates how the combination of solid educational foundations, innovative thinking, and modern AI tools can enable individual researchers to make significant contributions to scientific understanding.
9. Visualization Challenges and the Limits of Linear Representation
The exchange between Welz and Robison in 1995 highlighted a fundamental challenge that persists today: how to visually represent massively parallel processes using tools designed for sequential thinking. The author's β-galactosidase flowchart exemplified both the promise and the problems of this approach.
9.1 Limitations of Linear Flowcharts
As Robison noted: "Flowcharts are inherently linear beasts, ill-suited for parallel processes, especially biological ones with many non-linearly combined inputs." Traditional flowcharts suggest a sequence of operations that misrepresents the simultaneous nature of genomic processes.
9.2 Alternative Visualization Approaches
Contemporary approaches to representing genomic computation have attempted to address these limitations through network diagrams showing interaction rather than sequence, heat maps representing multiple states simultaneously, multi-dimensional representations capturing regulatory relationships, and dynamic simulations rather than static diagrams. However, even these advanced visualization systems struggle with the fundamental challenge identified in 1995: representing true parallelism in comprehensible visual formats.
Figure 8: Gene Expression Networks (2024)
del Val et al.'s gene expression networks demonstrate how modern computational biology addresses the parallelism challenge identified in 1995. This multi-omic network analysis shows how genes interact in complex regulatory networks, revealing the systems-level logic that governs biological processes. Unlike linear flowcharts, this network visualization captures the parallel, interconnected nature of genomic computation, representing the future of computational biology where understanding biological systems requires analysis of their computational properties and network dynamics. Source: del Val et al. (2024).
9.3 The Enduring Relevance of Early Insights
The visualization challenges raised by Robison's critique of the β-galactosidase flowchart continue to influence how we think about representing biological systems. Modern synthetic biology, systems biology, and computational biology all grapple with the same fundamental tension between the need for clear, understandable representations and the reality of massively parallel, probabilistic biological processes.
10. Limitations, Criticisms, and Alternative Perspectives
While the genome-as-program metaphor provides valuable insights, it is important to acknowledge its limitations and consider alternative perspectives. Several criticisms and challenges have been raised regarding this approach.
10.1 The Program-Programmer Paradox
A fundamental challenge to the metaphor is the absence of a programmer. Human-written software differs in several respects:

10.2 Evolution as "Programmer"

The genome evolved through natural selection; there is no specification separate from the implementation; the "debugging" process (evolution) occurs across generations; and the line between program and programmer blurs as the genome modifies itself.

10.3 Integration of Hardware and Software

In conventional computing, hardware and software are distinct. In genomic systems, the genome is both the program and the machine that interprets it; the distinction between "data" and "process" blurs; and physical structure and information content are inseparable.

10.4 The Absence of Central Control

Unlike most computer programs, genomic systems have no central processing unit coordinating execution, no master clock synchronizing operations, and no operating system managing resources; control emerges from distributed interactions.

10.5 Alternative Metaphors and Perspectives
Several alternative metaphors have been proposed for understanding biological systems:
Network Metaphor: Some researchers prefer to view biological systems as complex networks rather than programs, emphasizing the interconnected nature of biological components and the emergent properties that arise from network dynamics.
Ecosystem Metaphor: Others argue that biological systems are better understood as ecosystems, where multiple agents interact in complex ways, creating dynamic equilibria and co-evolutionary processes.
Information Processing Metaphor: An alternative approach focuses on information processing and communication rather than computation, emphasizing how biological systems encode, transmit, and process information.
These alternative perspectives highlight different aspects of biological complexity and may be more appropriate for certain types of analysis. The genome-as-program metaphor should be viewed as one useful framework among many, rather than a complete description of biological reality.
11. Synthetic Biology and AI Implications
The genome-as-program metaphor has profound implications for both synthetic biology and artificial intelligence.
11.1 Programming Living Systems
Viewing the genome as a program enables engineered cells to be written, debugged, and optimized. Synthetic biology gains logic tools to regulate traits, behaviors, and lifecycles. The β-galactosidase flowchart represents an early conceptual bridge toward this engineering approach, demonstrating how biological regulatory circuits can be understood and potentially redesigned using computational logic.
11.2 Learning from Nature's Computing
The genomic computational paradigm offers lessons for AI design: massive parallelism with simple components; probabilistic operations with emergent determinism; self-modifying code and execution environment; integration of digital and analog processing.
11.3 The Genome Logic Modeling Project (GLMP)
The Genome Logic Modeling Project (GLMP) aims to formalize the metaphor of the genome as a computer program. It models organisms as logic-executing agents, with internal subroutines and external triggers. GLMP frames biology as structured, conditional, recursive, and state-driven.
Goals and Objectives: The GLMP seeks to create a unified framework for understanding biological systems through computational logic, develop tools for modeling genetic circuits, and establish a collaborative platform for interdisciplinary research. The project aims to bridge the gap between theoretical computational biology and practical applications in synthetic biology and AI.
Expected Outcomes: The GLMP will produce computational models of genetic circuits, visualization tools for genomic logic, educational materials for teaching computational biology, and a community platform for researchers to share insights and collaborate on genomic modeling projects.
This article represents a foundational publication for this project, which will explore topics including: Life as a Running Logic Program; Bootloaders of Life: Zygotic Genome Activation; Subroutines in Biology: Modular Design; Shutdown Protocols: Senescence and Apoptosis; Synthetic Biology Through Logic Gates; Agent-Based Models of Organism Logic.
Concrete Examples of GLMP Research:

- Yeast Cell Cycle Modeling: Developing computational models of the yeast cell cycle as a state machine with checkpoints and feedback loops
- Bacterial Quorum Sensing: Modeling bacterial communication systems as distributed algorithms
- Immune System Logic: Representing immune responses as pattern recognition and decision-making algorithms
- Developmental Pathways: Creating flowcharts for embryonic development processes
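The immune-system example above can be sketched as negative selection over toy patterns. The three-letter patterns and detector sets are hypothetical, not real receptor chemistry: candidate detectors are generated blindly, any detector that matches "self" is deleted, and whatever the survivors recognize must therefore be foreign.

```python
# Immune pattern recognition via negative selection (toy patterns,
# illustrative only): self-reactive detectors are culled during training.

SELF = {"AAG", "CCT", "GGA"}                 # the organism's own patterns
CANDIDATE_DETECTORS = {"AAG", "TTC", "GAT"}  # randomly generated receptors

# Negative selection: discard detectors that react to self.
detectors = {d for d in CANDIDATE_DETECTORS if d not in SELF}

def immune_response(sample):
    """Respond if any surviving detector matches something in the sample."""
    return any(d in sample for d in detectors)

print(immune_response({"AAG", "CCT"}))  # self only: no response
print(immune_response({"AAG", "TTC"}))  # foreign pattern present: respond
```

The distributed character of the real system comes from many cells each carrying a few detectors; here the detector set simply stands in for that population.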
11.3.1 GLMP as a Collaborative Research Platform
The GLMP is designed as an open, collaborative platform that invites researchers, computational biologists, AI specialists, and interested parties from all disciplines to participate in this endeavor. The project recognizes that understanding the genome as a computational system requires diverse perspectives and expertise, from molecular biologists who understand the biochemical details to computer scientists who can formalize computational models.
We encourage contributions in several key areas: (1) Specific Gene Circuit Analysis—detailed computational models of individual genetic circuits, similar to the β-galactosidase example but for other genes and processes; (2) Cross-Species Comparisons—how different organisms implement similar computational functions; (3) Computational Tool Development—software and visualization tools for representing genomic logic; and (4) Integration with Modern AI—connections between genomic computation and contemporary artificial intelligence systems.
11.3.2 Parallels with DeepMind's Cell Project
The recent announcement of DeepMind's Cell project, led by Demis Hassabis, represents a significant validation of the genome-as-program metaphor and demonstrates how this perspective is gaining traction in the AI community. Like the GLMP, DeepMind's Cell project aims to model cellular processes as computational systems, beginning with the yeast cell as a model organism.
This convergence of approaches is particularly significant because it shows that the computational perspective on biology is not merely a metaphor but a practical framework for understanding and modeling biological systems. The fact that one of the world's leading AI research organizations is pursuing this approach validates the fundamental insights that motivated the GLMP.
The GLMP can complement and extend DeepMind's work by providing a broader theoretical framework and encouraging community participation. While DeepMind focuses on building comprehensive cell models, the GLMP can serve as a platform for researchers to contribute specific computational analyses of genetic circuits, regulatory networks, and cellular processes. This collaborative approach can accelerate progress in both understanding biological computation and developing new computational paradigms.
11.3.3 Call to Action: Join the GLMP Community
We invite researchers and enthusiasts to contribute to the GLMP in several ways:
For Molecular Biologists: Share your knowledge of specific genetic circuits and regulatory mechanisms. Help us understand how your research area can be represented as computational logic. Contribute examples of gene regulation that could be modeled as flowcharts or logic circuits.
For Computer Scientists: Develop computational models of genetic processes. Create visualization tools for representing genomic logic. Design algorithms inspired by biological computation. Help formalize the computational languages needed to describe genomic processes.
For AI Researchers: Explore connections between genomic computation and artificial intelligence. Investigate how biological learning and adaptation mechanisms can inform AI design. Develop AI systems that can analyze and model genomic logic.
For Educators: Help develop educational materials that use computational metaphors to teach biology. Create interactive simulations of genetic processes. Bridge the gap between computer science and biology education.
For Enthusiasts: Participate in discussions, share ideas, and help build the GLMP community. Contribute to documentation, visualization, and communication efforts. Help make complex biological concepts accessible to broader audiences.
The GLMP represents an opportunity to fundamentally change how we understand and interact with biological systems. By treating the genome as a computational system, we can develop new tools for understanding life, new approaches to synthetic biology, and new paradigms for computing itself. The time is right for this perspective, as evidenced by the convergence of approaches from multiple research communities.
12. Future Research Directions
This metaphor opens several promising research avenues:
12.1 Formal Languages for Genomic Logic
Develop specialized notation for genomic computation; create simulation environments based on genomic logic; bridge between biological description and computational models. The insights from early flowcharts like Figure 1 suggest the need for new visual languages that can better represent parallel, probabilistic biological processes.
12.2 New Computational Architectures
Design computing systems inspired by genomic parallelism; explore probabilistic processing at massive scale; develop self-modifying execution environments. The scale of parallelism identified by Robbins—exceeding 10^18 processes—suggests computational architectures fundamentally different from current designs.
12.3 Educational Models
Teach genomic function using computational metaphors; develop interactive simulations of genomic processes; bridge disciplinary gaps between computer science and biology. The historical progression from simple flowcharts to modern network visualizations illustrates the ongoing challenge of making complex biological computation comprehensible.
12.4 Yeast Cell as a Model System for Computational Analysis
The choice of yeast (Saccharomyces cerevisiae) as a model organism for both DeepMind's Cell project and potential GLMP analyses is particularly apt. Yeast occupies an ideal intermediate level of complexity, more sophisticated than bacteria but simpler than multicellular organisms, making it well suited for developing computational models of cellular processes.
Yeast cells offer several advantages for computational analysis: (1) Well-characterized genome—extensive genetic and biochemical data available; (2) Modular processes—clear separation of cellular functions that can be modeled as computational modules; (3) Experimental tractability—easy to manipulate and observe; and (4) Evolutionary conservation—many processes conserved in higher organisms.
Specific yeast processes that could be modeled as computational systems include: (1) Cell cycle regulation—a complex state machine with checkpoints and feedback loops; (2) Metabolic networks—dynamic systems responding to nutrient availability; (3) Stress response pathways—adaptive systems that modify cellular behavior based on environmental conditions; and (4) Mating type switching—a sophisticated genetic program that controls cellular identity and behavior.
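Mating-type switching, item (4) above, can be sketched as a state-rewriting program. This is a deliberately simplified, hypothetical model: cellular identity is a one-bit state, and a "switch" subroutine rewrites it, standing in for the copying of the opposite silent cassette into the active MAT locus.

```python
# Simplified sketch of yeast mating-type switching: identity as a
# one-bit state rewritten by a switching subroutine (toy model).

def switch_mating_type(active):
    """Overwrite the active locus with the opposite cassette's identity."""
    return "alpha" if active == "a" else "a"

state = "a"
history = [state]
for _ in range(3):          # three divisions with switching
    state = switch_mating_type(state)
    history.append(state)
print(history)
```

The interesting feature this captures is self-modification: the program rewrites the very state that determines which behavior it runs next.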
The GLMP community can contribute to this effort by developing computational models of specific yeast processes, creating visualization tools for yeast genetic circuits, and comparing yeast computational logic with that of other organisms. This work can serve as a foundation for understanding more complex cellular systems and provide valuable insights for both basic biology and synthetic biology applications.
13. Glossary of Key Terms
Associative Addressing: A memory system where data is found by content rather than location (like finding a book by its subject rather than shelf position).
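The definition above can be illustrated in a few lines. The promoter names and sequences here are illustrative placeholders: a regulator finds its targets by content, scanning for a motif, rather than by any stored address, just as one might find a book by subject instead of shelf position.

```python
# Associative addressing sketch: records retrieved by content, not location
# (promoter names and sequences are illustrative, not real annotations).

PROMOTERS = {
    "lacZ": "TTGACA...TATAAT",
    "trpE": "TTGACA...GATACT",
    "recA": "CTGTAT...TATAAT",
}

def find_by_content(motif):
    """Return every promoter whose sequence contains the motif."""
    return sorted(name for name, seq in PROMOTERS.items() if motif in seq)

print(find_by_content("TATAAT"))   # matched by content, not by address
```

Nothing in the query names a location; the motif itself is the address, which is the essence of associative memory.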
Probabilistic Op Codes: Computational operations that have a probability of occurring rather than being deterministic (like rolling dice instead of flipping a light switch).
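A probabilistic op code can be sketched directly; the 0.3 firing rate below is an arbitrary assumption. Each invocation fires or not at random, yet the aggregate behavior over many invocations is highly predictable, which is how probabilistic biological operations yield emergent determinism.

```python
import random

# Sketch of a probabilistic op code: the operation executes with
# probability p rather than deterministically (p = 0.3 is arbitrary).

def probabilistic_op(op, p, rng):
    """Run op with probability p; otherwise do nothing and return None."""
    return op() if rng.random() < p else None

rng = random.Random(42)  # seeded so the sketch is reproducible
results = [probabilistic_op(lambda: 1, 0.3, rng) for _ in range(10000)]
fires = sum(r or 0 for r in results)
print(fires)             # near 3000: random per op, lawful in aggregate
```

Any single call is a coin flip, but across 10,000 calls the firing count concentrates tightly around p times the number of trials.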
Massive Parallelism: The simultaneous execution of billions of processes, as opposed to sequential processing where operations happen one after another.
Virtual Machine: A computational environment that is defined by the programs it runs, creating a circular dependency between hardware and software.
Zygotic Genome Activation: The "bootloader" process where an embryo transitions from using maternal RNA to transcribing its own genes.
14. Conclusion
Summary of Key Findings:

- The genome operates as a computational system with recognizable logic structures, from conditional operations to feedback loops
- Biological computation runs at unprecedented scales of parallelism (10^18+ processes) with probabilistic rather than deterministic operations
- The cell functions as a self-defining virtual machine, creating a circular dependency between hardware and software
- Historical developments, from the Punnett squares of classical genetics to modern synthetic biology, demonstrate the evolution of computational thinking in biology
- The genome-as-program metaphor provides valuable insights for synthetic biology and AI design, despite its limitations

The genome is not a static archive but a living program in execution—one that operates on computational principles fundamentally different from those of conventional computers. Each organism runs a massively parallel set of probabilistic processes driven by chemistry, inheritance, and context.
The β-galactosidase flowchart of 1995, while limited in its linear representation, marked an important step in recognizing the computational nature of genetic regulation. The critiques it received—particularly regarding the challenge of representing parallel processes—highlighted fundamental issues that continue to shape how we visualize and understand biological computation today.
-
-
As Robert Robbins presciently noted in 1995, "It would be really interesting to think about the computational properties that might emerge in a system with probabilistic op codes and with as much parallelism as biological computers." Nearly three decades later, this observation points toward a rich frontier of research at the intersection of computation and biology.
-
-
Implications and Future Directions: By understanding the genome as a unique computational paradigm, we gain insights not only into how life functions but also into new possibilities for computing itself. The Genome Logic Modeling Project (GLMP) provides a framework for advancing this understanding through collaborative research. The genome-as-program metaphor invites us to reimagine biology not only as a science of what life is, but how it computes. The tension between linear representations and parallel realities, first exemplified in early flowcharts, continues to drive innovation in both biological understanding and computational design.
-
-
-
References

Jacob, F. & Monod, J. (1961). Genetic regulatory mechanisms in the synthesis of proteins. Journal of Molecular Biology, 3, 318-356.

Robbins, R.J. (1995). Discussion on bionet.genome.chromosome newsgroup regarding genomic computation.

Dellaire, G. (1995). Response on bionet.genome.chromosome regarding genetic imprinting and genomic structure.

Welz, G. (1995). Is a genome like a computer program? The X Advisor.

Jonassen, I. Bioinformatics Links, University of Bergen.

Krzywinski, M., et al. (2009). Circos: An information aesthetic for comparative genomics. Genome Research, 19(9), 1639-1645.

Höhna, S., et al. (2014). Probabilistic graphical models in evolution and phylogenetics. Systematic Biology, 63(5), 753-771.

Koutrouli, M., et al. (2020). Guide to visualization of biological networks: Types, tools and strategies. Frontiers in Bioinformatics, 2, 1-21.

O'Donoghue, S.I., et al. (2018). Visualization of biomedical data. Annual Review of Biomedical Data Science, 1, 275-304.

Nardone, G.G., et al. (2023). Identifying missing pieces in color vision defects: a genome-wide association study in Silk Road populations. Frontiers in Genetics, 14:1161696.

del Val, C., et al. (2024). Gene expression networks regulated by human personality. Molecular Psychiatry, 29, 2241-2260.
This analysis is based on the comprehensive biological dataset from the Genome Logic Modeling Project (GLMP), which contains 297+ analyzed biological processes across multiple organisms and systems.

GLMP Resources:

🗄️ GLMP Database Table - Interactive database with all biological processes (opens in new tab)

The GLMP represents one of the most comprehensive analyses of biological computing systems to date, demonstrating how biological systems function as sophisticated computational programs with complex regulatory logic, decision trees, and feedback mechanisms.
This document presents representative biological processes analyzed using the Programming Framework methodology. Each process is represented as a computational flowchart with standardized color coding: Red for triggers/inputs, Yellow for structures/objects, Green for processing/operations, Blue for intermediates/states, and Violet for products/outputs. Yellow nodes use black text for optimal readability, while all other colors use white text.
This document presents chemistry processes analyzed using the Programming Framework methodology. Each process is represented as a computational flowchart with standardized color coding: Red for triggers/inputs, Yellow for structures/objects, Green for processing/operations, Blue for intermediates/states, and Violet for products/outputs. Yellow nodes use black text for optimal readability, while all other colors use white text.
-
-
1. Catalytic Hydrogenation Process
-
-
-graph TD
- A[Alkene Substrate] --> B[Substrate Analysis]
- C[Hydrogen Gas H₂] --> D[Gas Supply]
- E[Catalyst Pt/Pd/Ni] --> F[Catalyst Preparation]
-
- B --> G[Substrate Purity Check]
- D --> H[Gas Pressure Control]
- F --> I[Catalyst Activation]
-
- G --> J[Reaction Vessel Loading]
- H --> K[Pressure Regulation]
- I --> L[Catalyst Dispersion]
-
- J --> M[Substrate Adsorption]
- K --> N[Hydrogen Dissociation]
- L --> O[Active Site Formation]
-
- M --> P[π-Bond Activation]
- N --> Q[H• Radical Formation]
- O --> R[Catalytic Surface]
-
- P --> S[First H Addition]
- Q --> T[Hydrogen Transfer]
- R --> U[Surface Complex]
-
- S --> V[Alkyl Intermediate]
- T --> W[Second H Addition]
- U --> X[Product Desorption]
-
- V --> Y[Alkane Product]
- W --> Z[Complete Hydrogenation]
- X --> AA[Catalyst Recovery]
-
- %% Red: Reactants & Inputs
- style A fill:#ff6b6b,color:#fff
- style C fill:#ff6b6b,color:#fff
- style E fill:#ff6b6b,color:#fff
-
- %% Yellow: Catalysts & Equipment
- style B fill:#ffd43b,color:#000
- style D fill:#ffd43b,color:#000
- style F fill:#ffd43b,color:#000
- style G fill:#ffd43b,color:#000
- style H fill:#ffd43b,color:#000
- style I fill:#ffd43b,color:#000
- style J fill:#ffd43b,color:#000
- style K fill:#ffd43b,color:#000
- style L fill:#ffd43b,color:#000
-
- %% Green: Chemical Reactions
- style M fill:#51cf66,color:#fff
- style N fill:#51cf66,color:#fff
- style O fill:#51cf66,color:#fff
- style P fill:#51cf66,color:#fff
- style Q fill:#51cf66,color:#fff
- style R fill:#51cf66,color:#fff
- style S fill:#51cf66,color:#fff
- style T fill:#51cf66,color:#fff
- style U fill:#51cf66,color:#fff
- style W fill:#51cf66,color:#fff
- style X fill:#51cf66,color:#fff
-
- %% Blue: Intermediates & States
- style V fill:#74c0fc,color:#fff
-
- %% Violet: Final Products
- style Y fill:#b197fc,color:#fff
- style Z fill:#b197fc,color:#fff
- style AA fill:#b197fc,color:#fff
-
-
-
-
- Reactants & Catalysts
-
-
- Reaction Vessels & Equipment
-
-
- Chemical Reactions & Transformations
-
-
- Intermediates
-
-
- Products
-
-
-
-
- Figure 1. Catalytic Hydrogenation Process. This chemistry process visualization demonstrates catalytic reaction mechanisms. The flowchart shows reactant inputs, reaction vessels and equipment, chemical reactions and transformations, intermediate species, and final products.
-
- Figure 4. Electrochemical Cell Process. This chemistry process visualization demonstrates electrochemical energy conversion. The flowchart shows electrode inputs, cell components, redox reactions, intermediate processes, and final electrical energy output.
-
A hybrid architecture for representing and visualizing axiomatic dependency structures across multiple mathematical subjects. Supports both static Mermaid subgraphs and interactive full-graph exploration.
+
+
Scope: Target Subjects

| Subject | Foundations | Derived Items | Notes |
|---------|-------------|---------------|-------|
| Euclid's Elements | Postulates, Common Notions, Definitions | 465 Propositions (13 books) | Geometric constructions |
| Peano Arithmetic | 5 axioms, definitions | Theorems | Successor, induction |
| Other number systems | Axioms (integers, rationals, reals) | Theorems | Construction sequences |
| Number theory | Definitions, lemmas | Theorems | Divisibility, primes |
| Algebra | Group/ring/field axioms | Theorems | Abstract structures |
| Hilbert's geometry | 5 groups of axioms | Theorems | Grundlagen der Geometrie |
| Tarski's geometry | Betweenness, congruence relations | Theorems | First-order, decidable |
| Analysis | Completeness, continuity axioms | Theorems | Real analysis, limits |
+
+
Metadata & Sources

Each discourse includes metadata (created, lastUpdated, version, license, authors, methodology, citation) and sources (primary texts, digital editions, commentaries). Nodes can reference sources via sourceRef.
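As a minimal illustration of that shape (field names follow the generator scripts in this repo; the node's `sourceRef` value here is chosen for the example):

```javascript
// A discourse carries metadata and a sources list; each node can point back
// at a source entry by id via sourceRef.
const discourse = {
  metadata: { created: "2026-03-15", version: "1.0.0", license: "CC BY 4.0" },
  sources: [
    { id: "joyce", type: "digital", authors: "Joyce, David E.",
      title: "Euclid's Elements, Book I", year: "1996" }
  ],
  nodes: [
    { id: "Prop47", type: "proposition", label: "Pythagorean theorem",
      sourceRef: "joyce" }
  ]
};

// Resolve a node's source by matching sourceRef against source ids.
const src = discourse.sources.find(s => s.id === discourse.nodes[0].sourceRef);
console.log(src.title); // "Euclid's Elements, Book I"
```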
+
+
Hybrid Architecture

1. Canonical JSON — one file per discourse, the source of truth
2. Mermaid generator — filter by book/chapter, output a subgraph
3. Interactive viewer — full graph with zoom, search, highlight
4. Index/Registry — catalog of all discourses with metadata
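Step 2 of this pipeline (the Mermaid generator) reduces to a filter over the canonical node and edge lists. A minimal sketch, with node and edge shapes assumed from the generator scripts in this repo:

```javascript
// Tiny canonical discourse: one postulate and two propositions.
const discourse = {
  nodes: [
    { id: "P1", type: "postulate", shortLabel: "Post. 1" },
    { id: "Prop1", type: "proposition", shortLabel: "Prop. I.1" },
    { id: "Prop2", type: "proposition", shortLabel: "Prop. I.2" }
  ],
  edges: [{ from: "P1", to: "Prop1" }, { from: "Prop1", to: "Prop2" }]
};

// Keep only nodes passing the filter, then keep edges whose ends both survive.
function toMermaid(discourse, keep) {
  const nodes = discourse.nodes.filter(keep);
  const ids = new Set(nodes.map(n => n.id));
  const lines = ["graph TD"];
  for (const n of nodes) lines.push(`  ${n.id}["${n.shortLabel}"]`);
  for (const e of discourse.edges)
    if (ids.has(e.from) && ids.has(e.to)) lines.push(`  ${e.from} --> ${e.to}`);
  return lines.join("\n");
}

// Filter out Prop2: the subgraph keeps P1 --> Prop1 and drops dangling edges.
console.log(toMermaid(discourse, n => n.id !== "Prop2"));
```

Because filtering happens on the canonical JSON, the same function can emit a per-chapter subgraph or the full graph without duplicating data.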
Categorical syllogistic from Prior Analytics. Four perfect syllogisms (Barbara, Celarent, Darii, Ferio), three conversion rules, and ten imperfect syllogisms reduced to the perfect ones. Split into three views.
Counting principles: sum and product rules, permutations, combinations, binomial theorem, pigeonhole principle, inclusion-exclusion, derangements. Split into three views.
+
+`;
+ fs.writeFileSync(path.join(GEO_DIR, "geometry_topology-combinatorics.html"), indexHtml, "utf8");
+ console.log("Wrote", path.join(GEO_DIR, "geometry_topology-combinatorics.html"));
+} else {
+ console.log("MATH_DB not found - skipping HTML generation.");
+}
+
+console.log("Done. Nodes:", discourse.nodes.length, "Edges:", discourse.edges.length);
diff --git a/generator/build-euclid-book-i.js b/generator/build-euclid-book-i.js
new file mode 100644
index 0000000000000000000000000000000000000000..66cd2bc180065e0aee43f7be0bb1e829233f3265
--- /dev/null
+++ b/generator/build-euclid-book-i.js
@@ -0,0 +1,441 @@
+#!/usr/bin/env node
+/**
+ * Build Euclid's Elements Book I discourse JSON and Mermaid.
+ * Dependencies from David E. Joyce, Clark University:
+ * https://mathcs.clarku.edu/~djoyce/java/elements/bookI/bookI.html
+ */
+
+const fs = require('fs');
+const path = require('path');
+
+const PROPOSITIONS = [
+ { n: 1, short: "Equilateral triangle on given line", full: "To construct an equilateral triangle on a given finite straight line" },
+ { n: 2, short: "Place line equal to given at point", full: "To place a straight line equal to a given straight line with one end at a given point" },
+ { n: 3, short: "Cut off from greater segment equal to less", full: "To cut off from the greater of two given unequal straight lines a straight line equal to the less" },
+ { n: 4, short: "SAS congruence", full: "If two triangles have two sides equal to two sides respectively, and the angles contained equal, then bases and remaining angles equal" },
+ { n: 5, short: "Base angles of isosceles equal", full: "In isosceles triangles the angles at the base equal one another" },
+ { n: 6, short: "Sides opposite equal angles equal", full: "If in a triangle two angles equal one another, then the sides opposite the equal angles also equal one another" },
+ { n: 7, short: "Uniqueness of triangle from ends", full: "Given two lines from ends of a line meeting at a point, no other such pair from same ends on same side" },
+ { n: 8, short: "SSS congruence", full: "If two triangles have two sides equal to two sides respectively, and the base equal to the base, then the angles contained are equal" },
+ { n: 9, short: "Bisect angle", full: "To bisect a given rectilinear angle" },
+ { n: 10, short: "Bisect line", full: "To bisect a given finite straight line" },
+ { n: 11, short: "Perpendicular from point on line", full: "To draw a straight line at right angles to a given straight line from a given point on it" },
+ { n: 12, short: "Perpendicular from point not on line", full: "To draw a straight line perpendicular to a given infinite straight line from a given point not on it" },
+ { n: 13, short: "Angles on line sum to two right", full: "If a straight line stands on a straight line, it makes either two right angles or angles whose sum equals two right angles" },
+ { n: 14, short: "If angles sum to two right, straight line", full: "If with any straight line, at a point, two lines not on same side make adjacent angles equal to two right, they are in a straight line" },
+ { n: 15, short: "Vertical angles equal", full: "If two straight lines cut one another, they make the vertical angles equal to one another" },
+ { n: 16, short: "Exterior angle > interior opposite", full: "In any triangle, if one side is produced, the exterior angle is greater than either interior opposite angle" },
+ { n: 17, short: "Sum of two angles < two right", full: "In any triangle the sum of any two angles is less than two right angles" },
+ { n: 18, short: "Angle opposite greater side greater", full: "In any triangle the angle opposite the greater side is greater" },
+ { n: 19, short: "Side opposite greater angle greater", full: "In any triangle the side opposite the greater angle is greater" },
+ { n: 20, short: "Triangle inequality", full: "In any triangle the sum of any two sides is greater than the remaining one" },
+ { n: 21, short: "Lines from ends within triangle", full: "If from ends of one side two lines meet within the triangle, their sum < sum of other two sides" },
+ { n: 22, short: "Construct triangle from three lines", full: "To construct a triangle out of three straight lines which equal three given straight lines" },
+ { n: 23, short: "Construct angle equal to given", full: "To construct a rectilinear angle equal to a given rectilinear angle on a given straight line" },
+ { n: 24, short: "SAS for greater angle => greater base", full: "If two triangles have two sides equal but one contained angle greater, the base is greater" },
+ { n: 25, short: "SAS for greater base => greater angle", full: "If two triangles have two sides equal but base greater, the contained angle is greater" },
+ { n: 26, short: "AAS congruence", full: "If two triangles have two angles equal and one side equal, the remaining sides and angle equal" },
+ { n: 27, short: "Alternate angles equal => parallel", full: "If a line falling on two lines makes alternate angles equal, the lines are parallel" },
+ { n: 28, short: "Exterior = interior opposite => parallel", full: "If exterior angle equals interior opposite, or interior same-side sum to two right, lines parallel" },
+ { n: 29, short: "Parallel => alternate angles equal", full: "A line falling on parallel lines makes alternate angles equal, exterior = interior opposite" },
+ { n: 30, short: "Transitivity of parallel", full: "Straight lines parallel to the same straight line are also parallel to one another" },
+ { n: 31, short: "Draw parallel through point", full: "To draw a straight line through a given point parallel to a given straight line" },
+ { n: 32, short: "Exterior angle = sum interior opposite", full: "In any triangle, exterior angle equals sum of two interior opposite; three angles = two right" },
+ { n: 33, short: "Joining ends of equal parallel lines", full: "Straight lines which join the ends of equal and parallel straight lines in same directions are equal and parallel" },
+ { n: 34, short: "Parallelogram properties", full: "In parallelogrammic areas the opposite sides and angles equal one another, diameter bisects" },
+ { n: 35, short: "Parallelograms same base equal", full: "Parallelograms which are on the same base and in the same parallels equal one another" },
+ { n: 36, short: "Parallelograms equal bases equal", full: "Parallelograms which are on equal bases and in the same parallels equal one another" },
+ { n: 37, short: "Triangles same base equal", full: "Triangles which are on the same base and in the same parallels equal one another" },
+ { n: 38, short: "Triangles equal bases equal", full: "Triangles which are on equal bases and in the same parallels equal one another" },
+ { n: 39, short: "Equal triangles same base same side", full: "Equal triangles on same base and same side are in the same parallels" },
+ { n: 40, short: "Equal triangles equal bases same side", full: "Equal triangles on equal bases and same side are in the same parallels" },
+ { n: 41, short: "Parallelogram = 2× triangle", full: "If a parallelogram has same base with triangle and same parallels, parallelogram is double the triangle" },
+ { n: 42, short: "Construct parallelogram = triangle", full: "To construct a parallelogram equal to a given triangle in a given rectilinear angle" },
+ { n: 43, short: "Complements of parallelogram", full: "In any parallelogram the complements of the parallelograms about the diameter equal one another" },
+ { n: 44, short: "Apply parallelogram to line", full: "To a given straight line in a given angle, to apply a parallelogram equal to a given triangle" },
+ { n: 45, short: "Construct parallelogram = rectilinear figure", full: "To construct a parallelogram equal to a given rectilinear figure in a given rectilinear angle" },
+ { n: 46, short: "Construct square on line", full: "To describe a square on a given straight line" },
+ { n: 47, short: "Pythagorean theorem", full: "In right-angled triangles the square on the side opposite the right angle equals the sum of the squares on the sides containing the right angle" },
+ { n: 48, short: "Converse Pythagorean", full: "If in a triangle the square on one side equals the sum of squares on the other two, the angle contained by those sides is right" }
+];
+
+// Joyce dependency table: { prop: [deps] }
+const DEPS = {
+ 1: ["P1", "P3"],
+ 2: ["Prop1", "P1", "P2", "P3"],
+ 3: ["Prop2", "P3"],
+ 4: ["CN4", "CN5"],
+ 5: ["Prop3", "Prop4"],
+ 6: ["Prop3", "Prop4"],
+ 7: ["Prop5"],
+ 8: ["Prop7"],
+ 9: ["Prop1", "Prop3", "Prop8"],
+ 10: ["Prop1", "Prop4", "Prop9"],
+ 11: ["Prop1", "Prop3", "Prop8"],
+ 12: ["Prop8", "Prop10"],
+ 13: ["Prop11"],
+ 14: ["Prop13"],
+ 15: ["Prop13"],
+ 16: ["Prop3", "Prop4", "Prop10", "Prop15"],
+ 17: ["Prop13", "Prop16"],
+ 18: ["Prop3", "Prop5", "Prop16"],
+ 19: ["Prop5", "Prop18"],
+ 20: ["Prop3", "Prop5", "Prop19"],
+ 21: ["Prop16", "Prop20"],
+ 22: ["Prop3", "Prop20"],
+ 23: ["Prop8", "Prop22"],
+ 24: ["Prop3", "Prop4", "Prop5", "Prop19", "Prop23"],
+ 25: ["Prop4", "Prop24"],
+ 26: ["Prop3", "Prop4", "Prop16"],
+ 27: ["Prop16"],
+ 28: ["Prop13", "Prop15", "Prop27"],
+ 29: ["Prop13", "Prop15", "Prop27", "P5"],
+ 30: ["Prop29"],
+ 31: ["Prop23", "Prop27"],
+ 32: ["Prop13", "Prop29", "Prop31"],
+ 33: ["Prop4", "Prop27", "Prop29"],
+ 34: ["Prop4", "Prop26", "Prop29"],
+ 35: ["Prop4", "Prop29", "Prop34"],
+ 36: ["Prop33", "Prop34", "Prop35"],
+ 37: ["Prop31", "Prop34", "Prop35"],
+ 38: ["Prop31", "Prop34", "Prop36"],
+ 39: ["Prop31", "Prop37"],
+ 40: ["Prop31", "Prop38"],
+ 41: ["Prop34", "Prop37"],
+ 42: ["Prop10", "Prop23", "Prop31", "Prop38", "Prop41"],
+ 43: ["Prop34"],
+ 44: ["Prop15", "Prop29", "Prop31", "Prop42", "Prop43"],
+ 45: ["Prop14", "Prop29", "Prop30", "Prop33", "Prop34", "Prop42", "Prop44"],
+ 46: ["Prop3", "Prop11", "Prop29", "Prop31", "Prop34"],
+ 47: ["Prop4", "Prop14", "Prop31", "Prop41", "Prop46"],
+ 48: ["Prop3", "Prop8", "Prop11", "Prop47"]
+};
+
+const discourse = {
+ schemaVersion: "1.0",
+ discourse: {
+ id: "euclid-elements-book-i",
+ name: "Euclid's Elements, Book I",
+ subject: "geometry",
+ variant: "classical",
+ description: "The 48 propositions of Book I with dependencies on postulates (P1–P5), common notions (CN1–CN5), and prior propositions. Source: David E. Joyce, Clark University.",
+ structure: { books: 1, propositions: 48, foundationTypes: ["postulate", "commonNotion"] }
+ },
+ metadata: {
+ created: "2026-03-15",
+ lastUpdated: "2026-03-15",
+ version: "1.0.0",
+ license: "CC BY 4.0",
+ authors: ["Welz, G."],
+ methodology: "Programming Framework",
+ citation: "Welz, G. (2026). Euclid's Elements Book I Dependency Graph. Programming Framework.",
+ keywords: ["Euclid", "Elements", "Book I", "plane geometry", "constructions", "Pythagorean theorem"]
+ },
+ sources: [
+ { id: "joyce", type: "digital", authors: "Joyce, David E.", title: "Euclid's Elements, Book I", year: "1996", url: "https://mathcs.clarku.edu/~djoyce/java/elements/bookI/bookI.html", notes: "Clark University; dependency table from Guide" },
+ { id: "euclid-heath", type: "primary", authors: "Heath, T.L.", title: "The Thirteen Books of Euclid's Elements", year: "1908", edition: "2nd", publisher: "Cambridge University Press", url: "https://archive.org/details/euclidheath00heatiala", notes: "Standard English translation" }
+ ],
+ nodes: [],
+ edges: [],
+ colorScheme: {
+ postulate: { fill: "#e74c3c", stroke: "#c0392b" },
+ commonNotion: { fill: "#9b59b6", stroke: "#8e44ad" },
+ proposition: { fill: "#1abc9c", stroke: "#16a085" }
+ }
+};
+
+// Add postulates and common notions
+const postulates = [
+ { id: "P1", label: "Draw a straight line from any point to any point", shortLabel: "Post. 1" },
+ { id: "P2", label: "Produce a finite straight line continuously in a straight line", shortLabel: "Post. 2" },
+ { id: "P3", label: "Describe a circle with any center and radius", shortLabel: "Post. 3" },
+ { id: "P4", label: "All right angles equal one another", shortLabel: "Post. 4" },
+ { id: "P5", label: "Parallel postulate: if interior angles < two right, lines meet", shortLabel: "Post. 5" }
+];
+const commonNotions = [
+ { id: "CN1", label: "Things equal to the same thing are equal to each other", shortLabel: "CN 1" },
+ { id: "CN2", label: "If equals are added to equals, the wholes are equal", shortLabel: "CN 2" },
+ { id: "CN3", label: "If equals are subtracted from equals, the remainders are equal", shortLabel: "CN 3" },
+ { id: "CN4", label: "Things coinciding with one another are equal", shortLabel: "CN 4" },
+ { id: "CN5", label: "The whole is greater than the part", shortLabel: "CN 5" }
+];
+
+for (const p of postulates) {
+ discourse.nodes.push({ id: p.id, type: "postulate", label: p.label, shortLabel: p.shortLabel, book: 1, number: parseInt(p.id.slice(1)), colorClass: "postulate" });
+}
+for (const c of commonNotions) {
+ discourse.nodes.push({ id: c.id, type: "commonNotion", label: c.label, shortLabel: c.shortLabel, book: 1, number: parseInt(c.id.slice(2)), colorClass: "commonNotion" });
+}
+
+// Add propositions
+for (const prop of PROPOSITIONS) {
+ discourse.nodes.push({
+ id: `Prop${prop.n}`,
+ type: "proposition",
+ label: prop.full,
+ shortLabel: `Prop. I.${prop.n}`,
+ short: prop.short,
+ book: 1,
+ number: prop.n,
+ colorClass: "proposition"
+ });
+ for (const dep of DEPS[prop.n] || []) {
+ discourse.edges.push({ from: dep, to: `Prop${prop.n}` });
+ }
+}
+
+// Write JSON
+const dataDir = path.join(__dirname, "..", "data");
+const outPath = path.join(dataDir, "euclid-elements-book-i.json");
+fs.mkdirSync(dataDir, { recursive: true });
+fs.writeFileSync(outPath, JSON.stringify(discourse, null, 2), "utf8");
+console.log("Wrote", outPath);
+
+// Generate Mermaid (full graph - may be large)
+function toMermaid(filter) {
+ const nodes = filter ? discourse.nodes.filter(filter) : discourse.nodes;
+ const nodeIds = new Set(nodes.map(n => n.id));
+ const edges = discourse.edges.filter(e => nodeIds.has(e.from) && nodeIds.has(e.to));
+ const lines = ["graph TD"];
+ for (const n of nodes) {
+ const desc = n.short || (n.label.length > 35 ? n.label.slice(0, 32) + "..." : n.label);
+ const lbl = (n.shortLabel || n.id) + "\\n" + desc;
+ lines.push(` ${n.id}["${lbl.replace(/"/g, '\\"')}"]`);
+ }
+ for (const e of edges) {
+ lines.push(` ${e.from} --> ${e.to}`);
+ }
+ lines.push(" classDef postulate fill:#e74c3c,color:#fff,stroke:#c0392b");
+ lines.push(" classDef commonNotion fill:#9b59b6,color:#fff,stroke:#8e44ad");
+ lines.push(" classDef proposition fill:#1abc9c,color:#fff,stroke:#16a085");
+ const postIds = nodes.filter(n => n.type === "postulate").map(n => n.id).join(",");
+ const cnIds = nodes.filter(n => n.type === "commonNotion").map(n => n.id).join(",");
+ const propIds = nodes.filter(n => n.type === "proposition").map(n => n.id).join(",");
+ lines.push(` class ${postIds} postulate`);
+ lines.push(` class ${cnIds} commonNotion`);
+ lines.push(` class ${propIds} proposition`);
+ return lines.join("\n");
+}
+
+// Full graph
+const fullMermaid = toMermaid();
+const mermaidPath = path.join(dataDir, "euclid-elements-book-i.mmd");
+fs.writeFileSync(mermaidPath, fullMermaid, "utf8");
+console.log("Wrote", mermaidPath);
+
+// Subgraphs: include foundations + props in range + all their dependencies (transitive)
+function closure(propMin, propMax) {
+ const needed = new Set();
+ for (let i = propMin; i <= propMax; i++) needed.add(`Prop${i}`);
+ let changed = true;
+ while (changed) {
+ changed = false;
+ for (const e of discourse.edges) {
+ if (needed.has(e.to) && !needed.has(e.from)) { needed.add(e.from); changed = true; }
+ }
+ }
+ return n => n.type !== "proposition" || needed.has(n.id);
+}
+function toMermaidWithCounts(filter) {
+ const nodes = filter ? discourse.nodes.filter(filter) : discourse.nodes;
+ const nodeIds = new Set(nodes.map(n => n.id));
+ const edges = discourse.edges.filter(e => nodeIds.has(e.from) && nodeIds.has(e.to));
+ return { mermaid: toMermaid(filter), nodes: nodes.length, edges: edges.length };
+}
+const subgraphData = [];
+const sections = [
+ { name: "props-1-10", min: 1, max: 10, title: "Propositions 1–10", desc: "Foundations, SAS, SSS, bisections, perpendiculars" },
+ { name: "props-11-20", min: 11, max: 20, title: "Propositions 11–20", desc: "Right angles, straight lines, vertical angles, triangle exterior, triangle inequality" },
+ { name: "props-21-30", min: 21, max: 30, title: "Propositions 21–30", desc: "Lines within triangle, construct triangle/angle, parallel lines" },
+ { name: "props-31-41", min: 31, max: 41, title: "Propositions 31–41", desc: "Parallelograms, triangles, areas" },
+ { name: "props-42-48", min: 42, max: 48, title: "Propositions 42–48", desc: "Constructions, Pythagorean theorem" }
+];
+for (const s of sections) {
+ const { mermaid: sub, nodes: n, edges: e } = toMermaidWithCounts(closure(s.min, s.max));
+ subgraphData.push({ ...s, mermaid: sub, nodes: n, edges: e });
+ const subPath = path.join(dataDir, `euclid-elements-book-i-${s.name}.mmd`);
+ fs.writeFileSync(subPath, sub, "utf8");
+ console.log("Wrote", subPath);
+}
+
+// Generate HTML pages for Mathematics Processes Database
+const MATH_DB = process.env.MATH_DB || "/home/gdubs/copernicus-web-public/huggingface-space/mathematics-processes-database";
+const GEOM_DIR = path.join(MATH_DB, "processes", "geometry_topology");
+
+function htmlTemplate(title, subtitle, mermaid, nodes, edges) {
+  const mermaidEscaped = mermaid.replace(/</g, "&lt;").replace(/>/g, "&gt;");
+ return `
+
+
+
+
+ ${title} - Mathematics Process
+
+
+
+
+
Note: Arrows mean "depends on" (tail → head). Edge crossings can create the illusion of connections between nearby nodes—e.g., I.47 and I.40 have no direct dependency; both depend on I.31 among others.
+
${mermaidEscaped}
+
+
+
Color Scheme
+
+
Red Postulates
+
Purple Common Notions
+
Teal Propositions
+
+
+
+
+
Statistics
+
+
Nodes: ${nodes}
+
Edges: ${edges}
+
+
+
+
Keywords
+
+
Euclid
Elements
Book I
axioms
postulates
propositions
geometry
Pythagorean theorem
+
+
+
+
+
+
+
+`;
+}
+
+if (fs.existsSync(GEOM_DIR)) {
+ for (const d of subgraphData) {
+ const html = htmlTemplate(
+ `Euclid's Elements Book I — ${d.title}`,
+ `Dependency graph for ${d.title} of Euclid's Elements Book I. ${d.desc}. Shows how propositions depend on postulates (P1–P5), common notions (CN1–CN5), and prior propositions.`,
+ d.mermaid,
+ d.nodes,
+ d.edges
+ );
+ const fileName = "geometry_topology-euclid-elements-book-i-" + d.name;
+ fs.writeFileSync(path.join(GEOM_DIR, fileName + ".html"), html, "utf8");
+ console.log("Wrote", path.join(GEOM_DIR, fileName + ".html"));
+ }
+ // Index page
+ const indexHtml = `
+
+
+
+
+ Euclid's Elements Book I - Mathematics Process
+
+
+
+
Theory of circles: 11 definitions, 37 propositions. Chords, tangents, angles at center and circumference. All depend on Book I. Prop III.35 uses Prop II.5. Dependency charts split into three views (~16 nodes each).
Inscribed and circumscribed figures: triangle, square, pentagon, hexagon, 15-gon. All depend on Books I and III. Prop IV.10 uses Prop II.11 (golden section). Dependency charts split into two views.
Similar figures. 4 definitions, 33 propositions. Depends on Book I and Book V. VI.1 (triangles under same height) is basis for most. VI.33 uses proportion def directly.
+
+`;
+ fs.writeFileSync(path.join(GEOM_DIR, "geometry_topology-euclid-elements-book-vi.html"), indexHtml, "utf8");
+ console.log("Wrote geometry_topology-euclid-elements-book-vi.html");
+}
+
+console.log("Done. Nodes:", discourse.nodes.length, "Edges:", discourse.edges.length);
diff --git a/generator/build-euclid-book-vii.js b/generator/build-euclid-book-vii.js
new file mode 100644
index 0000000000000000000000000000000000000000..e5df0766ad186ab864ad42c84dc7ebe315f28254
--- /dev/null
+++ b/generator/build-euclid-book-vii.js
@@ -0,0 +1,399 @@
+#!/usr/bin/env node
+/**
+ * Build Euclid's Elements Book VII discourse JSON and Mermaid charts.
+ * 22 definitions, 39 propositions. Number theory: GCD, proportions, primes, LCM.
+ * Book VII does not depend on previous books. Source: David E. Joyce.
+ *
+ * Charts: 4. Chart 1: Defs + Props 1-10. Chart 2: Props 11-20. Chart 3: Props 21-30. Chart 4: Props 31-39.
+ */
+
+const fs = require('fs');
+const path = require('path');
+
+const DEFS = [
+ { n: 1, short: "Unit", full: "A unit is that by virtue of which each of the things that exist is called one" },
+ { n: 2, short: "Number", full: "A number is a multitude composed of units" },
+ { n: 3, short: "Part", full: "A number is part of a number when it measures it" },
+ { n: 4, short: "Parts", full: "Parts when it does not measure it" },
+ { n: 5, short: "Multiple", full: "The greater is a multiple of the less when measured by the less" },
+ { n: 6, short: "Even", full: "An even number is that which is divisible into two equal parts" },
+ { n: 7, short: "Odd", full: "An odd number is that which is not divisible into two equal parts" },
+ { n: 8, short: "Even-times even", full: "Even-times even: measured by an even number an even number of times" },
+ { n: 9, short: "Even-times odd", full: "Even-times odd: measured by an even number an odd number of times" },
+ { n: 10, short: "Odd-times odd", full: "Odd-times odd: measured by an odd number an odd number of times" },
+ { n: 11, short: "Prime", full: "A prime number is that which is measured by a unit alone" },
+ { n: 12, short: "Relatively prime", full: "Numbers relatively prime when only a unit measures both" },
+ { n: 13, short: "Composite", full: "A composite number is that measured by some number" },
+ { n: 14, short: "Composite to one another", full: "Numbers composite to one another when some number measures both" },
+ { n: 15, short: "Multiply", full: "A number multiplies a number when the latter is added as many times as units in the former" },
+ { n: 16, short: "Product", full: "When two numbers multiplied produce a number, the product is plane" },
+ { n: 17, short: "Side", full: "Sides of the product are the numbers multiplied" },
+ { n: 18, short: "Plane number", full: "A plane number is that produced by two numbers" },
+ { n: 19, short: "Solid number", full: "A solid number is that produced by three numbers" },
+ { n: 20, short: "Similar plane", full: "Similar plane numbers have sides proportional" },
+ { n: 21, short: "Similar solid", full: "Similar solid numbers have sides proportional" },
+ { n: 22, short: "Perfect", full: "A perfect number is that which equals its own parts" }
+];
+
+const PROPS = [
+ { n: 1, short: "Antenaresis, relatively prime", full: "Unequal numbers: repeated subtraction; if unit left, relatively prime" },
+ { n: 2, short: "GCD of two numbers", full: "To find greatest common measure of two numbers not relatively prime" },
+ { n: 3, short: "GCD of three numbers", full: "To find greatest common measure of three numbers" },
+ { n: 4, short: "Part or parts", full: "Any number is part or parts of any number, less of greater" },
+ { n: 5, short: "Same part: sum", full: "If a is same part of b as c of d, then a+c same part of b+d" },
+ { n: 6, short: "Same parts: sum", full: "If a is same parts of b as c of d, then a+c same parts of b+d" },
+ { n: 7, short: "Same part: remainder", full: "If a part of b as c of d, remainder same part of remainder" },
+ { n: 8, short: "Same parts: remainder", full: "If a parts of b as c of d, remainder same parts of remainder" },
+ { n: 9, short: "Same part: alternately", full: "If a part of b as c of d, alternately a part/parts of c as b of d" },
+ { n: 10, short: "Same parts: alternately", full: "If a parts of b as c of d, alternately a part/parts of c as b of d" },
+ { n: 11, short: "Proportion: remainder", full: "If whole:whole as subtracted:subtracted, remainder:remainder as whole:whole" },
+ { n: 12, short: "Proportional: sum", full: "Proportional: one antecedent to consequent as sum antecedents to sum consequents" },
+ { n: 13, short: "Proportional: alternately", full: "If four numbers proportional, also proportional alternately" },
+ { n: 14, short: "Ex aequali", full: "If a:b = d:e and b:c = e:f, then a:c = d:f" },
+ { n: 15, short: "Unit measures", full: "If unit measures a, b measures c same times, alternately unit:c as b:d" },
+ { n: 16, short: "Commutativity of product", full: "If a multiplied by b and b multiplied by a produce numbers, the products are equal (a×b = b×a)" },
+ { n: 17, short: "Multiplier preserves ratio", full: "If a multiplies b and c, then (a×b):(a×c) = b:c" },
+ { n: 18, short: "Multiplicand preserves ratio", full: "If a and b each multiply c, then (a×c):(b×c) = a:b" },
+ { n: 19, short: "Proportional iff product", full: "a:b = c:d iff a×d = b×c" },
+ { n: 20, short: "Least in ratio", full: "Least numbers in ratio measure others same number of times" },
+ { n: 21, short: "Relatively prime: least", full: "Relatively prime numbers are least in their ratio" },
+ { n: 22, short: "Least: relatively prime", full: "Least numbers in ratio are relatively prime" },
+ { n: 23, short: "Relatively prime: divisor", full: "If a,b relatively prime, divisor of a relatively prime to b" },
+ { n: 24, short: "Product relatively prime", full: "If a,b relatively prime to c, then a×b relatively prime to c" },
+ { n: 25, short: "Square relatively prime", full: "If a,b relatively prime, a² relatively prime to b" },
+ { n: 26, short: "Products relatively prime", full: "If a,c and b,d relatively prime, a×b, c×d relatively prime" },
+ { n: 27, short: "Powers relatively prime", full: "If a,b relatively prime, then a²,b² relatively prime, and a³,b³ relatively prime" },
+ { n: 28, short: "Sum relatively prime", full: "If a,b relatively prime, a+b prime to each; converse" },
+ { n: 29, short: "Prime to non-multiple", full: "Prime relatively prime to any number it does not measure" },
+ { n: 30, short: "Prime divides product", full: "If prime measures product, it measures one factor" },
+ { n: 31, short: "Composite has prime factor", full: "Any composite measured by some prime" },
+ { n: 32, short: "Prime or has prime factor", full: "Any number is prime or measured by some prime" },
+ { n: 33, short: "Least in ratio", full: "Given numbers, find least in same ratio" },
+ { n: 34, short: "LCM of two", full: "To find least number that two given numbers measure" },
+ { n: 35, short: "LCM divides common multiple", full: "If two numbers measure some number, LCM also measures it" },
+ { n: 36, short: "LCM of three", full: "To find least number that three given numbers measure" },
+ { n: 37, short: "Measured has part", full: "If a measures b, b has part named by a" },
+ { n: 38, short: "Part implies measured", full: "If b has part named by a, a measures b" },
+ { n: 39, short: "Least with given parts", full: "To find least number with given parts" }
+];
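Propositions VII.1–3 above describe antenaresis: repeated subtraction of the smaller number from the larger, terminating at the greatest common measure (or at a unit, when the numbers are relatively prime). As a minimal standalone JavaScript sketch, not part of the generator script, and with a hypothetical function name:

```javascript
// Subtractive GCD as in Elements VII.1-2 (antenaresis):
// repeatedly subtract the smaller number from the larger until
// the two are equal; that common value measures both.
// If the process bottoms out at 1, the inputs are relatively
// prime (VII.1). Assumes positive integers.
function antenaresisGcd(a, b) {
  while (a !== b) {
    if (a > b) a -= b;
    else b -= a;
  }
  return a;
}

const g = antenaresisGcd(12, 18); // greatest common measure of 12 and 18
```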
+
+// Joyce: VII.1→2,3; VII.2→3; VII.5-10 fractions; VII.11-19 proportions; VII.20-29 relatively prime; VII.30-32 primes; VII.33-39 LCM
+const DEPS = {
+ 1: [],
+ 2: ["Prop1"],
+ 3: ["Prop1", "Prop2"],
+ 4: [],
+ 5: [],
+ 6: [],
+ 7: [],
+ 8: [],
+ 9: [],
+ 10: [],
+ 11: [],
+ 12: [],
+ 13: [],
+ 14: [],
+ 15: [],
+ 16: [],
+ 17: [],
+ 18: [],
+ 19: [],
+ 20: ["Prop19"],
+ 21: ["Prop20"],
+ 22: ["Prop21"],
+ 23: ["Prop22"],
+ 24: ["Prop23"],
+ 25: ["Prop23"],
+ 26: ["Prop24"],
+ 27: ["Prop25"],
+ 28: ["Prop23"],
+ 29: ["Prop23"],
+ 30: ["Prop29"],
+ 31: [],
+ 32: ["Prop31"],
+ 33: ["Prop20", "Prop22"],
+ 34: ["Prop33"],
+ 35: ["Prop34"],
+ 36: ["Prop34"],
+ 37: [],
+ 38: ["Prop37"],
+ 39: ["Prop38"]
+};
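The DEPS table only ever points a proposition at earlier-numbered propositions, which keeps the dependency graph acyclic. A standalone sketch of a sanity check one might run over such a table (the `validateDeps` helper is hypothetical, not part of this script):

```javascript
// Flag any dependency that is not a "PropN" id with N strictly
// below the depending proposition's own number; such entries would
// introduce cycles or dangling references into the graph.
function validateDeps(deps) {
  const problems = [];
  for (const [n, reqs] of Object.entries(deps)) {
    for (const r of reqs) {
      const m = /^Prop(\d+)$/.exec(r);
      if (!m || Number(m[1]) >= Number(n)) problems.push(`${n} -> ${r}`);
    }
  }
  return problems;
}

// Mirrors the first three entries of the real table.
const sample = { 1: [], 2: ["Prop1"], 3: ["Prop1", "Prop2"] };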
+
+const discourse = {
+ schemaVersion: "1.0",
+ discourse: {
+ id: "euclid-elements-book-vii",
+ name: "Euclid's Elements, Book VII",
+ subject: "number_theory",
+ variant: "classical",
+ description: "Number theory: GCD (Euclidean algorithm), proportions, primes, LCM. 22 definitions, 39 propositions. Does not depend on previous books. Source: David E. Joyce.",
+ structure: { books: 7, definitions: 22, propositions: 39, foundationTypes: ["definition"] }
+ },
+ metadata: {
+ created: "2026-03-18",
+ lastUpdated: "2026-03-18",
+ version: "1.0.0",
+ license: "CC BY 4.0",
+ authors: ["Welz, G."],
+ methodology: "Programming Framework",
+ citation: "Welz, G. (2026). Euclid's Elements Book VII Dependency Graph. Programming Framework.",
+ keywords: ["Euclid", "Elements", "Book VII", "number theory", "GCD", "prime", "LCM"]
+ },
+ sources: [
+ { id: "joyce", type: "digital", authors: "Joyce, David E.", title: "Euclid's Elements, Book VII", year: "1996", url: "https://mathcs.clarku.edu/~djoyce/java/elements/bookVII/bookVII.html", notes: "Clark University" }
+ ],
+ nodes: [],
+ edges: [],
+ colorScheme: {
+ definition: { fill: "#3498db", stroke: "#2980b9" },
+ proposition: { fill: "#1abc9c", stroke: "#16a085" }
+ }
+};
+
+for (const d of DEFS) {
+ discourse.nodes.push({
+ id: `Def${d.n}`,
+ type: "definition",
+ label: d.full,
+ shortLabel: `Def. VII.${d.n}`,
+ short: d.short,
+ book: 7,
+ number: d.n,
+ colorClass: "definition"
+ });
+}
+
+for (const prop of PROPS) {
+ discourse.nodes.push({
+ id: `Prop${prop.n}`,
+ type: "proposition",
+ label: prop.full,
+ shortLabel: `Prop. VII.${prop.n}`,
+ short: prop.short,
+ book: 7,
+ number: prop.n,
+ colorClass: "proposition"
+ });
+ for (const dep of DEPS[prop.n] || []) {
+ discourse.edges.push({ from: dep, to: `Prop${prop.n}` });
+ }
+}
+
+const dataDir = path.join(__dirname, "..", "data");
+fs.mkdirSync(dataDir, { recursive: true });
+fs.writeFileSync(path.join(dataDir, "euclid-elements-book-vii.json"), JSON.stringify(discourse, null, 2), "utf8");
+console.log("Wrote euclid-elements-book-vii.json");
+
+function toMermaid(filter) {
+ const nodes = filter ? discourse.nodes.filter(filter) : discourse.nodes;
+ const nodeIds = new Set(nodes.map(n => n.id));
+ const edges = discourse.edges.filter(e => nodeIds.has(e.from) && nodeIds.has(e.to));
+ const lines = ["graph TD"];
+ for (const n of nodes) {
+ const desc = n.short || (n.label && n.label.length > 35 ? n.label.slice(0, 32) + "..." : n.label || n.id);
+ const lbl = (n.shortLabel || n.id) + "\\n" + (desc || "");
+ lines.push(` ${n.id}["${String(lbl).replace(/"/g, '\\"')}"]`);
+ }
+ for (const e of edges) {
+ lines.push(` ${e.from} --> ${e.to}`);
+ }
+ lines.push(" classDef definition fill:#3498db,color:#fff,stroke:#2980b9");
+ lines.push(" classDef proposition fill:#1abc9c,color:#fff,stroke:#16a085");
+ const defIds = nodes.filter(n => n.type === "definition").map(n => n.id).join(",");
+ const propIds = nodes.filter(n => n.type === "proposition").map(n => n.id).join(",");
+ // Guard against empty id lists: "class  definition" is invalid Mermaid.
+ if (defIds) lines.push(` class ${defIds} definition`);
+ if (propIds) lines.push(` class ${propIds} proposition`);
+ return lines.join("\n");
+}
+
+function closure(propMax) {
+ const needed = new Set();
+ for (let i = 1; i <= propMax; i++) needed.add(`Prop${i}`);
+ for (const d of DEFS) needed.add(`Def${d.n}`);
+ let changed = true;
+ while (changed) {
+ changed = false;
+ for (const e of discourse.edges) {
+ if (needed.has(e.to) && !needed.has(e.from)) { needed.add(e.from); changed = true; }
+ }
+ }
+ return n => needed.has(n.id);
+}
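`closure()` above is a simple fixed-point iteration over the edge list: starting from a target set, it repeatedly pulls in every edge source until nothing changes. The same pattern on a self-contained toy edge set (names are illustrative, shaped like the real data):

```javascript
// Transitive dependency closure over directed edges {from, to}:
// everything a set of target nodes ultimately depends on.
function dependencyClosure(edges, targets) {
  const needed = new Set(targets);
  let changed = true;
  while (changed) {
    changed = false;
    for (const e of edges) {
      if (needed.has(e.to) && !needed.has(e.from)) {
        needed.add(e.from);
        changed = true;
      }
    }
  }
  return needed;
}

// Prop3 depends on Prop2 directly and on Prop1 both directly
// and through Prop2 (as VII.3 uses VII.2, which uses VII.1).
const edges = [
  { from: "Prop1", to: "Prop2" },
  { from: "Prop1", to: "Prop3" },
  { from: "Prop2", to: "Prop3" }
];
const result = dependencyClosure(edges, ["Prop3"]);
```

Passing the returned set's membership test as the `filter` argument of `toMermaid` yields a diagram restricted to that closure.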
+
+const MATH_DB = process.env.MATH_DB || "/home/gdubs/copernicus-web-public/huggingface-space/mathematics-processes-database";
+const GEOM_DIR = path.join(MATH_DB, "processes", "geometry_topology");
+
+function htmlTemplate(title, subtitle, mermaid, nodes, edges) {
+ const mermaidEscaped = mermaid.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
+ return `<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="utf-8">
+<title>${title} - Mathematics Process</title>
+</head>
+<body>
+<h1>${title}</h1>
+<p>${subtitle}</p>
+<div class="badges">
+ <span>Mathematics</span>
+ <span>Number Theory</span>
+ <span>Source: Euclid's Elements</span>
+</div>
Measurement of figures: circles (XII.2), pyramids (XII.5–7), cones and cylinders (XII.10–15), spheres (XII.18). 18 propositions. Depends on Books I, V, VI, XI.
Regular solids: tetrahedron, octahedron, cube, icosahedron, dodecahedron. 18 propositions. Depends on Books I, IV, VI, X, XI. XIII.18: no other such figure exists.
Axiomatic development of natural number arithmetic. Five axioms, definitions of addition and multiplication, and key theorems. Based on Landau, Foundations of Analysis. Split into three views.
Hilbert-style axiomatic development of classical propositional logic. Three axioms (Łukasiewicz P2), modus ponens, definitions of disjunction, conjunction, biconditional, and key theorems. Split into three views.
- Combining Large Language Models with Mermaid visualization to dissect and understand
- complex processes across any discipline—from biology to business, physics to psychology.
-
-
-
-
-
-
-
-
-
📋 Summary
-
- The Programming Framework is a universal meta-tool for analyzing complex processes across any discipline by combining Large Language Models (LLMs) with visual flowchart representation. The Framework transforms textual process descriptions into structured, interactive Mermaid flowcharts stored as JSON, enabling systematic analysis, visualization, and integration with knowledge systems.
-
-
- Successfully demonstrated through GLMP (Genome Logic Modeling Project) with 50+ biological processes, and applied across Chemistry, Mathematics, Physics, and Computer Science. The Framework serves as the foundational methodology for the CopernicusAI Knowledge Engine, enabling domain-specific process visualization and analysis.
-
-
-
-
-
-
-
📚 Prior Work & Research Contributions
-
-
-
Overview
-
- The Programming Framework represents prior work that demonstrates a novel methodology for analyzing complex processes by combining Large Language Models (LLMs) with visual flowchart representation. This research establishes a universal, domain-agnostic approach to process analysis that transforms textual descriptions into structured, interactive visualizations.
-
- The Programming Framework serves as the foundational meta-tool of the CopernicusAI Knowledge Engine, providing the underlying methodology that enables specialized applications:
-
-
-
-
• GLMP (Genome Logic Modeling Project)
-
• CopernicusAI (main knowledge engine)
-
• Research Papers Metadata Database
-
-
-
• Science Video Database
-
• Multi-domain process analysis
-
-
-
- This work establishes a proof-of-concept for AI-assisted process analysis, demonstrating how LLMs can systematically extract and visualize complex logic from textual sources across diverse domains.
-
-
-
-
-
-
-
-
-
-
Any
-
Discipline
-
-
-
LLM
-
Powered
-
-
-
Visual
-
Flowcharts
-
-
-
JSON
-
Structured Data
-
-
-
-
-
-
-
-
🎯 What is the Programming Framework?
-
-
- The Programming Framework is a meta-tool—a tool for creating tools. It provides a
- systematic method for analyzing any complex process by combining the analytical power of Large Language
- Models with the clarity of visual flowcharts.
-
-
-
-
-
🔍 The Problem
-
- Complex processes—whether biological, computational, or organizational—are difficult to
- understand because they involve many steps, decision points, and interactions. Traditional
- descriptions in text are hard to follow.
-
- Use LLMs to extract process logic from literature, then encode it as Mermaid flowcharts
- stored in JSON. Result: Clear, interactive visualizations that reveal hidden patterns and
- enable systematic analysis.
-
Interactive flowchart reveals insights and enables refinement
-
-
+ .color-system {
+ margin-bottom: 40px;
+ }
-
-
📝 Concrete Example:
-
-
Input:
-
- "DNA replication begins when the origin recognition complex (ORC) binds to DNA replication origins. This triggers the loading of the MCM2-7 helicase complex, which unwinds the DNA double helix. DNA polymerases then synthesize new strands using the unwound strands as templates..."
-
- Mermaid flowchart with 25 nodes, 28 edges, 3 decision gates, properly colored using the 5-color scheme (red for inputs, yellow for structures, green for operations, blue for intermediates, violet for products), stored as structured JSON enabling interactive visualization and programmatic access.
-
- graph TD
- A[Complex Process Input] --> B{LLM Analysis}
- B -->|Extract Logic| C[Identify Steps]
- B -->|Extract Decisions| D[Identify Branches]
- C --> E[Create Flowchart Nodes]
- D --> F[Create Decision Points]
- E --> G[Generate Mermaid Syntax]
- F --> G
- G --> H[Store as JSON]
- H --> I[Interactive Visualization]
- I --> J{Insights Gained?}
- J -->|No| K[Refine Analysis]
- J -->|Yes| L[Apply Knowledge]
- K --> B
-
- style A fill:#ff6b6b,color:#fff
- style B fill:#74c0fc,color:#fff
- style C fill:#51cf66,color:#fff
- style D fill:#51cf66,color:#fff
- style E fill:#ffd43b,color:#000
- style F fill:#ffd43b,color:#000
- style G fill:#51cf66,color:#fff
- style H fill:#74c0fc,color:#fff
- style I fill:#74c0fc,color:#fff
- style J fill:#74c0fc,color:#fff
- style K fill:#51cf66,color:#fff
- style L fill:#b197fc,color:#fff
-
-
-
Color Legend:
-
- Red - Triggers & Inputs
- Yellow - Structures & Objects
- Green - Processing & Operations
- Blue - Intermediates & States
- Violet - Products & Outputs
-
-
-
-
-
-
-
-
-
💡 Core Principles
-
-
-
-
🌍
-
Domain Agnostic
-
- Works across any field: biology, chemistry, software engineering, business processes,
- legal workflows, manufacturing, and beyond.
-
Systematic Analysis of Complex Systems Across Disciplines
+
-
-
💾 Data Storage
-
-
• Google Cloud Storage for JSON files
-
• Firestore for metadata indexing
-
• Version control with Git
-
• Cross-referencing with papers database
-
-
+
+
+
Project Overview
+
The Programming Framework is a systematic visualization methodology for analyzing complex systems across disciplines using Mermaid Markdown and a universal five-color code.
+
+
Complex systems across biology, chemistry, and physics exhibit remarkable similarities in their organizational principles despite operating at vastly different scales and domains. Traditional analysis methods often remain siloed within specific disciplines, limiting our ability to identify common patterns and computational logic that govern system behavior.
+
+
Here, we present the Programming Framework, a systematic methodology that translates complex system dynamics into standardized computational representations using Mermaid Markdown syntax and LLM processing.
+
-
-
🔗 Integration Points
-
-
• GLMP specialized collections
-
• CopernicusAI knowledge graph
-
• Research papers database
-
• API endpoints for programmatic access
-
-
+
+
Technical Foundation: Mermaid Markdown
+
+
The Invention of Mermaid
+
+ Knut Sveidqvist created Mermaid, a JavaScript-based diagramming and charting tool, to simplify diagram creation in documentation workflows. The project was inspired by his experience of trying to update a diagram in a document, which the file format made difficult.
+
+
Sveidqvist's innovation revolutionized how diagrams are created and maintained in documentation by providing a text-based syntax that can be version-controlled, easily edited, and automatically rendered into visual diagrams. This approach eliminates the need for external diagramming tools and ensures diagrams stay synchronized with their documentation.
+
+
Mermaid Markdown (.mmd) Format
+
The Programming Framework leverages Mermaid's .mmd file format, which provides:
+
+
Text-based syntax for creating complex flowcharts and diagrams
+
Version control compatibility - diagrams can be tracked in Git repositories
+
LLM-friendly format - AI systems can generate and modify diagram code
+
Cross-platform compatibility - works in any environment that supports JavaScript
+
Embeddable rendering - diagrams can be displayed in HTML, Markdown, and other formats
+
+ In the Framework's workflow, LLMs then work with this format to:
+
+ Apply color coding - Systematic application of the 5-category color system
+
Ensure consistency - Standardized node naming and connection patterns
+
Embed in HTML - .mmd files are embedded in HTML for web display
+
Maintain quality - LLMs can validate and optimize diagram structure
+
+
+
This workflow enables rapid creation of sophisticated visualizations that would be impractical to create manually, while maintaining the flexibility and editability of text-based formats.
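As a concrete sketch of the workflow described above, the following standalone JavaScript fragment encodes a small hypothetical process as Mermaid text with the five-category color code. The step labels, helper names, and hex values here are illustrative only (the colors follow the palette used elsewhere on this page):

```javascript
// Encode a toy enzymatic process as Mermaid flowchart text,
// styling each node by its category in the 5-color scheme.
const steps = [
  { id: "A", label: "Substrate binds", cls: "input" },            // red: trigger/input
  { id: "B", label: "Enzyme-substrate complex", cls: "structure" }, // yellow: structure
  { id: "C", label: "Catalysis", cls: "operation" },              // green: operation
  { id: "D", label: "Product released", cls: "product" }          // violet: product
];
const colors = {
  input: "#ff6b6b",
  structure: "#ffd43b",
  operation: "#51cf66",
  product: "#b197fc"
};
const mmd = [
  "graph TD",
  ...steps.map(s => `  ${s.id}["${s.label}"]`),
  "  A --> B --> C --> D",
  ...steps.map(s => `  style ${s.id} fill:${colors[s.cls]}`)
].join("\n");
```

The resulting string can be saved as a `.mmd` file, tracked in Git, and rendered by any Mermaid-aware viewer.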
-
-
-
-
-
-
✅ Validation & Accuracy
-
-
-
-
🔍 Quality Assurance Process
-
-
• Automated Validation: All flowcharts validated for Mermaid syntax correctness before publication
Comprehensive biological systems analysis with genome logic modeling and metabolic pathway visualization
-
-
⚠️ Known Limitations
-
-
• LLM-Dependent Accuracy: Flowchart accuracy depends on LLM interpretation of source material; complex processes may require multiple refinement cycles
-
• Domain Expertise Required: While the Framework is domain-agnostic, optimal results benefit from domain-specific knowledge for validation
-
• Source Material Quality: Accuracy is limited by the quality and completeness of input source material
-
• Continuous Improvement: Framework is actively refined based on user feedback and validation results
-
-
-
-
-
-
-
-
🔗 Related Projects
-
-
-
-
🧬 GLMP - Genome Logic Modeling
-
- First specialized application of the Programming Framework to biochemical processes.
- 100+ biological pathways visualized as interactive flowcharts.
-
Live searchable tables on cloud storage where deployed; other disciplines use static batch pages while the database spine is built out.
+
+
+
🧬 Biology
+
Biological process visualizations: GLMP covers biochemical and molecular processes; the Biology Database covers higher-level organismal pathways, mechanisms, and lab-style protocols.
-@misc{welz2025programmingframework,
- title={The Programming Framework: A Universal Method for Process Analysis},
- author={Welz, Gary},
- year={2024--2025},
- url={https://huggingface.co/spaces/garywelz/programming_framework},
- note={Hugging Face Spaces}
-}
+
+
⚗️ Chemistry
+ Under development
+
Comprehensive chemistry process diagrams across major branches. A public interactive database table (like Mathematics and Biology) is not live yet; batch HTML pages in this Space are the current entry point.
Algorithms, proof methods, dependency graphs, and computational processes — main table plus named collections (mathematicians, theorems) and a whole-of-mathematics graph view.
216 processes across 23 subcategories (see live table for the exact index)
+
+
+
+
⚛️ Physics
+ Under development
+
Physical processes including quantum mechanics, thermodynamics, and particle physics. Interactive cloud table coming; use static physics batches for now.
- This project serves as a foundational meta-tool for AI-assisted process analysis, enabling systematic extraction and visualization of complex logic from textual sources across diverse scientific and technical domains.
-
-
- The Programming Framework is designed as infrastructure for AI-assisted science, providing a universal methodology that can be specialized for domain-specific applications.
-
+
+
+
Key Resources & Documentation
+
+
+
📚 Complete Documentation
+
Detailed methodology, implementation guidelines, and theoretical foundation