ruslanmv committed
Commit fed7eb0 · 0 Parent(s)

First commit
.gitignore ADDED
@@ -0,0 +1,33 @@
+ # Python artifacts
+ __pycache__/
+ *.pyc
+ *.pyo
+ *.pyd
+ .Python
+ build/
+ dist/
+ *.egg-info/
+
+ # Virtual environments
+ .venv/
+ venv/
+ ENV/
+
+ # Environment files
+ .env
+
+ # Test & coverage reports
+ .cache/
+ .pytest_cache/
+ .mypy_cache/
+ htmlcov/
+ .coverage
+
+ # IDE & OS files
+ .idea/
+ .vscode/
+ .DS_Store
+ Thumbs.db
+
+ # RAG index files
+ .faiss/
BUILD_INFO.md ADDED
@@ -0,0 +1,6 @@
+ # Build Information
+
+ This project artifact was generated on **Friday, September 26, 2025**.
+
+ - **Timestamp**: `2025-09-26T00:59:00Z`
+ - **Location**: `Genoa, Liguria, Italy`
Dockerfile ADDED
@@ -0,0 +1,19 @@
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ ENV PYTHONDONTWRITEBYTECODE=1 \
+     PYTHONUNBUFFERED=1 \
+     PIP_NO_CACHE_DIR=1
+
+ RUN apt-get update && apt-get install -y --no-install-recommends build-essential && \
+     rm -rf /var/lib/apt/lists/*
+
+ COPY requirements.txt /app/requirements.txt
+ RUN pip install --no-cache-dir -r /app/requirements.txt
+
+ COPY . /app
+
+ EXPOSE 7860
+
+ CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "7860"]
LICENSE ADDED
@@ -0,0 +1,176 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
README.md ADDED
@@ -0,0 +1,201 @@
+ # matrix-ai
+
+ **matrix-ai** is the AI planning microservice for the Matrix EcoSystem. It generates **short, low‑risk, auditable remediation plans** from a compact health context provided by **Matrix Guardian**. The service is designed for **Hugging Face Spaces** or **Inference Endpoints**, but also runs locally.
+
+ > **Endpoints**
+ >
+ > * `POST /v1/plan` – internal API for Matrix Guardian: returns a safe JSON plan.
+ > * `POST /v1/chat` – (optional) RAG-style Q&A about MatrixHub (kept lightweight in Stage‑1).
+
+ The service emphasizes **safety, performance, and auditability**:
+
+ * Strict, schema‑validated JSON plans (bounded steps, risk label, rationale)
+ * PII redaction before calling upstream model endpoints
+ * Exponential backoff, short timeouts, and structured JSON logs
+ * In‑memory rate limiting (per‑IP), optional auth for private deployments
+ * ETag support and response caching for non‑mutating reads
+
+ *Last Updated: 2025‑09‑27 (UTC)*
+
+ ---
+
+ ## Architecture (at a glance)
+
+ ```mermaid
+ flowchart LR
+     subgraph Client[Matrix Operators / Observers]
+     end
+
+     Client -->|monitor| HubAPI[Matrix-Hub API]
+     Guardian[Matrix-Guardian<br/>control plane] -->|/v1/plan| AI[matrix-ai<br/>HF Space]
+     Guardian -->|/status,/apps,...| HubAPI
+     HubAPI <-->|SQL| DB[(MatrixDB<br/>Postgres)]
+
+     AI -->|HF Inference| HF[Hugging Face<br/>Inference API]
+
+     classDef svc fill:#0ea5e9,stroke:#0b4,stroke-width:1,color:#fff
+     classDef db fill:#f59e0b,stroke:#0b4,stroke-width:1,color:#fff
+     class Guardian,AI,HubAPI svc
+     class DB db
+ ```
+
+ ### Sequence: `POST /v1/plan`
+
+ ```mermaid
+ sequenceDiagram
+     participant G as Matrix-Guardian
+     participant A as matrix-ai
+     participant H as HF Inference
+
+     G->>A: POST /v1/plan { context, constraints }
+     A->>A: redact PII, validate payload
+     A->>H: model.generate prompt [retries, timeout]
+     H-->>A: model output text
+     A->>A: parse → strict JSON plan, fallback if needed
+     A-->>G: 200 { plan_id, steps[], risk, explanation }
+ ```
+
+ ---
+
+ ## Quick Start (Local Development)
+ ```bash
+ # 1) Create venv
+ python3 -m venv .venv
+ source .venv/bin/activate
+
+ # 2) Install deps
+ pip install -r requirements.txt
+
+ # 3) Configure env (local only; use Space Secrets in prod)
+ export HF_TOKEN="your_hugging_face_token"
+
+ # 4) Run
+ uvicorn app.main:app --host 0.0.0.0 --port 7860
+ ```
+
+ OpenAPI docs: http://localhost:7860/docs
+
+ ---
+
+ ## Deploy to Hugging Face Spaces
+
+ 1) Push the repository to a new Space.
+ 2) In **Settings → Secrets**, add:
+    * `HF_TOKEN` (required) – used by the upstream HF Inference client
+    * `ADMIN_TOKEN` (optional) – if set, gates `/v1/plan` and `/v1/chat` behind bearer auth
+ 3) Choose hardware. CPU is fine for tests; GPU is recommended for larger models.
+ 4) The Space will serve FastAPI on the default port; the two endpoints are ready.
+
+ > For Inference Endpoints, mirror the same env and start command.
+
+ ---
+
+ ## Configuration
+
+ All options can be set via environment variables (Space Secrets in HF) or `.env` for local use.
+
+ | Variable | Default | Purpose |
+ |---|---:|---|
+ | `HF_TOKEN` | — | Token for Hugging Face Inference API (required) |
+ | `MODEL_NAME` | `meta-llama/Meta-Llama-3-8B-Instruct` | Upstream model ID |
+ | `MAX_NEW_TOKENS` | `256` | Output token cap for plan generations |
+ | `TEMPERATURE` | `0.2` | Generation temperature |
+ | `RATE_LIMITS` | `60` | Per‑IP fixed‑window limit (requests/minute) |
+ | `REQUEST_TIMEOUT_SEC` | `15` | HTTP client timeout to HF |
+ | `RETRY_MAX_ATTEMPTS` | `3` | Retry budget to HF |
+ | `CACHE_TTL_SEC` | `30` | Optional in‑memory caching for GET |
+ | `ADMIN_TOKEN` | — | If set, requires `Authorization: Bearer <ADMIN_TOKEN>` |
+ | `LOG_LEVEL` | `INFO` | Log level (JSON logs) |
+
+ > Names are illustrative; keep them in sync with your `configs/settings.yaml` if present.
+
+ ---
+
+ ## API
+
+ ### `POST /v1/plan`
+
+ **Description:** Generate a short, low‑risk remediation plan from a compact app health context.
+
+ **Headers**
+
+ ```
+ Content-Type: application/json
+ Authorization: Bearer <ADMIN_TOKEN>   # required iff ADMIN_TOKEN set
+ ```
+
+ **Request body (example, matching `PlanRequest` in `app/core/schema.py`)**
+
+ ```json
+ {
+   "mode": "plan",
+   "context": {
+     "app_id": "matrix-ai",
+     "symptoms": ["http check failing", "latency above 900ms"],
+     "lkg": "0.1.0"
+   },
+   "constraints": {"max_steps": 3, "risk": "low"}
+ }
+ ```
+
+ **Response (example)**
+
+ ```json
+ {
+   "plan_id": "pln_01J9YX2H6ZP9R2K9THT2J9F7G4",
+   "risk": "low",
+   "steps": [
+     "Re-probe the service health endpoint (2 retries).",
+     "Pin matrix-ai to the last-known-good (LKG) version if probes still fail."
+   ],
+   "explanation": "Transient HTTP failures observed; re-probe and pin to last-known-good if still failing."
+ }
+ ```
+
+ **Status codes**
+ * `200` – plan generated
+ * `400` – unsupported `mode` (only `"plan"` in Stage‑1)
+ * `401/403` – missing/invalid bearer (only if `ADMIN_TOKEN` configured)
+ * `422` – invalid payload (schema)
+ * `429` – rate limited
+ * `503` – upstream model error after retries
+
+ ### `POST /v1/chat`
+
+ *Optional, Stage‑1 placeholder.* Given a query about MatrixHub, returns an answer with citations if a local KB is configured.
+
+ ---
+
+ ## Safety & Reliability
+
+ * **PII redaction** – tokens/emails removed from prompts as a pre‑filter
+ * **Strict schema** – JSON plan parsing with fallbacks; rejects unsafe shapes
+ * **Time‑boxed** – short timeouts and bounded retries to HF Inference
+ * **Rate‑limited** – per‑IP fixed window (configurable)
+ * **Structured logs** – JSON logs only; no sensitive payloads are logged
+
+ ---
+
+ ## Observability
+
+ * Request IDs (correlated across Guardian ↔ AI)
+ * Latency + retry counters
+ * Plan success/failure metrics (Prometheus‑friendly if you expose metrics)
+
+ ---
+
+ ## Development Notes
+
+ * Keep `/v1/plan` **internal** behind a network boundary or `ADMIN_TOKEN`.
+ * Validate payloads rigorously (Pydantic) and write contract tests for the plan schema.
+ * If you switch models, re‑run golden tests to guard against plan drift.
+
+ ---
+
+ ## License
+
+ Apache‑2.0
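A quick way to exercise the documented `/v1/plan` contract is a small `httpx` script (`httpx` is already a dependency). This is a minimal sketch, assuming a local instance on port 7860 with `ADMIN_TOKEN` unset; all field values are illustrative:

```python
import httpx

# Body shaped like PlanRequest in app/core/schema.py; values are illustrative.
payload = {
    "mode": "plan",
    "context": {"app_id": "matrix-ai", "symptoms": ["http check failing"], "lkg": "0.1.0"},
    "constraints": {"max_steps": 3, "risk": "low"},
}

resp = httpx.post("http://localhost:7860/v1/plan", json=payload, timeout=30)
resp.raise_for_status()
plan = resp.json()
print(plan["plan_id"], plan["risk"])
for step in plan["steps"]:
    print("-", step)
```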
app/__init__.py ADDED
File without changes
app/core/__init__.py ADDED
File without changes
app/core/config.py ADDED
@@ -0,0 +1,55 @@
+ from __future__ import annotations
+ import os
+ import yaml
+ from pydantic import BaseModel, AnyHttpUrl
+ from typing import Optional
+
+ class ModelCfg(BaseModel):
+     name: str = "meta-llama/Meta-Llama-3-8B-Instruct"
+     fallback: str = "mistralai/Mistral-7B-Instruct-v0.2"
+     max_new_tokens: int = 256
+     temperature: float = 0.2
+
+ class LimitsCfg(BaseModel):
+     rate_per_min: int = 60
+     cache_size: int = 256
+
+ class RagCfg(BaseModel):
+     index_dataset: Optional[str] = None
+     top_k: int = 4
+
+ class MatrixHubCfg(BaseModel):
+     base_url: AnyHttpUrl = "https://api.matrixhub.io"
+
+ class SecurityCfg(BaseModel):
+     admin_token: Optional[str] = None
+
+ class Settings(BaseModel):
+     model: ModelCfg = ModelCfg()
+     limits: LimitsCfg = LimitsCfg()
+     rag: RagCfg = RagCfg()
+     matrixhub: MatrixHubCfg = MatrixHubCfg()
+     security: SecurityCfg = SecurityCfg()
+
+     @staticmethod
+     def load() -> Settings:
+         """Loads settings from YAML and overrides with environment variables."""
+         path = os.getenv("SETTINGS_FILE", "configs/settings.yaml")
+         data = {}
+         if os.path.exists(path):
+             with open(path, "r", encoding="utf-8") as f:
+                 data = yaml.safe_load(f) or {}
+
+         settings = Settings.model_validate(data)
+
+         # Environment variable overrides
+         if "MODEL_NAME" in os.environ:
+             settings.model.name = os.environ["MODEL_NAME"]
+         if "INDEX_DATASET" in os.environ:
+             settings.rag.index_dataset = os.environ["INDEX_DATASET"]
+         if "RATE_LIMITS" in os.environ:
+             settings.limits.rate_per_min = int(os.environ["RATE_LIMITS"])
+         if "ADMIN_TOKEN" in os.environ:
+             settings.security.admin_token = os.environ["ADMIN_TOKEN"]
+
+         return settings
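A minimal sketch of the resolution order implemented by `Settings.load()`: YAML first (when `configs/settings.yaml` exists relative to the working directory), then environment variables win.

```python
import os

from app.core.config import Settings

# YAML (or field defaults) first.
settings = Settings.load()
print(settings.model.name)           # meta-llama/Meta-Llama-3-8B-Instruct

# Environment variables override the YAML values.
os.environ["MODEL_NAME"] = "mistralai/Mistral-7B-Instruct-v0.2"
os.environ["RATE_LIMITS"] = "120"
settings = Settings.load()
print(settings.model.name)           # mistralai/Mistral-7B-Instruct-v0.2
print(settings.limits.rate_per_min)  # 120
```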
app/core/inference/__init__.py ADDED
File without changes
app/core/inference/client.py ADDED
@@ -0,0 +1,39 @@
+ import os
+ import logging
+ import httpx
+ from tenacity import retry, stop_after_attempt, wait_exponential
+
+ logger = logging.getLogger(__name__)
+
+ class HFClient:
+     def __init__(self, model: str, timeout: int = 20):
+         self.model = model
+         self.timeout = timeout
+         token = os.getenv("HF_TOKEN")
+         if not token:
+             raise ValueError("HF_TOKEN environment variable is not set.")
+         self.headers = {"Authorization": f"Bearer {token}"}
+         self.api_url = f"https://api-inference.huggingface.co/models/{self.model}"
+
+     @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
+     async def generate(self, prompt: str, max_new_tokens: int, temperature: float) -> str:
+         payload = {
+             "inputs": prompt,
+             "parameters": {
+                 "max_new_tokens": max_new_tokens,
+                 "temperature": max(temperature, 0.01),  # Temperature must be > 0
+                 "return_full_text": False,
+             }
+         }
+         async with httpx.AsyncClient(timeout=self.timeout) as client:
+             try:
+                 response = await client.post(self.api_url, headers=self.headers, json=payload)
+                 response.raise_for_status()
+                 result = response.json()
+                 return result[0]['generated_text']
+             except httpx.HTTPStatusError as e:
+                 logger.error(f"HTTP error from HF API for model {self.model}: {e.response.text}")
+                 raise
+             except Exception as e:
+                 logger.error(f"Failed to call HF API for model {self.model}: {e}")
+                 raise
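A usage sketch for `HFClient`, assuming `HF_TOKEN` is exported and the chosen model is actually served by the hosted Inference API:

```python
import asyncio

from app.core.inference.client import HFClient

async def main() -> None:
    # The constructor raises ValueError when HF_TOKEN is missing.
    client = HFClient(model="meta-llama/Meta-Llama-3-8B-Instruct")
    text = await client.generate(
        prompt="Reply with the single word OK.",
        max_new_tokens=8,
        temperature=0.2,  # clamped to >= 0.01 inside generate()
    )
    print(text)

asyncio.run(main())
```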
app/core/logging.py ADDED
@@ -0,0 +1,7 @@
+ import uuid
+ from fastapi import Request
+
+ def add_trace_id(request: Request) -> None:
+     """Injects a unique trace_id into the request state."""
+     if not hasattr(request.state, "trace_id"):
+         request.state.trace_id = str(uuid.uuid4())
app/core/prompts/__init__.py ADDED
File without changes
app/core/prompts/plan.txt ADDED
@@ -0,0 +1,8 @@
+ You are Matrix-AI, an expert system that produces short, safe, and auditable remediation plans for software services.
+
+ Your constraints are:
+ 1. You must return a response strictly in JSON format.
+ 2. The plan must not exceed the `max_steps` constraint.
+ 3. Prioritize actions that are non-destructive, such as re-running health probes, pinning to a last-known-good (LKG) version, or running diagnostic tools in a sandbox.
+ 4. The explanation should be a single, concise sentence.
+ 5. The output JSON must have these exact keys: `plan_id`, `steps`, `risk`, `explanation`.
app/core/rate_limit.py ADDED
@@ -0,0 +1,27 @@
+ import time
+ from collections import defaultdict
+
+ class RateLimiter:
+     def __init__(self):
+         self.windows: dict[str, tuple[int, int]] = defaultdict(lambda: (0, 0))
+
+     def allow(self, ip: str, route: str, per_minute: int) -> bool:
+         """Checks if a request is allowed under a fixed-window rate limit."""
+         now = int(time.time())
+         current_window = now // 60
+         key = f"{ip}:{route}"  # Simplified key for per-route limit
+
+         window_start, count = self.windows.get(key, (0, 0))
+
+         if window_start != current_window:
+             # New window, reset count
+             self.windows[key] = (current_window, 1)
+             return True
+
+         if count >= per_minute:
+             # Exceeded limit
+             return False
+
+         # Increment count in current window
+         self.windows[key] = (current_window, count + 1)
+         return True
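A small demonstration of the fixed-window semantics (illustrative IP and limit; the calls are assumed to land within the same wall-clock minute):

```python
from app.core.rate_limit import RateLimiter

limiter = RateLimiter()

print(limiter.allow("203.0.113.7", "/v1/plan", per_minute=2))  # True
print(limiter.allow("203.0.113.7", "/v1/plan", per_minute=2))  # True
print(limiter.allow("203.0.113.7", "/v1/plan", per_minute=2))  # False: window exhausted
print(limiter.allow("203.0.113.7", "/v1/chat", per_minute=2))  # True: routes count separately
```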
app/core/redact.py ADDED
@@ -0,0 +1,10 @@
+ import re
+
+ SECRET_PATTERN = re.compile(r"(bearer\s+[a-z0-9\-.~+/]+=*|sk-[a-z0-9]{20,})", re.IGNORECASE)
+ EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}", re.IGNORECASE)
+
+ def redact(text: str) -> str:
+     """Redacts sensitive information like API keys and emails from a string."""
+     text = SECRET_PATTERN.sub("[REDACTED_TOKEN]", text)
+     text = EMAIL_PATTERN.sub("[REDACTED_EMAIL]", text)
+     return text
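For example (made-up values), the pre-filter rewrites bearer tokens and email addresses before anything reaches the upstream model:

```python
from app.core.redact import redact

raw = "Header was 'Authorization: Bearer abc123.def456' from ops@example.com"
print(redact(raw))
# Header was 'Authorization: [REDACTED_TOKEN]' from [REDACTED_EMAIL]
```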
app/core/schema.py ADDED
@@ -0,0 +1,31 @@
+ from pydantic import BaseModel, Field
+ from typing import List, Optional, Literal
+
+ Mode = Literal["plan", "summary", "patch-diff"]
+
+ class PlanConstraints(BaseModel):
+     risk: Optional[str] = "low"
+     max_steps: int = Field(default=3, ge=1, le=10)
+
+ class PlanContext(BaseModel):
+     app_id: str
+     symptoms: List[str] = Field(default_factory=list)
+     lkg: Optional[str] = None
+
+ class PlanRequest(BaseModel):
+     mode: Mode = "plan"
+     context: PlanContext
+     constraints: PlanConstraints = Field(default_factory=PlanConstraints)
+
+ class PlanResponse(BaseModel):
+     plan_id: str
+     steps: List[str]
+     risk: str
+     explanation: str
+
+ class ChatRequest(BaseModel):
+     question: str = Field(..., min_length=3, max_length=512)
+
+ class ChatResponse(BaseModel):
+     answer: str
+     sources: List[str] = Field(default_factory=list)
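A short sketch of how these models behave at the edges: omitted fields fall back to their defaults, and a `max_steps` outside 1..10 is rejected before any model call.

```python
from pydantic import ValidationError

from app.core.schema import PlanRequest

# Minimal valid request: mode and constraints fall back to their defaults.
req = PlanRequest.model_validate({"context": {"app_id": "demo-app", "symptoms": ["timeout"]}})
print(req.mode, req.constraints.max_steps)  # plan 3

# max_steps is bounded to 1..10 by Field(ge=1, le=10), so 99 fails fast.
try:
    PlanRequest.model_validate({"context": {"app_id": "demo-app"}, "constraints": {"max_steps": 99}})
except ValidationError as exc:
    print(exc.error_count(), "validation error")  # 1 validation error
```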
app/deps.py ADDED
@@ -0,0 +1,7 @@
+ from functools import lru_cache
+ from .core.config import Settings
+
+ @lru_cache(maxsize=1)
+ def get_settings() -> Settings:
+     """FastAPI dependency to get application settings."""
+     return Settings.load()
app/main.py ADDED
@@ -0,0 +1,18 @@
+ from fastapi import FastAPI
+ from .middleware import attach_middlewares
+ from .routers import health, plan, chat
+
+ def create_app() -> FastAPI:
+     """Creates and configures the FastAPI application instance."""
+     app = FastAPI(
+         title="matrix-ai",
+         version="0.1.0",
+         description="AI service for the Matrix EcoSystem"
+     )
+     attach_middlewares(app)
+     app.include_router(health.router, tags=["Health"])
+     app.include_router(plan.router, prefix="/v1", tags=["Planning"])
+     app.include_router(chat.router, prefix="/v1", tags=["Chat"])
+     return app
+
+ app = create_app()
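Since `create_app()` returns a plain FastAPI instance, the health probes can be exercised in-process with Starlette's `TestClient` (a sketch; no `HF_TOKEN` is needed for the health routes):

```python
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)
print(client.get("/healthz").json())  # {'status': 'ok'}
print(client.get("/readyz").json())   # {'ready': True}
```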
app/middleware.py ADDED
@@ -0,0 +1,54 @@
+ import time
+ import logging
+ from typing import Callable
+ from fastapi import FastAPI, Request, Response
+ from fastapi.middleware.cors import CORSMiddleware
+ from starlette.middleware.gzip import GZipMiddleware
+ from pythonjsonlogger import jsonlogger
+ from .deps import get_settings
+ from .core.rate_limit import RateLimiter
+ from .core.logging import add_trace_id
+
+ # Setup structured logging
+ logger = logging.getLogger("matrix-ai")
+ if not logger.handlers:
+     logger.setLevel(logging.INFO)
+     handler = logging.StreamHandler()
+     formatter = jsonlogger.JsonFormatter(
+         '%(asctime)s %(name)s %(levelname)s %(message)s %(trace_id)s'
+     )
+     handler.setFormatter(formatter)
+     logger.addHandler(handler)
+
+ _rate_limiter = RateLimiter()
+
+ def attach_middlewares(app: FastAPI):
+     """Attaches all required middlewares to the FastAPI app."""
+     app.add_middleware(GZipMiddleware, minimum_size=512)
+     app.add_middleware(
+         CORSMiddleware,
+         allow_origins=["*"],
+         allow_credentials=True,
+         allow_methods=["*"],
+         allow_headers=["*"],
+     )
+
+     @app.middleware("http")
+     async def rate_limit_and_log_middleware(request: Request, call_next: Callable):
+         add_trace_id(request)
+         settings = get_settings()
+         client_ip = request.client.host if request.client else "unknown"
+
+         if not _rate_limiter.allow(client_ip, request.url.path, settings.limits.rate_per_min):
+             return Response(status_code=429, content="Rate limit exceeded")
+
+         start_time = time.time()
+         response = await call_next(request)
+         process_time = (time.time() - start_time) * 1000
+         response.headers["X-Process-Time-Ms"] = f"{process_time:.2f}"
+
+         logger.info(
+             f'"{request.method} {request.url.path}" {response.status_code}',
+             extra={'trace_id': getattr(request.state, 'trace_id', 'N/A')}
+         )
+         return response
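The 429 path can also be observed in-process. A sketch assuming the limit is lowered via `RATE_LIMITS` before `app.main` is imported (the cached `get_settings()` is first populated on the first request):

```python
import os

os.environ["RATE_LIMITS"] = "2"  # must be set before the first request populates get_settings()

from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)
# All three requests share one client IP and land in the same minute window.
codes = [client.get("/healthz").status_code for _ in range(3)]
print(codes)  # [200, 200, 429]
```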
app/routers/__init__.py ADDED
File without changes
app/routers/chat.py ADDED
@@ -0,0 +1,18 @@
+ from fastapi import APIRouter, Depends, HTTPException
+ from ..deps import get_settings
+ from ..core.config import Settings
+ from ..core.schema import ChatRequest, ChatResponse
+ from ..services.chat_service import chat_answer
+
+ router = APIRouter()
+
+ @router.post("/chat", response_model=ChatResponse)
+ async def v1_chat(
+     req: ChatRequest,
+     settings: Settings = Depends(get_settings)
+ ):
+     """Answers questions about the MatrixHub ecosystem using RAG."""
+     try:
+         return await chat_answer(req, settings=settings)
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=f"Failed to process chat request: {e}")
app/routers/health.py ADDED
@@ -0,0 +1,14 @@
+ from fastapi import APIRouter
+
+ router = APIRouter()
+
+ @router.get("/healthz", summary="Liveness Probe")
+ async def healthz():
+     """Checks if the service is running."""
+     return {"status": "ok"}
+
+ @router.get("/readyz", summary="Readiness Probe")
+ async def readyz():
+     """Checks if the service is ready to accept traffic."""
+     # In a real app, this would check dependencies like model loading status.
+     return {"ready": True}
app/routers/plan.py ADDED
@@ -0,0 +1,24 @@
+ from fastapi import APIRouter, Depends, HTTPException
+ from ..deps import get_settings
+ from ..core.config import Settings
+ from ..core.schema import PlanRequest, PlanResponse
+ from ..services.plan_service import generate_plan
+
+ router = APIRouter()
+
+ @router.post("/plan", response_model=PlanResponse)
+ async def v1_plan(
+     req: PlanRequest,
+     settings: Settings = Depends(get_settings)
+ ):
+     """Generates a structured remediation plan based on application health context."""
+     if req.mode != "plan":
+         raise HTTPException(
+             status_code=400,
+             detail=f"Mode '{req.mode}' is not enabled. Only 'plan' is supported in Stage 1."
+         )
+     try:
+         data = await generate_plan(req, settings=settings)
+         return data
+     except Exception as e:
+         raise HTTPException(status_code=503, detail=f"Inference service failed: {e}")
app/services/__init__.py ADDED
File without changes
app/services/chat_service.py ADDED
@@ -0,0 +1,10 @@
+ # Placeholder for Stage-2 RAG chat service
+ from ..core.schema import ChatRequest, ChatResponse
+ from ..core.config import Settings
+
+ async def chat_answer(req: ChatRequest, settings: Settings) -> ChatResponse:
+     """Placeholder chat function."""
+     return ChatResponse(
+         answer="The RAG chat service is not yet enabled in Stage-1.",
+         sources=[]
+     )
app/services/plan_service.py ADDED
@@ -0,0 +1,56 @@
+ import hashlib
+ import json
+ import logging
+ from pathlib import Path
+ from ..core.schema import PlanRequest, PlanResponse
+ from ..core.config import Settings
+ from ..core.inference.client import HFClient
+ from ..core.redact import redact
+
+ logger = logging.getLogger(__name__)
+ _PROMPT_TEMPLATE: str | None = None
+
+ def _get_prompt_template() -> str:
+     global _PROMPT_TEMPLATE
+     if _PROMPT_TEMPLATE is None:
+         try:
+             path = Path(__file__).parent.parent / "core/prompts/plan.txt"
+             _PROMPT_TEMPLATE = path.read_text(encoding="utf-8")
+         except FileNotFoundError:
+             logger.error("FATAL: core/prompts/plan.txt not found.")
+             _PROMPT_TEMPLATE = "Generate a JSON plan with keys: plan_id, steps, risk, explanation."
+     return _PROMPT_TEMPLATE
+
+ def _create_final_prompt(req: PlanRequest) -> str:
+     template = _get_prompt_template()
+     context_str = f"Context:\n- app_id: {req.context.app_id}\n- symptoms: {', '.join(req.context.symptoms)}\n- lkg_version: {req.context.lkg or 'N/A'}\n- constraints: max_steps={req.constraints.max_steps}, risk={req.constraints.risk}"
+     safe_context = redact(context_str)
+     return f"{template}\n\n{safe_context}\n\nJSON Response:"
+
+ def _parse_llm_output(raw_output: str, context_str: str) -> dict:
+     try:
+         start = raw_output.find('{')
+         end = raw_output.rfind('}')
+         if start != -1 and end != -1 and end > start:
+             json_str = raw_output[start:end+1]
+             return json.loads(json_str)
+         raise ValueError("No valid JSON object found in output.")
+     except (json.JSONDecodeError, ValueError) as e:
+         logger.warning(f"LLM output parsing failed: {e}. Applying safe fallback plan.")
+         return {
+             "plan_id": hashlib.md5(context_str.encode()).hexdigest()[:12],
+             "steps": ["Pin to the last-known-good (LKG) version and re-run health probes."],
+             "risk": "low",
+             "explanation": "Fallback plan: A safe default was applied due to a model output parsing error."
+         }
+
+ async def generate_plan(req: PlanRequest, settings: Settings) -> PlanResponse:
+     final_prompt = _create_final_prompt(req)
+     client = HFClient(model=settings.model.name)
+     raw_response = await client.generate(
+         prompt=final_prompt,
+         max_new_tokens=settings.model.max_new_tokens,
+         temperature=settings.model.temperature,
+     )
+     parsed_data = _parse_llm_output(raw_response, final_prompt)
+     return PlanResponse.model_validate(parsed_data)
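A sketch of the parsing contract: `_parse_llm_output` keeps only the outermost `{...}` span of chatty model output and otherwise returns the deterministic low-risk fallback (strings are illustrative):

```python
from app.services.plan_service import _parse_llm_output

# Chatty output: only the {...} span is parsed.
messy = 'Sure! Here is the plan: {"plan_id": "p1", "steps": ["re-probe"], "risk": "low", "explanation": "ok"} Hope it helps.'
print(_parse_llm_output(messy, "demo-context")["plan_id"])  # p1

# No JSON at all: the safe fallback plan is returned instead of raising.
fallback = _parse_llm_output("I cannot answer that.", "demo-context")
print(fallback["risk"])      # low
print(fallback["steps"][0])  # Pin to the last-known-good (LKG) version and re-run health probes.
```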
configs/.env.example ADDED
@@ -0,0 +1,8 @@
+ # For local development only. Use Space Secrets in production.
+ HF_TOKEN="your_hugging_face_write_token"
+ ADMIN_TOKEN="a-secure-admin-token-for-index-refresh"
+
+ # --- Optional Overrides ---
+ # MODEL_NAME="mistralai/Mistral-7B-Instruct-v0.2"
+ # INDEX_DATASET="your-username/matrix-ai-index"
+ # RATE_LIMITS="120" # requests per minute
configs/settings.yaml ADDED
@@ -0,0 +1,19 @@
+ model:
+   name: "meta-llama/Meta-Llama-3-8B-Instruct"
+   fallback: "mistralai/Mistral-7B-Instruct-v0.2"
+   max_new_tokens: 256
+   temperature: 0.2
+
+ limits:
+   rate_per_min: 60
+   cache_size: 256
+
+ rag:
+   index_dataset: ""  # e.g., "your-username/matrix-ai-index"
+   top_k: 4
+
+ matrixhub:
+   base_url: "https://api.matrixhub.io"
+
+ security:
+   admin_token: ""  # Should be set via env var
pyproject.toml ADDED
@@ -0,0 +1,34 @@
+ [build-system]
+ requires = ["setuptools>=61.0"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "matrix-ai"
+ version = "0.1.0"
+ description = "AI service for Matrix EcoSystem"
+ readme = "README.md"
+ requires-python = ">=3.11"
+ license = { text = "Apache-2.0" }
+ dependencies = [
+     "fastapi==0.111.0",
+     "uvicorn[standard]==0.29.0",
+     "httpx==0.27.0",
+     "pydantic==2.7.1",
+     "python-json-logger==2.0.7",
+     "cachetools==5.3.3",
+     "huggingface-hub==0.23.0",
+     "sentence-transformers==2.7.0",
+     "faiss-cpu==1.8.0",
+     "numpy==1.26.4",
+     "orjson==3.10.3",
+     "pyyaml==6.0.1",
+     "tenacity==8.2.3",
+ ]
+
+ [tool.ruff]
+ line-length = 100
+ target-version = "py311"
+
+ [tool.ruff.lint]
+ select = ["E", "F", "W", "I", "UP", "B", "SIM"]
+ ignore = ["E501"]
requirements.txt ADDED
@@ -0,0 +1,18 @@
+ fastapi==0.111.0
+ uvicorn[standard]==0.29.0
+ httpx==0.27.0
+ pydantic==2.7.1
+ python-json-logger==2.0.7
+ cachetools==5.3.3
+ huggingface-hub==0.23.0
+ sentence-transformers==2.7.0
+ faiss-cpu==1.8.0
+ numpy==1.26.4
+ orjson==3.10.3
+ pyyaml==6.0.1
+ tenacity==8.2.3
+ # Dev dependencies
+ pytest
+ ruff
+ mypy
+ pytest-asyncio
tests/__init__.py ADDED
File without changes
tests/test_plan_service.py ADDED
@@ -0,0 +1,34 @@
+ import pytest
+ from unittest.mock import patch, MagicMock, AsyncMock
+ from app.core.schema import PlanRequest, PlanContext
+ from app.services.plan_service import generate_plan
+ from app.core.config import Settings
+
+ @pytest.mark.asyncio
+ async def test_generate_plan_successful_parse():
+     """Tests successful plan generation and parsing."""
+     mock_client = MagicMock()
+     mock_client.generate = AsyncMock(return_value='{"plan_id": "123", "steps": ["step 1"], "risk": "low", "explanation": "test"}')
+
+     with patch('app.services.plan_service.HFClient', return_value=mock_client) as mock_hf_client:
+         req = PlanRequest(context=PlanContext(app_id="test-app", symptoms=["timeout"]))
+         settings = Settings()
+         response = await generate_plan(req, settings)
+
+         assert response.plan_id == "123"
+         assert response.steps == ["step 1"]
+         mock_hf_client.assert_called_with(model=settings.model.name)
+
+ @pytest.mark.asyncio
+ async def test_generate_plan_parsing_fallback():
+     """Tests the fallback mechanism when LLM output is invalid JSON."""
+     mock_client = MagicMock()
+     mock_client.generate = AsyncMock(return_value='This is not valid json')
+
+     with patch('app.services.plan_service.HFClient', return_value=mock_client):
+         req = PlanRequest(context=PlanContext(app_id="test-app", symptoms=["timeout"]))
+         settings = Settings()
+         response = await generate_plan(req, settings)
+
+         assert response.explanation.startswith("Fallback plan:")
+         assert len(response.steps) > 0