Apply curated official model card (SOTA 0.5B) with full project context.

Browse files

Files changed (1) hide show

README.md +223 -388

README.md CHANGED Viewed

@@ -1,490 +1,325 @@
 ---
 library_name: transformers
 pipeline_tag: text-generation
 tags:
 - mathematics
-- conjecture
 - reasoning
-- peft
 - lora
 - lean
-base_model: Qwen/Qwen2.5-0.5B-Instruct
-base_model_relation: adapter
 ---
-# NorthernTribe-Research/math-conjecture-model
-## Parameter Count Visualization
-| Metric | Value |
-| --- | --- |
-| Total parameters | **502.831M** (`502,830,976`) |
-| Trainable parameters | **8.798M** (`8,798,208`) |
-| Frozen parameters | **494.033M** (`494,032,768`) |
-| Trainable ratio | **1.7497%** |
-`[#---------------------------]` trainable share
-## Training Reference
-- Summary source: `workspace/runs/math-conjecture-sota-050b-quick/training_summary.json`
-- This card is generated from the repository README plus the latest training summary.
-## Project Documentation
-<details>
-<summary>Expand full project README context</summary>
-This repository builds a merged dataset for training math AI systems aimed at
-**unsolved conjecture reasoning**. The v1 pipeline combines local conjecture
-data with curated open-license Hugging Face datasets (competition, structured
-reasoning, and formal proof corpora).
-It now also reaches into broader open math sources such as OpenR1-Math-220k,
-FineProofs-SFT, LeanStatement_CoT, NuminaMath-LEAN, and DeepSeek-Prover-V1 to
-better cover proof traces, theorem formalization, and reasoning-heavy
-competition data.
-## Repository Layout
-```text
-configs/
-  source_registry.yaml
-data/
-  raw/
-    unsolved_conjectures.jsonl
-  processed/
-    train.jsonl
-    validation.jsonl
-    test.jsonl
-    manifest.json
-  interim/
-    discovery.json
-    pull_report.json
-    normalized_rows.jsonl
-    merged_train.jsonl
-    merged_validation.jsonl
-    merged_test.jsonl
-    normalize_stats.json
-    merge_stats.json
-    validation_report.json
-  releases/
-    v1/
-      train.parquet
-      validation.parquet
-      test.parquet
-      manifest.json
-      excluded_sources.json
-      dataset_card.md
-      size_report.json
-      push_report.json
-schemas/
-  conjecture_record.schema.json
-  training_example.schema.json
-  normalized_training_row.schema.json
-scripts/
-  build_dataset.py
-  validate_dataset.py
-  pipeline.py
-  manage_hf_bucket.py
-  release_and_space_run.py
-model_development/
-  configs/
-    math_conjecture_sota.yaml
-    math_conjecture_sota_state_of_art.yaml
-    math_conjecture_sft.yaml
-    math_conjecture_scratch.yaml
-    math_conjecture_scratch_smoke.yaml
-    qwen25_math_sota.yaml
-  scripts/
-    train_sft.py
-    train_sota.py
-    train_scratch.py
-    eval_sota.py
-    generate_rft_data.py
-    merge_and_push.py
-  requirements.txt
-  README.md
-space_trainer/
-  app.py
-  configs/
-    math_conjecture_sota.yaml
-    math_conjecture_sota_state_of_art.yaml
-    qwen25_math_sota.yaml
-  scripts/
-    train_sota.py
-    eval_sota.py
-  requirements.txt
-  README.md
-space_conjecture_lab/
-  app.py
-  requirements.txt
-  README.md
-docs/
-  math_conjecture_lean_ai_rollout_runbook.md
-  model_sota_strategy_2026-03-23.md
-  state_of_art_math_blueprint_2026-03-25.md
-```
-## Existing Local Dataset Build
-```bash
-python scripts/build_dataset.py \
-  --seed-path data/raw/unsolved_conjectures.jsonl \
-  --output-dir data/processed \
-  --split-seed 17
-python scripts/validate_dataset.py \
-  --seed-path data/raw/unsolved_conjectures.jsonl \
-  --processed-dir data/processed
-```
-## Merged Corpus Pipeline (v1)
-Use the project virtualenv:
-```bash
-.venv/bin/python scripts/pipeline.py discover
-.venv/bin/python scripts/pipeline.py pull
-.venv/bin/python scripts/pipeline.py normalize
-.venv/bin/python scripts/pipeline.py merge
-.venv/bin/python scripts/pipeline.py validate
-.venv/bin/python scripts/pipeline.py pack
-.venv/bin/python scripts/pipeline.py push
-```
-Default publish target:
-- HF account: from env (`HF_USERNAME`/`HF_NAMESPACE`) or `huggingface-api-key.json`
-- dataset repo: `HF_DATASET_REPO_ID` or fallback `<username>/math-conjecture-training-corpus`
-- visibility: public dataset repo
-The registry keeps these broader sources capped and split-aware so the corpus
-can grow materially without assuming unrealistic storage or training budgets.
-License policy checks now evaluate both dataset card fields (`license` and
-`license_name`), handle list-valued license metadata, and still block unresolved
-custom/unknown licenses.
-## Hugging Face Buckets
-Current Hugging Face Hub docs expose storage buckets through both the CLI and
-Python API.
-CLI commands documented today:
-- `hf buckets create BUCKET_ID [--private] [--exist-ok]`
-- `hf buckets info BUCKET_ID`
-- `hf buckets list [NAMESPACE|BUCKET_ID]`
-- `hf buckets delete BUCKET_ID`
-- `hf buckets move FROM_ID TO_ID`
-- `hf buckets remove ...`
-- `hf buckets sync ...`
-- `hf buckets cp ...`
-Python bucket helpers are documented as available since `huggingface_hub`
-`v1.5.0`, including `create_bucket`, `list_bucket_tree`, and `sync_bucket`.
-This checkout currently has `huggingface_hub 1.4.1`, so the bucket CLI/API is
-not available here yet. The helper script below detects that mismatch and
-prints a clear upgrade/permission error instead of failing silently.
-Examples:
-```bash
-# Check whether a bucket exists in the current namespace
-python scripts/manage_hf_bucket.py status my-bucket --namespace NorthernTribe-Research
-# Create a private bucket
-python scripts/manage_hf_bucket.py create my-bucket --namespace NorthernTribe-Research --private
-# Ensure a bucket exists, creating it if needed
-python scripts/manage_hf_bucket.py ensure my-bucket --namespace NorthernTribe-Research --private
-# Upstream CLI, when using a newer huggingface_hub install
-hf buckets create NorthernTribe-Research/my-bucket --private
-hf buckets list NorthernTribe-Research
-hf buckets info NorthernTribe-Research/my-bucket
-```
-If you want the bucket commands in the CLI on this machine, upgrade the Hub
-package to a version at or above `huggingface_hub>=1.5.0`.
-## hf-mount Setup (Standard Workflow)
-Use `scripts/hf_mount_setup.sh` as the project-standard entrypoint for
-installing and operating `hf-mount`.
-One-time bootstrap (installs binaries, persists PATH, writes runtime env):
-```bash
-scripts/hf_mount_setup.sh bootstrap --persist-path
-```
-This writes helper defaults to:
-- `workspace/runtime/hf_mount.env`
-In new shells, load project defaults:
-```bash
-source workspace/runtime/hf_mount.env
-```
-`hf_mount_setup.sh` now writes and maintains these Hugging Face project defaults:
-- `HF_NAMESPACE`
-- `HF_DATASET_REPO_ID`
-- `HF_MODEL_REPO_ID`
-- `HF_TRAINER_SPACE_REPO_ID`
-- `HF_LAB_SPACE_REPO_ID`
-- `HF_OPS_BUCKET_ID`
-Common operations:
-```bash
-# Mount the dataset repo (read-only)
-scripts/hf_mount_setup.sh mount-repo \
-  --repo "datasets/${HF_DATASET_REPO_ID}" \
-  --target workspace/hf_mounts/dataset
-# Mount the model repo (read-only)
-scripts/hf_mount_setup.sh mount-repo \
-  --repo "${HF_MODEL_REPO_ID}" \
-  --target workspace/hf_mounts/model
-# Mount a Space repo (read-only)
-scripts/hf_mount_setup.sh mount-repo \
-  --repo "spaces/${HF_TRAINER_SPACE_REPO_ID}" \
-  --target workspace/hf_mounts/space_trainer
-# Mount a bucket (read-write by default)
-scripts/hf_mount_setup.sh mount-bucket \
-  --bucket "${HF_OPS_BUCKET_ID}" \
-  --target workspace/hf_mounts/math_conjecture_ops
-# Inspect and stop mounts
-scripts/hf_mount_setup.sh status
-scripts/hf_mount_setup.sh stop --target workspace/hf_mounts/dataset
-```
-If your host requires privileged NFS/FUSE mounts, add `--sudo` to mount/stop
-commands.
-## Push-And-Run Orchestrator
-Use `scripts/release_and_space_run.py` to execute the full promotion flow for:
-- dataset repo: `HF_DATASET_REPO_ID` (default: `NorthernTribe-Research/math-conjecture-training-corpus`)
-- space repo: `HF_TRAINER_SPACE_REPO_ID` (default: `NorthernTribe-Research/math_trainer`)
-- model repo: `HF_MODEL_REPO_ID` (default: `NorthernTribe-Research/math-conjecture-model`)
-- ops bucket: `HF_OPS_BUCKET_ID` (default: `NorthernTribe-Research/math-conjecture-ops`)
-Install/upgrade tooling in the virtualenv:
-```bash
-.venv/bin/python -m pip install -r model_development/requirements.txt
-```
-Run the rollout sequence:
-```bash
-.venv/bin/python scripts/release_and_space_run.py prepare
-.venv/bin/python scripts/release_and_space_run.py bucket
-.venv/bin/python scripts/release_and_space_run.py publish-dataset
-.venv/bin/python scripts/release_and_space_run.py deploy-space
-.venv/bin/python scripts/release_and_space_run.py run-space
-.venv/bin/python scripts/release_and_space_run.py verify
-```
-`run-space` now retries transient client/runtime failures by default. Tune with
-`--max-retries` and `--retry-sleep-seconds`.
-Non-destructive Space API probe (preflight only):
-```bash
-.venv/bin/python scripts/release_and_space_run.py run-space \
-  --preflight-only \
-  --no-push-to-hub \
-  --no-run-eval \
-  --max-stages 1 \
-  --allow-failed-result
-```
-Optional safety pins for dataset publish/verify:
-```bash
-.venv/bin/python scripts/release_and_space_run.py publish-dataset \
-  --expected-created-at 2026-03-23T11:03:54+00:00 \
-  --expected-total-rows 473349
-```
-`verify` is strict by default and expects a successful `run-space` report plus
-model artifacts. For baseline infrastructure checks without a completed run:
-```bash
-.venv/bin/python scripts/release_and_space_run.py verify \
-  --no-require-space-run-success \
-  --no-require-model-artifacts
-```
-Generated reports are written to `data/releases/v1/`:
-- `promotion_prepare_report.json`
-- `promotion_bucket_report.json`
-- `promotion_dataset_publish_report.json`
-- `promotion_space_deploy_report.json`
-- `promotion_space_run_report.json`
-- `promotion_verify_report.json`
-## Model Development (Lean + SOTA Math Profiles)
-The model fine-tuning workspace is under `model_development/`.
-The SOTA curriculum now profiles responses across simple, intermediate,
-advanced, and Lean-formalized bands.
-```bash
-.venv/bin/python -m pip install -r model_development/requirements.txt
-.venv/bin/python model_development/scripts/train_sft.py \
-  --config model_development/configs/math_conjecture_sft.yaml
-.venv/bin/python model_development/scripts/train_sft.py \
-  --config model_development/configs/math_conjecture_sft.yaml \
-  --max-train-samples 120000
-.venv/bin/python model_development/scripts/train_sota.py \
-  --config model_development/configs/math_conjecture_sota.yaml
-.venv/bin/python model_development/scripts/train_sota.py \
-  --config model_development/configs/qwen25_math_sota.yaml
-.venv/bin/python model_development/scripts/train_sota.py \
-  --config model_development/configs/math_conjecture_sota_state_of_art.yaml
-.venv/bin/python model_development/scripts/train_scratch.py \
-  --config model_development/configs/math_conjecture_scratch.yaml \
-  --init-only
-.venv/bin/python model_development/scripts/train_scratch.py \
-  --config model_development/configs/math_conjecture_scratch_smoke.yaml \
-  --dry-run
 ```
-The SOTA eval report now includes `difficulty_band_metrics`,
-`response_profile_metrics`, and `simple_to_lean`.
-The training summary now records model parameter stats under
-`model.parameter_counts` (total/trainable/frozen + ratio).
-When `push_to_hub` is enabled, `train_sota.py` now also updates model
-`README.md` on Hugging Face with a parameter-count visualization table and
-trainable-share bar.
-The eval flow also supports consensus/verifier-aware selection metrics such as
-`selected_pass_at_k`, `consensus_rate`, and `consensus_pass_at_k`.
-State-of-art eval now also supports stricter grading controls:
-`--allow-substring-match` (off by default) and optional SymPy symbolic checks
-(`symbolic_verifier_enabled` reported in eval output).
-Self-improvement (rejection-sampling) data generation:
-```bash
-.venv/bin/python model_development/scripts/generate_rft_data.py \
-  --config model_development/configs/math_conjecture_sota_state_of_art.yaml \
-  --adapter-path model_development/runs/math-conjecture-sota-state-of-art/final_adapter \
-  --input-file data/releases/v1/train.parquet \
-  --output-file model_development/runs/math-conjecture-rft/rft_train.parquet \
-  --k 8 \
-  --max-samples 2000
-```
-SOTA blueprint and implementation notes:
-- `docs/state_of_art_math_blueprint_2026-03-25.md`
-Optional adapter merge and model publish:
-```bash
-.venv/bin/python model_development/scripts/merge_and_push.py \
-  --adapter-path model_development/runs/math-conjecture-sota/final_adapter \
-  --output-dir model_development/merged/math-conjecture-model \
-  --push-to-hub \
-  --repo-id NorthernTribe-Research/math-conjecture-model
 ```
-The publish flow now auto-generates a Hugging Face model card `README.md` with
-parameter-count visualization (total/trainable/frozen + trainable-share bar),
-so model size is visible directly on the Hub page.
-### llama.cpp Inference (GGUF)
-`llama.cpp` inference is validated in this workspace for the trained
-math-conjecture adapter flow.
-Build `llama.cpp`:
-```bash
-git clone --depth 1 https://github.com/ggml-org/llama.cpp workspace/llama.cpp
-cmake -S workspace/llama.cpp -B workspace/llama.cpp/build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
-cmake --build workspace/llama.cpp/build -j
-```
-Merge adapter to full model, then convert to GGUF:
-```bash
-.venv/bin/python model_development/scripts/merge_and_push.py \
-  --adapter-path workspace/runs/math-conjecture-sota-050b-quick/final_adapter \
-  --output-dir workspace/runs/math-conjecture-sota-050b-quick/merged_model
-.venv/bin/python workspace/llama.cpp/convert_hf_to_gguf.py \
-  workspace/runs/math-conjecture-sota-050b-quick/merged_model \
-  --outfile workspace/runs/math-conjecture-sota-050b-quick/math-conjecture-sota-050b-f16.gguf \
-  --outtype f16
-```
-Optional quantization:
-```bash
-./workspace/llama.cpp/build/bin/llama-quantize \
-  workspace/runs/math-conjecture-sota-050b-quick/math-conjecture-sota-050b-f16.gguf \
-  workspace/runs/math-conjecture-sota-050b-quick/math-conjecture-sota-050b-q4_k_m.gguf \
-  Q4_K_M
-```
-Run one-shot inference via helper script:
-```bash
-scripts/llama_cpp_infer.sh \
-  --model workspace/runs/math-conjecture-sota-050b-quick/math-conjecture-sota-050b-q4_k_m.gguf \
-  --prompt "2+2=" \
-  --n-predict 8
-```
-You can also save output:
-```bash
-scripts/llama_cpp_infer.sh \
-  --model workspace/runs/math-conjecture-sota-050b-quick/math-conjecture-sota-050b-f16.gguf \
-  --prompt "Solve: a+b=10 and a-b=4. Return JSON with keys a and b only." \
-  --n-predict 64 \
-  --out workspace/runs/math-conjecture-sota-050b-quick/llama_cpp_inference.json.txt
-```
-## Hugging Face Space Trainer
-`space_trainer/` contains a GPU Space app that runs staged training and pushes
-artifacts to `NorthernTribe-Research/math-conjecture-model`.
-## Conjecture Showcase Space
-`space_conjecture_lab/` contains a separate Lean+AI conjecture analysis Space
-that:
-- loads evidence from `NorthernTribe-Research/math-conjecture-training-corpus`,
-- attempts conjecture analysis with `NorthernTribe-Research/math-conjecture-model`,
-- falls back to a base model when needed,
-- always displays explicit work traces, prompt text, and a Lean stub in UI.
-## Policy
-- Include only open, known-license, non-gated datasets from the registry.
-- Exclude unknown-license and gated datasets in v1.
-- Prefer capped, high-signal subsets for very large sources so training stays
-  practical while coverage expands.
-- Keep release compact with deterministic filtering, de-duplication, and
-  split assignment by hashed prompt.
-</details>

 ---
+language:
+- en
+license: apache-2.0
 library_name: transformers
 pipeline_tag: text-generation
 tags:
 - mathematics
+- conjectures
+- theorem-proving
 - reasoning
+- qlora
 - lora
+- peft
+- formal-math
 - lean
+- research
+base_model: "Qwen/Qwen2.5-0.5B-Instruct"
+datasets:
+- NorthernTribe-Research/math-conjecture-training-corpus
+model-index:
+- name: math-conjecture-model
+  results: []
 ---
+# Math Conjecture SOTA 0.5B
+Math Conjecture SOTA 0.5B is a research-focused language model adapted for **mathematical reasoning**, **conjecture analysis**, **proof-style generation**, and **formalization-aware responses**. It is part of the broader **Math Conjecture Training Corpus** effort, which builds open-license training pipelines for unsolved conjectures, structured reasoning, competition mathematics, proof traces, and Lean-oriented theorem workflows.
+This checkpoint is intended for research and experimentation around long-form mathematical reasoning rather than proof certification.
+---
+## Model Details
+### Model description
+This model is fine-tuned to produce more structured mathematical outputs, including:
+- intuition-first explanations
+- stepwise proof sketches
+- conjecture decomposition
+- theorem-style reasoning
+- informal-to-formal transition hints
+- Lean-aware reasoning patterns
+The surrounding project supports multiple development paths including supervised fine-tuning, state-of-the-art math profiles, scratch initialization, evaluation, self-improvement data generation, adapter merge, Hub publishing, and local `llama.cpp` inference after conversion.
+### Model type
+Parameter-efficient fine-tuned causal language model for math reasoning.
+### Hub repo
+- **Model repo:** `NorthernTribe-Research/math-conjecture-model`
+- **Dataset repo:** `NorthernTribe-Research/math-conjecture-training-corpus`
+- **Trainer Space repo:** `NorthernTribe-Research/math_trainer`
+---
+## Parameter Count Visualization
+| Metric | Value |
+|---|---:|
+| Total parameters | 502.831M (502,830,976) |
+| Trainable parameters | 8.798M (8,798,208) |
+| Frozen parameters | 494.033M (494,032,768) |
+| Trainable ratio | 1.7497% |
+**Trainable share:**
+`[#---------------------------] 1.7497%`
+This checkpoint uses a parameter-efficient adaptation setup in which only a small fraction of model weights are trainable while the vast majority remain frozen. The project documentation explicitly states that training summaries record `model.parameter_counts` with total, trainable, frozen, and ratio fields, and that the Hub README is auto-updated with this visualization when `push_to_hub` is enabled.
+---
+## Training Reference
+- **Summary source:** `workspace/runs/math-conjecture-sota-050b-quick/training_summary.json`
+- This model card is derived from the repository README together with the latest training summary. The repo also includes a helper for refreshing the model card directly from those sources: `scripts/update_model_card.py`.
+---
+## Intended Uses
+### Direct use
+This model is suitable for:
+- mathematical reasoning research
+- conjecture exploration demos
+- proof-sketch generation
+- theorem-style answer generation
+- reasoning benchmark experiments
+- formalization-oriented prompting
+- research prototypes on Hugging Face Spaces
+### Downstream use
+Potential downstream uses include:
+- math-focused copilots
+- conjecture analysis interfaces
+- educational proof assistants
+- formal/informal bridge systems
+- evaluation pipelines for reasoning-heavy LLMs
+### Out-of-scope use
+This model is **not** intended to be treated as:
+- a formal theorem prover
+- a replacement for proof assistants
+- a certified symbolic solver
+- a source of guaranteed-correct proofs
+- an authoritative system for high-stakes mathematical claims
+---
+## Training Data
+This model is part of a larger corpus-building effort designed for training math AI systems aimed at **unsolved conjecture reasoning**. The v1 dataset pipeline combines local conjecture data with curated open-license Hugging Face datasets covering competition mathematics, structured reasoning, and formal proof corpora. The repository also documents broader open math sources such as:
+- OpenR1-Math-220k
+- FineProofs-SFT
+- LeanStatement_CoT
+- NuminaMath-LEAN
+- DeepSeek-Prover-V1
+These sources are included to improve coverage across proof traces, theorem formalization, and reasoning-heavy mathematical examples. The project documentation also emphasizes open-license, practical-size, and deterministic filtering choices in the corpus pipeline.
+---
+## Training Procedure
+The model-development workspace supports:
+- SFT training
+- SOTA math profile training
+- scratch initialization
+- scratch dry-runs
+- self-improvement / rejection-sampling data generation
+- evaluation with richer reasoning metrics
+- adapter merge and Hub publish flows
+The SOTA curriculum profiles responses across:
+- simple
+- intermediate
+- advanced
+- Lean-formalized bands
+### Evaluation capabilities
+The project documentation states that the SOTA evaluation flow includes:
+- `difficulty_band_metrics`
+- `response_profile_metrics`
+- `simple_to_lean`
+- `selected_pass_at_k`
+- `consensus_rate`
+- `consensus_pass_at_k`
+It also supports stricter grading controls such as optional SymPy symbolic verification and substring-match controls.
+### Self-improvement flow
+The repository includes a rejection-sampling data-generation path via `generate_rft_data.py`, used to create self-improvement training data from model outputs.
+---
+## Project Architecture
+The wider project includes:
+- merged dataset construction
+- validation and release packaging
+- model development configs and scripts
+- a Space trainer app
+- a conjecture-lab Space app
+- rollout runbooks
+- state-of-the-art math blueprints
+This means the model should be understood as one artifact within a broader research pipeline rather than a standalone checkpoint.
+---
+## Example Usage
+### Transformers
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+repo_id = "NorthernTribe-Research/math-conjecture-model"
+tokenizer = AutoTokenizer.from_pretrained(repo_id)
+model = AutoModelForCausalLM.from_pretrained(
+    repo_id,
+    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
+    device_map="auto"
+)
+prompt = """Analyze the following mathematical conjecture.
+Conjecture:
+If a sequence is eventually periodic, then its generating function is rational.
+Return:
+1. Intuition
+2. Proof sketch
+3. Key assumptions
+4. Formalization notes
+"""
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+outputs = model.generate(
+    **inputs,
+    max_new_tokens=512,
+    do_sample=True,
+    temperature=0.7,
+    top_p=0.9
+)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
+### Prompting style
+This model generally benefits from prompts that request structure explicitly, such as:
+- intuition
+- proof sketch
+- formalization notes
+- assumptions
+- edge cases
+- candidate counterexamples
+- Lean-style outline
+### Example prompt
+```text
+Consider the statement:
+"If a + b is even, then a and b have the same parity."
+Provide:
+1. An intuitive explanation
+2. A proof
+3. A compact formalization outline
+4. Any assumptions or edge cases
 ```
+---
+## Local Inference
+The project documentation confirms a local inference path using `llama.cpp` after:
+1. merging the adapter into a full model
+2. converting the merged model to GGUF
+3. optionally quantizing for lighter deployment
+This supports local experimentation outside standard Transformers-based serving.
+---
+## Limitations
+This is a research model and may still:
+- generate invalid proofs that sound convincing
+- confuse symbolic plausibility with correctness
+- fail on deep multi-hop theorem reasoning
+- produce incomplete or brittle formalization hints
+- overuse familiar proof templates
+- hallucinate mathematical structure
+All substantive outputs should be independently verified with formal tools, symbolic systems, or expert review.
+---
+## Risks and Recommendations
+Because the model is optimized for proof-style text generation, users may over-trust fluent mathematical output. For that reason:
+- verify results independently
+- use proof assistants or symbolic systems when correctness matters
+- treat outputs as research assistance, not certification
+- benchmark behavior before deployment in educational or analytical systems
+---
+## Project Documentation
+The repository includes documentation and workflow support for:
+- dataset release
+- model development
+- rollout automation
+- Space deployment
+- evaluation
+- adapter merge and publish
+- model-card refresh automation
+---
+## Citation
+```bibtex
+@misc{northerntribe_math_conjecture_sota_05b_2026,
+  title        = {Math Conjecture SOTA 0.5B},
+  author       = {NorthernTribe Research},
+  year         = {2026},
+  publisher    = {Hugging Face},
+  howpublished = {Model repository}
+}
+```
+---
+## Disclaimer
+This model is intended for research, experimentation, and educational exploration in mathematical reasoning. It does not guarantee theorem validity, proof correctness, or novel mathematical discovery.