ConeML 348M Alpha Polish900
ConeML 348M Alpha Polish900 is a 348M-parameter alpha model, trained from scratch on a custom layered curriculum and then activated through staged SFT. This release is a research artifact and alpha candidate, not a polished general assistant.
The main result is format activation: raw-completion probes understated the base checkpoint, while the trained chat format activated transitive binding and simple code-body generation. Arithmetic remains unresolved.
Why ConeML Exists
ConeML is an independent research effort testing whether compact language models can be built from scratch through deliberately staged curricula rather than scale alone.
The central question is whether small models can develop a usable reasoning substrate through corpus design, curriculum order, and staged activation training. In v10, the clearest signal was transitive relation binding for name-like entities. In raw base probes this was near-absent; after focused SFT it reached 100% on a fixed-template internal chat probe (depths 1-3, N=128 per depth). A held-out probe (2026-06-23, N=128 per depth, depths 1-5) confirms that this generalizes across new names and new relation wording: with held-out names and a new older/younger relation, chat first-choice accuracy was 79% / 89% / 88% / 77% / 71% across depths 1-5, well above chance. Generalization is weaker under unseen query phrasing (56% / 73% / 59% / 48% / 34%) and falls to roughly chance for non-name entities such as colored cards (51% / 50% / 41% / 31% / 28%, against chance of 50% / 33% / 25% / 20% / 17%). The result is real and held-out, but the binding is name-shaped and surface-sensitive, not general abstract transitive reasoning.
coneml-348m-alpha-polish900 is the first public artifact from that work. Its strongest result is not that every capability is solved, but that raw completion understated parts of the model: transitive reasoning and simple code-body behavior became much more visible after targeted SFT, while arithmetic remained a real unresolved weakness.
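To make "depth" concrete, here is a minimal sketch of how one transitive probe item could be generated. The sentence template, the names, and the `make_transitive_item` helper are illustrative assumptions; the card does not publish its actual probe templates. The sketch assumes a depth-d item chains d relations over d+1 entities, which is consistent with the chance rates quoted above (1/2 at depth 1 down to about 1/6 at depth 5).

```python
import random

def make_transitive_item(names, relation="taller than", superlative="tallest", seed=0):
    """Build one hypothetical transitive probe item.

    names is ordered from the top of the chain to the bottom, so a list of
    d + 1 names yields a depth-d item (d chained relations).
    Returns (instruction_text, expected_answer, depth).
    """
    if len(names) < 2:
        raise ValueError("need at least two entities for one relation")
    # One fact per adjacent pair: names[0] > names[1] > ... > names[-1].
    facts = [f"{a} is {relation} {b}." for a, b in zip(names, names[1:])]
    # Shuffle so the answer cannot be read off from sentence order alone.
    random.Random(seed).shuffle(facts)
    instruction = " ".join(facts) + f" Who is the {superlative}?"
    return instruction, names[0], len(names) - 1

instruction, answer, depth = make_transitive_item(["Ana", "Ben", "Cal", "Dov"])
print(depth, answer)
print(instruction)
```

Swapping `relation` and `superlative` (for example "older than" / "oldest") or replacing the names with non-name entities gives the kind of variation the held-out suites describe, though the real suites may differ in wording.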
Intended Format
Use the role-marker chat format:
User:
<instruction>
Assistant:
Raw completion is not the intended use surface for the tuned checkpoint.
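The format above can be wrapped in a small helper. This is a minimal single-turn sketch: `build_prompt` and `extract_reply` are hypothetical names, the trailing newline after `Assistant:` follows the prompt string used in the Loading section of this card, and cutting the reply at a new `User:` marker is an assumption about how a run-on generation might look, not documented behavior.

```python
def build_prompt(instruction: str) -> str:
    """Wrap a single-turn instruction in the User/Assistant role markers."""
    return f"User:\n{instruction.strip()}\nAssistant:\n"

def extract_reply(generated: str) -> str:
    """Keep the first assistant turn; drop anything after a new 'User:' marker."""
    return generated.split("\nUser:", 1)[0].strip()

prompt = build_prompt("What is 2 + 3? Return only the number.")
print(repr(prompt))
```

Multi-turn use is not described in the card, so the helper deliberately handles one instruction at a time.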
License
Released for non-commercial use under CC BY-NC 4.0. Commercial use is not granted by this release.
Loading
This is a text-only causal language model. Use AutoModelForCausalLM or LlamaForCausalLM, not a multimodal model class.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "ConeML/coneml-348m-alpha-polish900"

# Load the custom 32K tokenizer and the text-only causal LM.
# device_map="auto" requires the accelerate package to be installed.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float32,
    device_map="auto",
)

# Wrap the instruction in the role-marker chat format.
prompt = "User:\nWhat is 2 + 3? Return only the number.\nAssistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding, stopping at the tokenizer's EOS token.
outputs = model.generate(
    **inputs,
    max_new_tokens=32,
    do_sample=False,
    eos_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
Architecture
- Family: Llama-style decoder
- Parameters: approximately 348M
- Layers: 30
- Hidden size: 1024
- Attention heads: 8
- KV heads: 2
- Vocab size: 32768
- Context length: 512
- RoPE theta: 1000000
- Tokenizer: custom 32K tokenizer
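The figures above are enough for a rough memory estimate. The sketch below assumes the per-head dimension equals hidden size divided by attention heads (1024 / 8 = 128), which is the usual Llama convention but is not stated in the card, and it treats the parameter count as exactly 348M even though the card says "approximately". `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, context_len, bytes_per_value, batch=1):
    """Bytes needed to cache keys and values for a full context window."""
    # Factor 2 covers the separate key and value tensors in every layer.
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_value * batch

HIDDEN_SIZE = 1024
ATTENTION_HEADS = 8
HEAD_DIM = HIDDEN_SIZE // ATTENTION_HEADS  # assumption: head_dim = hidden / heads

cache = kv_cache_bytes(layers=30, kv_heads=2, head_dim=HEAD_DIM,
                       context_len=512, bytes_per_value=4)
weights = 348_000_000 * 4  # approximate parameter count times 4 bytes (float32)

print(f"KV cache at 512 tokens, float32: {cache / 2**20:.1f} MiB")  # 30.0 MiB
print(f"Weights in float32: about {weights / 2**30:.2f} GiB")
```

With 2 KV heads shared across 8 query heads, the cache under these assumptions is a quarter of what full multi-head attention would need at the same head dimension.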
Training Lineage
- Base selected near the balanced pretrain region: ckpt_0210000.pt
- SFT stages: base210 -> SFT300 -> focus600 -> polish900
- Final exported checkpoint: runs/v10_348m_cone_sft_polish900/sft_ckpt_0000300.pt
Internal Probe Results
These are internal diagnostic probes, not public benchmark claims.
Activation Progression
Raw-completion transitive first-name accuracy improved gradually across SFT stages:
| Stage | Depth 1 | Depth 2 | Depth 3 |
|---|---|---|---|
| Base210 raw | 32.03% | 24.22% | 5.47% |
| SFT300 raw | 32.81% | 35.94% | 12.50% |
| Focus600 raw | 46.88% | 61.72% | 43.75% |
| Polish900 raw | 53.91% | 64.84% | 53.12% |
Chat-format transitive binding reached 100% on the fixed-template Focus600 and Polish900 internal probes (depths 1-3, in-distribution name pool). A separate held-out probe (2026-06-23) confirms generalization to new names and a new relation wording (79-89% at depths 1-3) but shows near-chance performance on non-name entities; see Held-out Transitive Validation below.
| Stage | Depth 1 | Depth 2 | Depth 3 | Math final numeric | Code strict exec |
|---|---|---|---|---|---|
| Focus600 chat | 100.00% | 100.00% | 100.00% | 28.12% | 0.00% |
| Polish900 chat | 100.00% | 100.00% | 100.00% | 35.94% | 16.67% |
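The tables above report "first-name" accuracy, but the card does not define the scorer. One plausible reading, sketched here as an assumption, is that the first candidate name mentioned in the output counts as the model's choice. `first_mentioned` and `first_name_accuracy` are hypothetical helpers, not the project's evaluation code.

```python
import re

def first_mentioned(output, candidates):
    """Return the candidate name that appears earliest in output, or None."""
    best = None
    for name in candidates:
        match = re.search(rf"\b{re.escape(name)}\b", output)
        if match and (best is None or match.start() < best[0]):
            best = (match.start(), name)
    return best[1] if best else None

def first_name_accuracy(outputs, candidate_sets, answers):
    """Fraction of items whose first-mentioned candidate equals the answer."""
    hits = sum(
        first_mentioned(out, cands) == ans
        for out, cands, ans in zip(outputs, candidate_sets, answers)
    )
    return hits / len(answers)

outputs = ["Ana is the tallest, not Ben.", "Ben"]
candidates = [["Ben", "Ana"], ["Ana", "Ben"]]
print(first_name_accuracy(outputs, candidates, ["Ana", "Ana"]))  # 0.5
```

Under this reading a hedged answer that names a wrong entity first is scored wrong even if the right name appears later, which makes it a stricter metric than a name-anywhere match.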
Chat-Format Probe
| Probe | Result |
|---|---|
| Transitive depth 1 first-name | 100.00% |
| Transitive depth 2 first-name | 100.00% |
| Transitive depth 3 first-name | 100.00% |
| Math final numeric | 35.94% |
| Math answer-anywhere | 35.94% |
| Code strict exec | 16.67% |
Code note: strict execution is mostly limited by indentation. The Polish900 chat probe generated plausible correct return expressions for the 6 simple function probes, but 5/6 were emitted with the wrong leading whitespace; under indentation normalization, those return bodies execute.
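As a rough illustration of what indentation normalization could look like, the sketch below dedents a generated body and re-indents it under the function signature before executing it. The card does not publish its normalizer, so `normalize_body`, `run_with_normalized_body`, and the `add` example are assumptions for illustration only.

```python
import textwrap

def normalize_body(body: str, indent: str = "    ") -> str:
    """Remove the common leading whitespace, then indent every line uniformly."""
    dedented = textwrap.dedent(body.strip("\n"))
    return textwrap.indent(dedented, indent)

def run_with_normalized_body(signature: str, body: str, call: str):
    """Join a def line with the normalized body, execute it, and evaluate a call."""
    source = f"{signature}\n{normalize_body(body)}\n"
    namespace = {}
    exec(source, namespace)  # only run trusted or sandboxed probe code this way
    return eval(call, namespace)

# A body emitted with no leading whitespace raises IndentationError when appended
# to the def line as-is, but executes once it is re-indented.
print(run_with_normalized_body("def add(a, b):", "return a + b", "add(2, 3)"))  # 5
```

This only repairs a uniform offset. A multi-line body whose lines are indented inconsistently with each other would still need a smarter repair or would fail.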
Raw-Completion Probe
| Probe | Result |
|---|---|
| Transitive depth 1 first-name | 53.91% |
| Transitive depth 2 first-name | 64.84% |
| Transitive depth 3 first-name | 53.12% |
| Math final numeric | 17.97% |
| Code body rate | 0.00% |
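The probe tables report "Math final numeric" and, for the chat probe, "Math answer-anywhere". The card does not define either scorer; the sketch below assumes that "final numeric" compares the last number in the output with the target and that "answer-anywhere" accepts the target appearing as any number in the output. The function names and the number regex are hypothetical.

```python
import re

# Matches integers and simple decimals, with an optional leading minus sign.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def numbers_in(text):
    """All numbers found in text, in order of appearance."""
    return [float(tok) for tok in NUMBER.findall(text)]

def final_numeric_correct(output, target):
    """True if the last number in the output equals the target."""
    nums = numbers_in(output)
    return bool(nums) and nums[-1] == float(target)

def answer_anywhere_correct(output, target):
    """True if the target appears as any number in the output."""
    return float(target) in numbers_in(output)

sample = "The answer is 5, not 6."
print(final_numeric_correct(sample, 5), answer_anywhere_correct(sample, 5))  # False True
```

Under these definitions every final-numeric hit is also an answer-anywhere hit, so the second rate can never be lower than the first. The regex treats a hyphen directly before a digit as a minus sign, which would misread expressions such as "2-3".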
Held-out Transitive Validation (2026-06-23)
Polish900, chat surface, first-choice accuracy, N=128 per depth.
| Suite | D1 | D2 | D3 | D4 | D5 |
|---|---|---|---|---|---|
| SFT template + held-out names | 94.5% | 96.1% | 94.5% | 94.5% | 82.8% |
| Held-out names + new relation (older/younger) | 78.9% | 89.1% | 88.3% | 76.6% | 71.1% |
| Unseen query phrasing + held-out names | 56.3% | 73.4% | 59.4% | 48.4% | 33.6% |
| Non-name entities (cards, comes before) | 50.8% | 50.0% | 41.4% | 30.5% | 28.1% |
| Chance | 50% | 33% | 25% | 20% | 17% |
Takeaway: held-out validation supports generalization across new names and relation wording for name-like entities in chat format. It is weaker under unseen query phrasing and at or near chance for non-name entities and deeper chains. Raw completion is weaker than chat in every suite.
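To read the table against sampling noise, the sketch below computes each cell's margin over chance and a normal-approximation z-score at N=128. It assumes chance at depth d is a uniform guess over d+1 entities, which reproduces the Chance row (50%, 33%, 25%, 20%, 17%) but is an inference from the numbers rather than something the card states. `chance` and `z_vs_chance` are hypothetical helpers.

```python
import math

N = 128  # items per depth, as stated for the held-out probe

# First-choice accuracy (%) for depths 1-5, copied from the table above.
SUITES = {
    "SFT template + held-out names": [94.5, 96.1, 94.5, 94.5, 82.8],
    "Held-out names + new relation": [78.9, 89.1, 88.3, 76.6, 71.1],
    "Unseen query phrasing + held-out names": [56.3, 73.4, 59.4, 48.4, 33.6],
    "Non-name entities": [50.8, 50.0, 41.4, 30.5, 28.1],
}

def chance(depth):
    """Uniform guess over depth + 1 entities (assumed reading of the Chance row)."""
    return 1.0 / (depth + 1)

def z_vs_chance(accuracy_pct, depth, n=N):
    """Normal-approximation z-score of an observed accuracy against chance."""
    p0 = chance(depth)
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (accuracy_pct / 100 - p0) / standard_error

for name, row in SUITES.items():
    cells = [
        f"D{d}: {acc - 100 * chance(d):+.1f} pts (z={z_vs_chance(acc, d):.1f})"
        for d, acc in enumerate(row, start=1)
    ]
    print(name, "|", ", ".join(cells))
```

A z-score near zero means the cell is within sampling noise of chance at this sample size. The normal approximation is coarse for N=128, so an exact binomial test would be the safer choice for borderline cells.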
Strengths
- Scratch-trained 348M model from a custom layered curriculum.
- Strong SFT activation curve on transitive relation binding.
- Chat-format transitive relation binding reaches 100% on a fixed-template internal probe (depths 1-3) and is held-out validated for name-like entities (79-89% at depths 1-3 with new names and a new relation wording). It degrades under unseen query phrasing and drops to roughly chance for non-name entities, so it is name-shaped binding rather than general transitive reasoning.
- Simple code return bodies appear in chat format; the remaining failure is mostly indentation/formatting, not missing return-body content on the internal probe.
Known Limitations
- Arithmetic remains the weakest major capability lane. Chat-format final numeric accuracy reached 35.94% on the internal probe, but reliable multi-digit arithmetic is not solved.
- Raw completion is poor for code bodies and is not the intended tuned interface.
- Code indentation is unstable without postprocessing.
- Internal probes only; this card makes no public benchmark claims.
- This is an alpha/research release, not a replacement for larger general assistants.
Reproducibility Artifacts
This local release directory includes:
- training_summary.json
- evals/v10_diagnostic_probe_0210000_gpu_matched_sft300.json
- evals/diag_postsft_sft300_gpu.json
- evals/diag_focus600_raw_gpu.json
- evals/chat_activation_focus600_gpu.json
- evals/chat_activation_polish900_gpu.json
- evals/diag_polish900_raw_gpu.json
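The JSON schemas of these files are not documented in the card, so the following sketch makes no assumption about their contents: it loads whichever of the listed files exist under a release directory and reports their top-level keys. `load_artifacts` and `summarize` are hypothetical helpers, and the `"."` root is a placeholder for wherever the release directory lives.

```python
import json
from pathlib import Path

# Relative paths as listed in this card.
ARTIFACTS = [
    "training_summary.json",
    "evals/v10_diagnostic_probe_0210000_gpu_matched_sft300.json",
    "evals/diag_postsft_sft300_gpu.json",
    "evals/diag_focus600_raw_gpu.json",
    "evals/chat_activation_focus600_gpu.json",
    "evals/chat_activation_polish900_gpu.json",
    "evals/diag_polish900_raw_gpu.json",
]

def load_artifacts(root):
    """Return {relative_path: parsed JSON} for each listed artifact found under root."""
    loaded = {}
    for rel in ARTIFACTS:
        path = Path(root) / rel
        if path.is_file():
            loaded[rel] = json.loads(path.read_text(encoding="utf-8"))
    return loaded

def summarize(loaded):
    """Map each artifact to its sorted top-level keys, or its type name if not a dict."""
    return {
        rel: sorted(data) if isinstance(data, dict) else type(data).__name__
        for rel, data in loaded.items()
    }

for rel, keys in summarize(load_artifacts(".")).items():
    print(rel, "->", keys)
```

Missing files are skipped silently, so the same script works on a partial download.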