ConeML 348M Alpha Polish900

ConeML 348M Alpha Polish900 is a 348M-parameter alpha model trained from scratch on a custom layered curriculum and then tuned with staged supervised fine-tuning (SFT) for activation. This release is a research artifact and alpha candidate, not a polished general assistant.

The main result is format activation: raw-completion probes understated what the base checkpoint had learned, while the trained chat format activated transitive binding and simple code-body generation. Arithmetic remains unresolved.

Why ConeML Exists

ConeML is an independent research effort testing whether compact language models can be built from scratch through deliberately staged curricula rather than scale alone.

The central question is whether small models can develop a usable reasoning substrate through corpus design, curriculum order, and staged activation training. In v10, the clearest signal was transitive relation binding for name-like entities. In raw base probes this was near-absent; after focused SFT it reached 100% on a fixed-template internal chat probe (depths 1-3, N=128 per depth). A held-out probe (2026-06-23, N=128 per depth, depths 1-5) shows that this generalizes to new names and new relation wording: with held-out names and a new older/younger relation, chat first-choice accuracy was 79% / 89% / 88% / 77% / 71% across depths 1-5, well above chance. Generalization is weaker under unseen query phrasing (56% / 73% / 59% / 48% / 34%) and falls to near chance for non-name entities such as colored cards (51% / 50% / 41% / 31% / 28% against chance levels of 50% / 33% / 25% / 20% / 17%). The result is real and held-out, but the binding is name-shaped and surface-sensitive, not general abstract transitive reasoning.

coneml-348m-alpha-polish900 is the first public artifact from that work. Its strongest result is not that every capability is solved, but that raw completion understated parts of the model: transitive relation binding and simple code-body behavior became much more visible after targeted SFT, while arithmetic remained a real unresolved weakness.

Intended Format

Use the role-marker chat format:

User:
<instruction>
Assistant:

Raw completion is not the intended use surface for the tuned checkpoint.
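
A minimal sketch of building that prompt in Python. The helper name `build_prompt` is hypothetical and not part of the release; the single-turn layout and the newline after `Assistant:` are assumed from the prompt string used in the Loading example.

```python
def build_prompt(instruction: str) -> str:
    """Wrap one user instruction in the role-marker chat format.

    Hypothetical helper: the layout mirrors the card's format block,
    with a newline after "Assistant:" so generation starts on a fresh line.
    """
    return f"User:\n{instruction.strip()}\nAssistant:\n"


prompt = build_prompt("What is 2 + 3? Return only the number.")
print(repr(prompt))
```

The resulting string can be passed to the tokenizer in place of a hand-written prompt.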

License

Released for non-commercial use under CC BY-NC 4.0. Commercial use is not granted by this release.

Loading

This is a text-only causal language model. Use AutoModelForCausalLM or LlamaForCausalLM, not a multimodal model class.

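import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "ConeML/coneml-348m-alpha-polish900"

# Load the tokenizer and the weights in float32.
# device_map="auto" requires the accelerate package.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float32,
    device_map="auto",
)

# Build the prompt in the role-marker chat format described under Intended Format.
prompt = "User:\nWhat is 2 + 3? Return only the number.\nAssistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; prompt plus new tokens must fit in the 512-token context.
outputs = model.generate(
    **inputs,
    max_new_tokens=32,
    do_sample=False,
    eos_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))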

Architecture

Family: Llama-style decoder
Parameters: approximately 348M
Layers: 30
Hidden size: 1024
Attention heads: 8
KV heads: 2
Vocab size: 32768
Context length: 512
RoPE theta: 1000000
Tokenizer: custom 32K tokenizer
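
A few quantities follow from these numbers. The sketch below is illustrative only: it assumes the head dimension equals hidden size divided by attention heads (the common Llama convention, not stated in this card), a float32 KV cache, and a single sequence. The helper `max_prompt_tokens` is hypothetical.

```python
# Values copied from the Architecture list above.
LAYERS = 30
HIDDEN_SIZE = 1024
ATTENTION_HEADS = 8
KV_HEADS = 2
CONTEXT_LENGTH = 512

# Assumption: head_dim = hidden_size / attention_heads (common Llama convention).
head_dim = HIDDEN_SIZE // ATTENTION_HEADS
# Grouped-query attention: how many query heads share each KV head.
queries_per_kv_head = ATTENTION_HEADS // KV_HEADS

# KV cache for one full-context sequence: keys and values, per layer, per KV head.
BYTES_PER_VALUE = 4  # assumption: float32 cache
kv_cache_bytes = 2 * LAYERS * KV_HEADS * head_dim * CONTEXT_LENGTH * BYTES_PER_VALUE


def max_prompt_tokens(max_new_tokens: int) -> int:
    """Largest prompt length that still leaves room for generation."""
    return CONTEXT_LENGTH - max_new_tokens


print(head_dim, queries_per_kv_head)
print(kv_cache_bytes / 2**20, "MiB")
print(max_prompt_tokens(32))
```

Under these assumptions the cache for a full 512-token sequence is 30 MiB, and a request with max_new_tokens=32 leaves 480 tokens for the prompt.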

Training Lineage

Base selected near the balanced pretrain region: ckpt_0210000.pt
SFT stages: base210 -> SFT300 -> focus600 -> polish900
Final exported checkpoint: runs/v10_348m_cone_sft_polish900/sft_ckpt_0000300.pt

Internal Probe Results

These are internal diagnostic probes, not public benchmark claims.

Activation Progression

Raw-completion transitive first-name accuracy improved gradually across SFT stages:

Stage	Depth 1	Depth 2	Depth 3
Base210 raw	32.03%	24.22%	5.47%
SFT300 raw	32.81%	35.94%	12.50%
Focus600 raw	46.88%	61.72%	43.75%
Polish900 raw	53.91%	64.84%	53.12%

Chat-format transitive binding reached 100% on the fixed-template Focus600 and Polish900 internal probes (depths 1-3, in-distribution name pool). A separate held-out probe (2026-06-23) confirms generalization to new names and a new relation wording (79-89% at depths 1-3) but shows near-chance performance on non-name entities; see Held-out Transitive Validation below.

Stage	Depth 1	Depth 2	Depth 3	Math final numeric	Code strict exec
Focus600 chat	100.00%	100.00%	100.00%	28.12%	0.00%
Polish900 chat	100.00%	100.00%	100.00%	35.94%	16.67%

Chat-Format Probe

Probe	Result
Transitive depth 1 first-name	100.00%
Transitive depth 2 first-name	100.00%
Transitive depth 3 first-name	100.00%
Math final numeric	35.94%
Math answer-anywhere	35.94%
Code strict exec	16.67%

Code note: strict execution is limited mostly by indentation. On the Polish900 chat probe, the model generated plausible, correct return expressions for the 6 simple function probes, but 5 of the 6 were emitted with the wrong leading whitespace; once indentation is normalized, those return bodies execute.
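
One way such normalization could look, as a minimal sketch rather than the probe's actual postprocessing: the function names, the example signature, and the example model output below are all hypothetical.

```python
import textwrap


def normalize_body(body: str, indent: str = "    ") -> str:
    """Strip the body's common leading whitespace, then indent it one level."""
    dedented = textwrap.dedent(body.strip("\n"))
    return textwrap.indent(dedented, indent)


def runs(source: str) -> bool:
    """Return True if the source compiles and executes without a syntax error."""
    try:
        exec(compile(source, "<probe>", "exec"), {})
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


signature = "def add(a, b):"   # hypothetical probe prompt
generated = "return a + b"     # hypothetical output with no leading whitespace

strict_source = signature + "\n" + generated
normalized_source = signature + "\n" + normalize_body(generated)

print(runs(strict_source))      # strict check on the unindented body
print(runs(normalized_source))  # same body after re-indentation
```

The strict variant fails with an IndentationError because the body is not indented under the signature; the normalized variant defines a working function.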

Raw-Completion Probe

Probe	Result
Transitive depth 1 first-name	53.91%
Transitive depth 2 first-name	64.84%
Transitive depth 3 first-name	53.12%
Math final numeric	17.97%
Code body rate	0.00%

Held-out Transitive Validation (2026-06-23)

Polish900, chat surface, first-choice accuracy, N=128 per depth.

Suite	D1	D2	D3	D4	D5
SFT template + held-out names	94.5%	96.1%	94.5%	94.5%	82.8%
Held-out names + new relation (older/younger)	78.9%	89.1%	88.3%	76.6%	71.1%
Unseen query phrasing + held-out names	56.3%	73.4%	59.4%	48.4%	33.6%
Non-name entities (cards, comes before)	50.8%	50.0%	41.4%	30.5%	28.1%
Chance	50%	33%	25%	20%	17%

Takeaway: held-out validation supports generalization across new names and relation wording for name-like entities in chat format. It is weaker under unseen query phrasing and at or near chance for non-name entities and deeper chains. Raw completion is weaker than chat in every suite.
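
To put the comparison with chance on a numeric footing, the sketch below expresses each score as a margin over chance in standard errors. It assumes chance is 1/(depth + 1), which reproduces the Chance row, and uses a normal approximation under the chance hypothesis; neither the formula nor the statistic comes from the probe code.

```python
import math

N = 128  # probes per depth, as stated above

# First-choice accuracy (%) at depths 1-5, copied from the table.
suites = {
    "SFT template + held-out names": [94.5, 96.1, 94.5, 94.5, 82.8],
    "Held-out names + new relation": [78.9, 89.1, 88.3, 76.6, 71.1],
    "Unseen query phrasing + held-out names": [56.3, 73.4, 59.4, 48.4, 33.6],
    "Non-name entities": [50.8, 50.0, 41.4, 30.5, 28.1],
}


def chance(depth: int) -> float:
    """Assumed chance level: one correct entity among depth + 1 candidates."""
    return 1.0 / (depth + 1)


def z_vs_chance(accuracy_pct: float, depth: int, n: int = N) -> float:
    """Margin over chance in standard errors (normal approximation)."""
    p0 = chance(depth)
    standard_error = math.sqrt(p0 * (1.0 - p0) / n)
    return (accuracy_pct / 100.0 - p0) / standard_error


for name, row in suites.items():
    margins = [z_vs_chance(acc, depth) for depth, acc in enumerate(row, start=1)]
    print(name, [round(m, 1) for m in margins])
```

Under these assumptions, the depth-1 non-name score lies within one standard error of chance, while the depth-1 score for held-out names with a new relation is several standard errors above it.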

Strengths

Scratch-trained 348M model from a custom layered curriculum.
Strong SFT activation curve on transitive relation binding.
Chat-format transitive relation binding reaches 100% on a fixed-template internal probe (depths 1-3) and is held-out validated for name-like entities (79-89% at depths 1-3 with new names and a new relation wording). It degrades under unseen query phrasing and drops to roughly chance for non-name entities, so it is name-shaped binding rather than general transitive reasoning.
Simple code return bodies appear in chat format; the remaining failure is mostly indentation/formatting, not missing return-body content on the internal probe.

Known Limitations

Arithmetic remains the weakest major capability lane. Chat-format final numeric accuracy reached 35.94% on the internal probe, but reliable multi-digit arithmetic is not solved.
Raw completion is poor for code bodies and is not the intended tuned interface.
Code indentation is unstable without postprocessing.
Internal probes only; this card makes no public benchmark claims.
This is an alpha/research release, not a replacement for larger general assistants.

Reproducibility Artifacts

This local release directory includes:

training_summary.json
evals/v10_diagnostic_probe_0210000_gpu_matched_sft300.json
evals/diag_postsft_sft300_gpu.json
evals/diag_focus600_raw_gpu.json
evals/chat_activation_focus600_gpu.json
evals/chat_activation_polish900_gpu.json
evals/diag_polish900_raw_gpu.json
