Wiola13M
Wiola13M is a decoder-only Transformer with approximately 13 million parameters (12.9 million), developed by OSCOWL AI.
The model introduces a lightweight architecture built from the following components:
- Spiral Rotary Position Embeddings (Spiral RoPE)
- Gated Spiral Attention
- Butterfly Feed Forward Network
- RMSNorm
- Weight-tied language modeling head
Wiola13M is designed as a compact research language model that can be trained efficiently on consumer GPUs while remaining fully compatible with the Hugging Face Transformers ecosystem.
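Two of the listed components, RMSNorm and the weight-tied language modeling head, are standard techniques and can be illustrated in isolation. The following is a minimal PyTorch sketch, not the Wiola13M source code: the class names `RMSNorm` and `TiedHead` and the vocabulary size of 1000 are hypothetical (the card does not state the vocabulary size), and the Transformer blocks between embedding and head are omitted.

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square norm: rescales by the RMS, with no mean subtraction."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable per-channel gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Divide each vector by its RMS over the last (hidden) dimension.
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * inv_rms * self.weight


class TiedHead(nn.Module):
    """Token embedding and LM head that share a single weight matrix."""

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.norm = RMSNorm(hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        self.lm_head.weight = self.embed.weight  # weight tying: one shared tensor

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.norm(self.embed(input_ids))  # Transformer blocks omitted
        return self.lm_head(hidden)


VOCAB_SIZE = 1000  # hypothetical; the card does not state the vocabulary size
model = TiedHead(VOCAB_SIZE, hidden_size=256)
logits = model(torch.randint(0, VOCAB_SIZE, (2, 16)))

# parameters() counts the shared matrix once, so tying saves vocab * hidden weights.
n_params = sum(p.numel() for p in model.parameters())
print(logits.shape, n_params)
```

With tying, the embedding matrix is stored once instead of twice, which matters at this scale because the embedding table can be a large share of a small model's parameters.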
Model Details
| Property | Value |
|---|---|
| Model Name | Wiola13M |
| Parameters | 12.9 Million |
| Architecture | Decoder-only Transformer |
| Hidden Size | 256 |
| Layers | 6 |
| Attention Heads | 8 |
| Context Length | 512 Tokens |
| Position Encoding | Spiral Rotary Embeddings |
| Feed Forward | Butterfly MLP |
| Framework | PyTorch |
| Library | Hugging Face Transformers |
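For experimentation, the values in the table can be collected into a small configuration object. This is a sketch only: `WiolaConfigSketch` and its field names are hypothetical and are not the package's actual configuration class. The derived per-head dimension assumes the hidden size is split evenly across heads, which is the usual convention but is not stated in the card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WiolaConfigSketch:
    """Hypothetical container for the hyperparameters listed in the table."""

    hidden_size: int = 256
    num_layers: int = 6
    num_attention_heads: int = 8
    max_position_embeddings: int = 512  # context length in tokens

    @property
    def head_dim(self) -> int:
        # Assumption: each head gets an equal slice of the hidden dimension.
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        return self.hidden_size // self.num_attention_heads


config = WiolaConfigSketch()
print(config.head_dim)  # 32
```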
Training
The model was trained on the TinyStories dataset.
Training configuration:
- Optimizer: AdamW
- Learning Rate: 3e-4
- Scheduler: Cosine
- Maximum Steps: 20,000
- Effective Batch Size: 32
- Precision: Mixed
- Sequence Length: 512
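To make the schedule concrete, the sketch below computes a cosine-decayed learning rate from the listed peak rate and step count. Several details are assumptions, since the card does not give them: no warmup, decay to a minimum of zero, and an effective batch of 32 reached as 8 sequences times 4 gradient-accumulation steps. The token total further assumes every sequence is filled to the full 512 tokens.

```python
import math

PEAK_LR = 3e-4      # from the configuration above
MAX_STEPS = 20_000  # from the configuration above
SEQ_LEN = 512       # from the configuration above
MIN_LR = 0.0        # assumption: decay all the way to zero
MICRO_BATCH = 8     # assumption: per-step batch that fits on the GPU
GRAD_ACCUM = 4      # assumption: 8 x 4 gives the effective batch size of 32


def cosine_lr(step: int) -> float:
    """Learning rate at `step` under half-cosine decay without warmup."""
    progress = min(max(step, 0), MAX_STEPS) / MAX_STEPS
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


effective_batch = MICRO_BATCH * GRAD_ACCUM
tokens_per_step = effective_batch * SEQ_LEN      # 16,384 if sequences are full
max_tokens_seen = tokens_per_step * MAX_STEPS    # upper bound: 327,680,000

for step in (0, MAX_STEPS // 2, MAX_STEPS):
    print(step, cosine_lr(step))
```

Under these assumptions the rate starts at 3e-4, passes 1.5e-4 at the midpoint, and reaches zero at step 20,000.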
Final Training Loss:
2.0568
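A loss value is often easier to interpret as perplexity. The conversion below assumes the reported number is the mean per-token cross-entropy in nats (the PyTorch default); the card does not say so explicitly. Note that this is a training-set figure, not a held-out evaluation.

```python
import math

final_loss = 2.0568  # reported final training loss
perplexity = math.exp(final_loss)  # valid only if the loss is in nats per token
print(f"{perplexity:.2f}")  # 7.82
```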
Architecture
Wiola13M replaces standard Transformer attention with Gated Spiral Attention.
The architecture consists of:
Embedding
↓
Spiral Rotary Embedding
↓
Gated Multi-Head Attention
↓
Butterfly Feed Forward Network
↓
RMSNorm
↓
Language Modeling Head
Key innovations include:
- Content-adaptive attention gating
- Spiral positional encoding
- Efficient Butterfly MLP
- KV-cache compatible autoregressive decoding
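The card does not define the Spiral variants, so the following sketch illustrates only the general ideas of rotary position encoding and content-adaptive gating. It uses standard RoPE (not Spiral RoPE) and a generic sigmoid gate on the attention output (not necessarily how Gated Spiral Attention applies its gate). The names `rotary` and `GatedCausalSelfAttention` are hypothetical, and `F.scaled_dot_product_attention` requires PyTorch 2.0 or later.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def rotary(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply standard RoPE to x of shape (batch, heads, seq, head_dim)."""
    seq_len, head_dim = x.shape[-2], x.shape[-1]
    half = head_dim // 2
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos().to(x), angles.sin().to(x)  # each (seq, half)
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) pair by a position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


class GatedCausalSelfAttention(nn.Module):
    """Causal multi-head attention whose output is scaled by an input-dependent gate."""

    def __init__(self, hidden_size: int = 256, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.qkv = nn.Linear(hidden_size, 3 * hidden_size, bias=False)
        self.gate = nn.Linear(hidden_size, hidden_size, bias=False)
        self.out = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, c = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (batch, seq, hidden) -> (batch, heads, seq, head_dim)
        q, k, v = (
            z.reshape(b, t, self.num_heads, self.head_dim).transpose(1, 2)
            for z in (q, k, v)
        )
        q, k = rotary(q), rotary(k)  # positions enter through q and k only
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2).reshape(b, t, c)
        # Assumed gating form: a sigmoid of the current token scales its attention output.
        return self.out(torch.sigmoid(self.gate(x)) * attn)


torch.manual_seed(0)
layer = GatedCausalSelfAttention()
x = torch.randn(2, 16, 256)
y = layer(x)
print(y.shape)
```

Because the gate depends only on the current token and the attention mask is causal, earlier positions are unaffected by later ones, which is the property that KV-cached autoregressive decoding relies on.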
Usage
Install the package:
pip install wiola13m
Load the model:
# Load the tokenizer from the Hub and the model class from the wiola13m package
from transformers import AutoTokenizer
from wiola13m import WiolaForCausalLM

tokenizer = AutoTokenizer.from_pretrained("oscowlai/Wiola13M")
model = WiolaForCausalLM.from_pretrained("oscowlai/Wiola13M")

# Tokenize the prompt; token type IDs are not passed to the model
inputs = tokenizer(
    "Once upon a time",
    return_tensors="pt",
    return_token_type_ids=False,
)

# Sample up to 100 new tokens with temperature and nucleus (top-p) sampling
output = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
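Since the context length is 512 tokens, it can help to cap `max_new_tokens` so that prompt plus output stays inside the window. The helper below is a hypothetical convenience, not part of the `wiola13m` package. The card does not say how the model behaves beyond 512 tokens, so staying within the window is a conservative assumption.

```python
CONTEXT_LENGTH = 512  # from the Model Details table


def clamp_new_tokens(prompt_tokens: int, requested: int,
                     context: int = CONTEXT_LENGTH) -> int:
    """Largest generation length that keeps prompt + output within the context."""
    if prompt_tokens >= context:
        raise ValueError("prompt already fills the context window; truncate it first")
    return min(requested, context - prompt_tokens)


print(clamp_new_tokens(prompt_tokens=4, requested=100))    # 100
print(clamp_new_tokens(prompt_tokens=450, requested=100))  # 62
```

In the example above this could be used as `max_new_tokens=clamp_new_tokens(inputs["input_ids"].shape[1], 100)`.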
Intended Uses
Wiola13M is intended for:
- Language model research
- Efficient transformer experimentation
- Education
- Architecture benchmarking
- Fine-tuning experiments
It is not intended for production deployment without further evaluation and fine-tuning.
Limitations
- Trained primarily on TinyStories.
- Not instruction-tuned.
- Not aligned with RLHF.
- May generate inaccurate or repetitive outputs.
- Performance outside the training domain has not been extensively evaluated.
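Repetitive output can be spotted with a simple distinct-n ratio: the share of n-grams in a generation that are unique. The function below is an illustrative heuristic, not an evaluation used by the authors. It splits on whitespace rather than using the model's tokenizer, and the name `distinct_n` is hypothetical.

```python
def distinct_n(text: str, n: int = 2) -> float:
    """Fraction of unique n-grams among all whitespace-token n-grams."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 1.0  # too short to contain any repetition
    return len(set(ngrams)) / len(ngrams)


print(distinct_n("the cat sat on the mat"))           # 1.0
print(distinct_n("the cat the cat the cat the cat"))  # 2 unique of 7 bigrams
```

If the ratio is low for sampled text, the `repetition_penalty` and `no_repeat_ngram_size` arguments of `generate()` in Transformers are common things to try; their effect on this model has not been measured here.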
Citation
@software{wiola13m2026,
title={Wiola 13M, a Gated Spiral Attention Architecture for Parameter Efficient Small Language Models},
author={OSCOWL AI},
year={2026},
publisher={Hugging Face},
url={https://huggingface.co/oscowlai/Wiola13M}
}
License
Apache License 2.0
Author
Developed by OSCOWL AI
GitHub: https://github.com/Wiola-OSCOWL-ai/Wiola13M
Hugging Face: https://huggingface.co/oscowlai/Wiola13M