Qwento-Agentic
⚠️ Test run. This is an early experimental checkpoint, not a finished model.
A QLoRA fine-tune merged into BF16, built on top of Qwen/Qwen-AgentWorld-35B-A3B (a Qwen3.5 MoE: 35B total / ~3B active parameters, hybrid DeltaNet linear-attention + full-attention, 256 experts). It was trained on a curated set of publicly available datasets and is designed for coding tasks.
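As a rough sizing aid, the parameter counts above can be turned into a weight-memory estimate. This is back-of-the-envelope arithmetic, assuming 2 bytes per BF16 parameter and the rounded 35B / ~3B figures from this card; it ignores activations, KV cache, and framework overhead, and the names used are illustrative only.

```python
# Rounded figures from the model card; the real parameter counts may differ slightly.
TOTAL_PARAMS = 35e9   # all parameters, including every expert
ACTIVE_PARAMS = 3e9   # approximate parameters used per token
BYTES_PER_BF16 = 2    # bfloat16 stores each value in 16 bits


def weight_gb(num_params, bytes_per_param=BYTES_PER_BF16):
    """Return the weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


total_gb = weight_gb(TOTAL_PARAMS)
active_gb = weight_gb(ACTIVE_PARAMS)
print(f"All weights in BF16:      ~{total_gb:.0f} GB")
print(f"Weights active per token: ~{active_gb:.0f} GB")
```

All experts still have to be resident in memory (or offloaded), so the total figure, not the active one, is the relevant number when choosing hardware.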
What this is
- Type: test run. A single short curriculum stage (2K sequence length), early checkpoint.
- Method: QLoRA (rank 16, α 32) applied to the model's sequence-mixing path (full-attention `q/k/v/o` plus linear-attention input/output projections across all 40 layers), then merged into the BF16 base weights. The 256 MoE experts were left frozen.
- Format: BF16 safetensors, drop-in with 🤗 Transformers / vLLM (same architecture and tokenizer as the base).
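To illustrate what "merged into the BF16 base weights" means, here is a minimal sketch of folding a LoRA adapter into a dense weight matrix. It assumes the conventional LoRA scaling of α / r (the card states only rank 16 and α 32, which would give a scale of 2 under that convention). The function names and the toy 2x2 matrices are hypothetical and are not taken from the actual training code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_b, lora_a, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    rank = len(lora_a)  # A has shape (r, d_in), B has shape (d_out, r)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Scale implied by the card's hyperparameters under the alpha / r convention.
card_scale = 32 / 16

# Toy example: rank 2 and alpha 4, i.e. the same alpha / r ratio of 2.
w = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[0.5, 0.0], [0.0, 0.25]]
merged = merge_lora(w, lora_b, lora_a, alpha=4)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is why the checkpoint can be loaded like the base model.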
Training data (curated, publicly available)
A token-balanced blend of cleaned public datasets:
| Source | Focus |
|---|---|
| Jackrong/Claude-opus-4.7-TraceInversion-5000x | reasoning / trace-inversion problem solving |
| lordx64/reasoning-distill-claude-opus-4-7-max | high-quality reasoning traces |
| lordx64/reasoning-distill-opus-4-7-max-sft | instruction-style reasoning SFT |
| Infatoshi/kernelbench-mega-traces | GPU-kernel coding traces |
| Glint-Research/fable-5-traces | multi-turn agentic coding (tool use) |
All sources were structurally cleaned and quality-filtered before mixing.
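The card does not describe how the blend was token-balanced. One plausible reading is that each source is subsampled so that all sources contribute roughly the same number of tokens. The sketch below illustrates only that idea; the source names and token counts are hypothetical placeholders, not statistics of the datasets listed above.

```python
def keep_fractions(token_counts):
    """Fraction of each source to keep so that all sources contribute equal tokens."""
    budget = min(token_counts.values())  # the smallest source sets the per-source budget
    return {name: budget / count for name, count in token_counts.items()}


# Hypothetical token counts; NOT measurements of the real datasets.
example_counts = {"source_a": 4_000_000, "source_b": 1_000_000, "source_c": 2_000_000}
fractions = keep_fractions(example_counts)
for name, frac in fractions.items():
    print(f"{name}: keep {frac:.0%} of tokens")
```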
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged BF16 checkpoint; device_map="auto" places the weights on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    "groxaxo/Qwento-Agentic", torch_dtype="bfloat16", device_map="auto", trust_remote_code=True
)
tok = AutoTokenizer.from_pretrained("groxaxo/Qwento-Agentic")
```
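The snippet above only loads the weights. A possible way to generate a reply is sketched below, assuming the tokenizer ships a chat template (as Qwen base models generally do) and that enough memory is available for the full checkpoint. The helper names `build_messages` and `generate_reply` are hypothetical, and the generation settings are placeholders rather than recommended values.

```python
def build_messages(user_prompt, system_prompt=None):
    """Return a chat-format message list suitable for apply_chat_template."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


def generate_reply(user_prompt, max_new_tokens=256):
    """Load the checkpoint and generate one reply (needs transformers and torch)."""
    # Imported lazily so build_messages can be used without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "groxaxo/Qwento-Agentic"
    tok = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(
        repo, torch_dtype="bfloat16", device_map="auto", trust_remote_code=True
    )
    inputs = tok.apply_chat_template(
        build_messages(user_prompt),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tok.decode(new_tokens, skip_special_tokens=True)
```

For example, `print(generate_reply("Write a Python function that reverses a string."))` would load the model and print a single completion. For repeated calls, load the model once outside the function instead.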
Limitations
This is a preliminary test checkpoint from a short training run; it has not been
benchmarked and should be treated as experimental. It inherits the license and any usage
restrictions of the base model (Qwen/Qwen-AgentWorld-35B-A3B).