Welly-code committed on
Commit 0589a8e · verified · 1 Parent(s): fb176f8

Delete README.md with huggingface_hub

Files changed (1)
  1. README.md +0 -173
README.md DELETED
@@ -1,173 +0,0 @@
---
base_model: Qwen/Qwen2.5-Coder-3B-Instruct
datasets:
- nvidia/Nemotron-Agentic-v1
- my-ai-stack/Stack-4.0-Dataset
pipeline_tag: text-generation
license: apache-2.0
tags:
- code-generation
- agentic-ai
- tool-use
- lora
- qwen
- python
- coding-assistant
- transformers
- peft
- 3b-parameter-model
model-index:
- name: Stack X Ultimate
  results:
  - task:
      type: text-generation
      description: Agentic coding assistant with tool-use capabilities
    dataset:
      type: openai/openai_humaneval
      name: HumanEval
    metrics:
    - type: pass@1
      value: TBD
---

# Stack X Ultimate

**A state-of-the-art agentic coding model built on Qwen2.5-Coder-3B-Instruct**

Stack X is a LoRA adapter trained on a curated mix of real agentic conversations, designed to make open-weight models better at multi-step tool use, code generation, and complex reasoning tasks.

---

## Model Details

- **Base Model:** Qwen/Qwen2.5-Coder-3B-Instruct
- **Architecture:** Transformer (3B parameters)
- **Training Type:** QLoRA (LoRA rank 32, 7 modules targeted)
- **Trained by:** Walid Sobhie via OpenClaw agentic pipeline
- **Framework:** Hugging Face Transformers + PEFT + PyTorch bf16
- **Training Hardware:** NVIDIA V100-SXM2-16GB (GCP spot instance)
- **Training Steps:** 3,000 steps (curriculum sorted, cosine LR decay)
- **Effective Batch Size:** 16 (gradient accumulation)
- **Max Context:** 1,536 tokens
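
These details are also recorded in the adapter's saved configuration and can be checked without downloading the weights. A minimal sketch, assuming the adapter repo ID used in the Usage section below; with recent PEFT versions, `PeftConfig.from_pretrained` fetches only the small `adapter_config.json` and returns the concrete `LoraConfig`.

```python
from peft import PeftConfig

# Fetch only the adapter configuration (adapter_config.json), not the LoRA weights.
cfg = PeftConfig.from_pretrained("my-ai-stack/Stack-X-Ultimate")

print(cfg.base_model_name_or_path)  # expected: Qwen/Qwen2.5-Coder-3B-Instruct
print(cfg.peft_type)                # expected: LORA
print(cfg.r, cfg.lora_alpha)        # expected: 32, 64
print(cfg.target_modules)           # the seven targeted projection modules
```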

---

## Training Data

| Source | Description | Count |
|--------|-------------|-------|
| NVIDIA Nemotron Agentic | Real multi-step tool calling conversations | ~7,000 |
| Stack-4.0 Smart | High-complexity agentic tasks | ~10,000 |
| Stack-4.0 Tools | Diverse tool-use patterns | ~10,000 |
| **Total (deduped)** | **After deduplication** | **~6,100** |

Training data was filtered, deduplicated, and sorted by complexity (curriculum learning) before training.
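
The exact preprocessing code is not part of this card. As a rough, hypothetical sketch of the dedup-then-curriculum idea only: hash each conversation to drop exact duplicates, then use total message length as a stand-in complexity score.

```python
import hashlib
import json

def preprocess(conversations):
    """Hypothetical sketch: exact dedup by content hash, then curriculum sort."""
    seen, unique = set(), []
    for conv in conversations:  # each conv is a list of {"role", "content"} turns
        key = hashlib.sha256(json.dumps(conv, sort_keys=True).encode()).hexdigest()
        if key not in seen:     # keep the first occurrence, drop exact duplicates
            seen.add(key)
            unique.append(conv)
    # Curriculum-learning proxy: shortest (simplest) conversations first.
    unique.sort(key=lambda conv: sum(len(turn["content"]) for turn in conv))
    return unique
```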

---

## Capabilities

Stack X is designed to excel at:

- **Multi-step tool use** — chains multiple tool calls with proper reasoning
- **Code generation** — Python, JavaScript, shell, and more
- **Debugging** — finds and explains bugs with fixes
- **Math & reasoning** — step-by-step calculation and problem solving
- **Research tasks** — information retrieval and synthesis

---

## Usage

### With PEFT (recommended — preserves base model)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

BASE = "Qwen/Qwen2.5-Coder-3B-Instruct"
ADAPTER = "my-ai-stack/Stack-X-Ultimate"

tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype="bfloat16", device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER)

# Chat
messages = [{"role": "user", "content": "Use the calculate tool to find sqrt(144)"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
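
The prompt above refers to a `calculate` tool, but no tool schema is actually passed to the model. As a hedged follow-on (reusing the `model` and `tokenizer` loaded above), recent transformers releases accept tool definitions via the `tools` argument of `apply_chat_template`; the `calculate` schema below is an illustrative assumption, not something shipped with this adapter.

```python
# Hypothetical tool schema; the model is expected to emit a structured tool call for it.
calculate_tool = {
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Evaluate a basic math expression.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}

messages = [{"role": "user", "content": "Use the calculate tool to find sqrt(144)"}]
text = tokenizer.apply_chat_template(
    messages, tools=[calculate_tool], tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Print only the newly generated tokens (the model's tool-call turn).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```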

### Merged (full model)

```python
# See: my-ai-stack/Stack-X-Ultimate-Merged
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("my-ai-stack/Stack-X-Ultimate-Merged", torch_dtype="bfloat16", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("my-ai-stack/Stack-X-Ultimate-Merged")
```
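
If you prefer to produce merged weights yourself instead of downloading the pre-merged repo, PEFT's `merge_and_unload` folds the adapter into the base model. A minimal sketch; the local output path is arbitrary.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-3B-Instruct", torch_dtype="bfloat16", device_map="auto"
)
model = PeftModel.from_pretrained(base, "my-ai-stack/Stack-X-Ultimate")

merged = model.merge_and_unload()            # fold the LoRA deltas into the base weights
merged.save_pretrained("./stack-x-merged")   # output directory is arbitrary
AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-3B-Instruct").save_pretrained("./stack-x-merged")
```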

---

## Performance

| Benchmark | Score |
|-----------|-------|
| HumanEval (0-shot) | TBD |
| Agentic tool call | TBD |
| Reasoning (commonsense) | TBD |

*Evaluation results will be posted after training completes.*
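
For context on how the HumanEval pass@1 number could eventually be produced, here is a hypothetical sketch using OpenAI's `human-eval` harness and greedy decoding; the harness choice and generation settings are assumptions rather than the card's stated protocol, and it reuses the PEFT-loaded `model` and `tokenizer` from the Usage section.

```python
# Hypothetical evaluation sketch (pip install human-eval).
from human_eval.data import read_problems, write_jsonl

samples = []
for task_id, problem in read_problems().items():
    inputs = tokenizer(problem["prompt"], return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)  # greedy decoding
    completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    samples.append({"task_id": task_id, "completion": completion})

write_jsonl("samples.jsonl", samples)
# Score afterwards with the harness CLI: evaluate_functional_correctness samples.jsonl
```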

---

## Limitations

- The LoRA adapter requires its compatible base model (Qwen2.5-Coder-3B-Instruct)
- Max context of 1,536 tokens — not suitable for very long documents
- Trained primarily in English — performance in other languages may vary
- Tool use is limited to the patterns seen in training data

---

## Training Recipe

```
Base model: Qwen/Qwen2.5-Coder-3B-Instruct
LoRA rank: 32 (59M trainable params)
LoRA alpha: 64
Target modules: q_proj, k_proj, v_proj, o_proj,
                gate_proj, up_proj, down_proj
Learning rate: 2e-4 (cosine decay)
Warmup: 150 steps
Batch size: 1 × gradient_accumulation=16
Optimizer: AdamW (bf16)
Max grad norm: 0.5
Weight decay: 0.1
Mixed precision: bf16
Gradient checkpointing: enabled
```
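
Roughly how this recipe maps onto PEFT and Hugging Face `TrainingArguments`; a sketch under the stated hyperparameters only, since the original training script is not included here and the dataset and trainer plumbing are omitted.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA settings matching the recipe above.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Optimizer and schedule settings matching the recipe above.
training_args = TrainingArguments(
    output_dir="stack-x-ultimate",      # output path is an assumption
    max_steps=3_000,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,     # effective batch size 16
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=150,
    weight_decay=0.1,
    max_grad_norm=0.5,
    bf16=True,
    gradient_checkpointing=True,
)
```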

---

## Citation

```bibtex
@misc{stackx2026,
  title={Stack X Ultimate},
  author={Walid Sobhie},
  year={2026},
  url={https://huggingface.co/my-ai-stack/Stack-X-Ultimate}
}
```

---

## Disclaimer

This model is provided as-is. Training was performed automatically via an OpenClaw agentic pipeline, and results may vary. It has not been reviewed for safety in production deployments.