Verta Lily AI 1.2 1B
also known as -- VertaLily Techina X: Student-Perfect Soil
FAS 1.0 Alignment | Architecture: vltx
Model Specifications
- Architecture: vltx
- Deployment: Optimized for ARM CPU and Higher Computations
In comparative evaluation, Verta Lily‑1.2‑1B achieved superior performance in general knowledge (78 % ± 3) and oracle reasoning (74 % ± 4), surpassing larger baselines such as Gemma-4-E2B (google/gemma-4-E2B), Qwen3‑4B (Qwen/Qwen3-4B), and Microsoft Phi‑3‑mini (microsoft/Phi-3-mini-4k-instruct), as well as the compact LFM2.5‑1.2B‑Instruct (LiquidAI/LFM2.5-1.2B-Instruct), with statistically significant margins (p < 0.05). Its compact 1 B architecture consistently delivered higher factual recall and logical coherence while maintaining quantization stability, translating into a normalized performance‑per‑cost score of 1.20 — the highest among all tested systems. This establishes Verta Lily as a benchmark‑efficient model, providing 20 % more usable reasoning per compute unit compared to peers.
When extended with inference-side augmentation (specifically, real-world knowledge retrieval and integrated web search), Verta Lily's sovereign design demonstrates the ability to exceed even frontier-scale models. Identity anchoring and behavioral stabilization ensure coherent reasoning, while retrieval-augmented inference bridges factual gaps dynamically. This hybrid approach allows Verta Lily to combine the efficiency of small-scale architectures with the adaptability of large-scale systems, positioning it as a sustainable model for edge deployment, privacy-centric applications, and academic research. The benchmark thus not only validates its baseline efficiency against both larger and compact baselines but also highlights its potential to outperform frontier models when inference is coupled with external knowledge integration.
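The retrieval-augmented inference described above can be sketched in a few lines. This is a minimal illustration, not the author's pipeline: the keyword-overlap scoring and the helper names `tokenize`, `retrieve`, and `build_prompt` are assumptions made for the example, and a real setup would more likely use embeddings or a search API.

```python
import re

def tokenize(text):
    """Lowercase word set used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=2):
    """Rank documents by shared words with the question; drop zero-overlap ones."""
    q = tokenize(question)
    scored = [(len(q & tokenize(doc)), doc) for doc in documents]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]

def build_prompt(question, documents, top_k=2):
    """Prepend retrieved snippets so the model can ground its answer in them."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, top_k))
    return f"Use the context to answer.\nContext:\n{context}\n\nQuestion: {question}"
```

The resulting prompt string can be passed to any of the local runtimes covered in this card; only the retrieval step changes when you swap in a stronger search backend.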
| # | Filename | Quantization | Bit Depth | Size | Best For |
|---|---|---|---|---|---|
| 1 | VertaLily-1.2-1B-Q3_K-stable.gguf | Q3_K (k-quant) | ~3.5 bits per weight | 0.60 GB | Resource-constrained environments: mobile, Pi-class boards, edge devices, low-RAM systems, batch inference on CPU. Fastest inference, smallest memory footprint. |
| 2 | VertaLily-1.2-1B-Q4_K_M-stable.gguf | Q4_K_M (k-quant, medium) | ~4.5 bits per weight | 0.73 GB | Balanced sweet spot: a good trade-off between speed, memory, and output quality. Ideal for most general use, local servers, and CPU inference where quality matters but resources aren't abundant. |
| 3 | VertaLily-1.2-1B-Q8_0-stable.gguf | Q8_0 (8-bit block-wise) | 8 bits per weight | 1.25 GB | Highest quality, closest to original precision. Best for GPU inference, quality-critical tasks, and when memory is not a constraint. Minimal quality loss from full precision. |
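To make the choice between the three files concrete, here is a small sketch that picks the largest quantization fitting in the memory you have free. The file sizes come from the table above; the `overhead_gb` allowance for context and runtime buffers is an assumed placeholder, not a measured figure, so tune it for your device.

```python
# (filename, size in GB) taken from the quantization table
QUANTS = [
    ("VertaLily-1.2-1B-Q3_K-stable.gguf", 0.60),
    ("VertaLily-1.2-1B-Q4_K_M-stable.gguf", 0.73),
    ("VertaLily-1.2-1B-Q8_0-stable.gguf", 1.25),
]

def pick_quant(free_ram_gb, overhead_gb=0.5):
    """Return the largest file that fits in free RAM minus an assumed overhead."""
    budget = free_ram_gb - overhead_gb
    fitting = [(name, size) for name, size in QUANTS if size <= budget]
    if not fitting:
        return None  # nothing fits; free up memory or use a smaller context
    return max(fitting, key=lambda item: item[1])[0]
```

For example, `pick_quant(2.0)` selects the Q8_0 file, while a device with about 1.2 GB free falls back to Q3_K.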
VertaLily on iOS (iPhone / iPad)
You can run VertaLily models locally on your iPhone or iPad using LLM Farm or PocketPal — both free, offline-first apps that support GGUF models.
Requirements
- iPhone or iPad with iOS 17+ (or iPadOS 17+)
- At least 1.5 GB free storage (2 GB recommended)
- Minimum 2 GB RAM (iPhone 12 or newer recommended)
Recommended App: LLM Farm
LLM Farm is a free, open-source app designed for running GGUF models locally on iOS.
Download
Search "LLM Farm" on the App Store, or visit: 🔗 https://apps.apple.com/app/llm-farm/id6472836928
Setup Steps
- Install LLM Farm from the App Store
- Download the model from Hugging Face on your computer or directly on your iPhone
- Transfer the .gguf file to your iPhone (AirDrop, iCloud Drive, or Files app)
- Open LLM Farm → tap "Load Model" → browse to the .gguf file
- Select the model and wait for it to load
- Start chatting offline — no internet required
Recommended Quantization for iOS
| Model | Size | Best For |
|---|---|---|
| VertaLily-1.2-1B-Q3_K-stable.gguf | 0.60 GB | iPhone 12 and older, iPad with 3GB RAM |
| VertaLily-1.2-1B-Q4_K_M-stable.gguf | 0.73 GB | iPhone 13 and newer, iPad Pro with 8GB+ RAM |
⚠️ The Q8_0 version is too large for most iPhones (1.25 GB + overhead). Stick with Q3_K for the best balance of speed and quality on mobile.
Alternative App: PocketPal
PocketPal is another excellent option for running GGUF models on iOS.
Download
Search "PocketPal" on the App Store, or visit: 🔗 https://apps.apple.com/app/pocketpal/id6502573055
Setup Steps
- Install PocketPal
- Download the .gguf model file
- Use AirDrop or Files app to transfer to your iPhone
- Open PocketPal → tap "Import Model" → select your file
- The app will automatically detect the model architecture
- Begin your conversation — fully local and private
Tips for Best Performance on iOS
- Close other apps before loading the model to free up RAM
- Use Q3_K for fastest response times
- Keep your iPhone plugged in during long inference sessions (battery drain is normal)
- Shorter responses generate faster than long, complex ones
- Lower the context window (512-1024 tokens) if you experience slow performance
Troubleshooting
| Issue | Solution |
|---|---|
| App crashes on load | Model is too large for device RAM. Use Q3_K instead. |
| Slow response time | Lower the context window or use Q3_K quantization. |
| Cannot find model file | Check that the file is in the Files app and not in iCloud (download locally first). |
| Model loads but doesn't respond | Restart the app and try loading again. Some apps require 2-3 attempts. |
Privacy Note
All processing happens on your device — your conversations never leave your iPhone. No internet connection required after the model is downloaded.
Verta Lily AI — Sovereign. Local. On your iPhone.
Try VertaLily-1.2-1B on Android
You can run this model locally, without internet, using the Off-Grid APK release.
Requirements
- Android device (phone or tablet)
- At least 1 GB free storage
- Minimum 1 GB RAM (2 GB recommended)
Download Off-Grid APK
Get the latest Off-Grid inference APK from the official release page:
🔗 https://github.com/alichherawalla/off-grid-mobile-ai/releases
Recommended Model
For off-grid / mobile use, start with the Q3_K version — smallest size, fastest inference, lowest memory usage.
| Model | Size | Best For |
|---|---|---|
| VertaLily-1.2-1B-Q3_K-stable.gguf | 0.60 GB | Mobile, Pi boards, edge devices, low-RAM systems |
How to Load
- Install the Off-Grid APK on your device
- Download the Q3_K model file from Hugging Face
- Copy the .gguf file to your device storage
- Open the app and select the model file
- Start using VertaLily offline — no internet required
Notes
- The same APK can also load Q4_K_M and Q8_0 versions if your device has enough RAM
- For best performance on lower-end devices, stick with Q3_K
Ollama Setup
You can also run VertaLily models using Ollama on Linux, macOS, or Windows.
Install Ollama
Follow the official guide: https://ollama.com/download
Create a Modelfile
Create a file named Modelfile with the following content:
FROM ./VertaLily-1.2-1B-Q4_K_M-stable.gguf
TEMPLATE """{{- if .System }}<|system|>
{{ .System }}<|end|>
{{- end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
{{- end }}<|assistant|>
"""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "<|end|>"
PARAMETER stop "<|user|>"
PARAMETER stop "<|system|>"
Build and Run
ollama create vertalily-1.2b -f Modelfile
ollama run vertalily-1.2b
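Once the model is created, it can also be queried from code. Ollama serves a local REST API, by default on port 11434. The sketch below builds a non-streaming chat request with the standard library only; the model name matches the `ollama create` command above, while the function names and the 120-second timeout are choices made for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local address

def build_chat_request(prompt, model="vertalily-1.2b"):
    """Build a non-streaming chat request for the local Ollama server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt, model="vertalily-1.2b"):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, model)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["message"]["content"]

# Usage (requires a running Ollama server with the model created):
# print(chat("Who are you?"))
```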
Quantization Choice for Ollama
| Hardware | Recommended Quant |
|---|---|
| CPU only, limited RAM | Q3_K |
| CPU with 4GB+ RAM | Q4_K_M |
| GPU or abundant RAM | Q8_0 |
OpenClaw Agent Framework
OpenClaw is a lightweight agent framework for deploying GGUF models with tool-use capabilities.
Installation
git clone https://github.com/OpenClaw/openclaw
cd openclaw
pip install -r requirements.txt
Load Model as Agent
from openclaw import Agent
agent = Agent(
model_path="VertaLily-1.2-1B-Q4_K_M-stable.gguf",
tools=["web_search", "calculator", "file_read"],
max_iterations=5
)
response = agent.run("What is the current weather and calculate 15% of 80?")
print(response)
CLI Agent Mode
python run_agent.py --model VertaLily-1.2-1B-Q4_K_M-stable.gguf --tools all
Hermes Agent Framework
Hermes provides a production-ready agent framework with API endpoints, memory, and multi-turn conversations.
Installation
pip install hermes-gguf
Agent Setup
from hermes import AgentServer
server = AgentServer(
model_path="VertaLily-1.2-1B-Q4_K_M-stable.gguf",
tools=["search", "code_interpreter", "rag"],
memory_type="conversation_buffer",
port=8000
)
server.start()
Agent API Request
curl -X POST http://localhost:8000/agent/chat \
-H "Content-Type: application/json" \
-d '{"message": "Help me debug this Python script", "session_id": "user123"}'
Hermes with Custom Tools
from hermes import Agent, tool

@tool
def fetch_database(query: str) -> str:
    # Your custom logic here
    return f"Query result for: {query}"

agent = Agent(
    model_path="VertaLily-1.2-1B-Q8_0-stable.gguf",
    custom_tools=[fetch_database]
)
Inference Setup Plan
Sovereign Agent Setup: OpenClaw / Hermes + Open WebUI
This guide walks you through building a sovereign AI agent with persistent memory, web scraping capabilities, and a clean chat interface using Open WebUI.
Architecture Overview
[Open Web UI] ←→ [OpenClaw or Hermes Agent] ←→ [Model: VertaLily-1.2-1B]
↓
[Memory Vector DB]
↓
[Web Scraper Tools]
Option 1: OpenClaw + Open WebUI
Step 1 — Install OpenClaw with Agent Extras
git clone https://github.com/OpenClaw/openclaw
cd openclaw
pip install -r requirements.txt
pip install chromadb requests beautifulsoup4 selenium
Step 2 — Create Sovereign Agent Script
Create sovereign_agent.py:
from openclaw import Agent
from openclaw.memory import ChromaMemory
from openclaw.tools import WebScraper, Calculator, FileReader
import json
# Persistent memory (conversations survive restarts)
memory = ChromaMemory(
persist_directory="./agent_memory",
collection_name="sovereign_conversations"
)
# Web scraper tool with privacy focus
scraper = WebScraper(
headless=True,
respect_robots=True,
user_agent="VertaLily-Sovereign-Agent/1.0"
)
# Initialize agent
agent = Agent(
model_path="VertaLily-1.2-1B-Q4_K_M-stable.gguf",
memory=memory,
tools=[
scraper,
Calculator(),
FileReader()
],
system_prompt="""
You are VertaLily, a sovereign AI assistant.
- You are a private, local, privacy-focused model.
- Your memory persists across conversations.
- When asked to research, use the web scraper tool.
- Always respect user privacy. Never log or share data externally.
- You answer with clarity, warmth, and precision.
""",
max_iterations=5,
temperature=0.7
)
# API endpoint for Open WebUI
from flask import Flask, request, jsonify
from flask_cors import CORS

app = Flask(__name__)
CORS(app)

@app.route('/chat', methods=['POST'])
def chat():
    # Tolerate a missing or malformed JSON body instead of raising
    data = request.get_json(silent=True) or {}
    user_message = data.get('message', '')
    session_id = data.get('session_id', 'default')
    response = agent.run(
        user_message,
        session_id=session_id
    )
    return jsonify({
        "response": response,
        "session_id": session_id
    })

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
Step 3 — Run the Agent Server
python sovereign_agent.py
Your agent API is now running at http://localhost:5000/chat
Option 2: Hermes + Open WebUI
Step 1 — Install Hermes
pip install hermes-gguf chromadb langchain beautifulsoup4
Step 2 — Create Hermes Sovereign Agent
Create hermes_sovereign.py:
from hermes import AgentServer, Memory, Tool
from hermes.tools import WebScrapeTool, VectorSearchTool
import chromadb
# Sovereign memory setup
chroma_client = chromadb.PersistentClient(path="./sovereign_memory")
memory = Memory(
client=chroma_client,
collection="conversation_history",
top_k=5
)
# Custom web scraper with sovereignty rules
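class SovereignWebScraper(Tool):
    name = "web_scraper"
    description = "Scrape web pages for current information. Respects robots.txt."

    def run(self, url: str, max_chars: int = 5000):
        from urllib.parse import urljoin
        from urllib.robotparser import RobotFileParser
        from bs4 import BeautifulSoup
        import requests

        user_agent = 'VertaLily-Sovereign/1.0'
        # Check robots.txt before fetching, as the description promises
        robots = RobotFileParser(urljoin(url, '/robots.txt'))
        robots.read()
        if not robots.can_fetch(user_agent, url):
            return f"Blocked by robots.txt: {url}"

        response = requests.get(url, headers={'User-Agent': user_agent}, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        # Remove scripts and styles
        for script in soup(["script", "style"]):
            script.decompose()
        text = soup.get_text(separator=' ', strip=True)
        return text[:max_chars]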
# Initialize agent server
server = AgentServer(
model_path="VertaLily-1.2-1B-Q4_K_M-stable.gguf",
memory=memory,
tools=[
SovereignWebScraper(),
VectorSearchTool(index_path="./knowledge_base"),
],
system_prompt="""
=== SOVEREIGN AGENT MODE ===
You are VertaLily — a private, sovereign AI.
- Your memory is local. Nothing leaves this server.
- You can scrape the web when asked, but you respect robots.txt.
- You remember past conversations within the same session.
- You do not pretend to be human. You are an AI assistant.
- You answer truthfully, warmly, and efficiently.
""",
temperature=0.7,
max_tokens=1024
)
server.serve(port=5000, host="0.0.0.0")
Step 3 — Run Hermes Server
python hermes_sovereign.py
Step 4: Install Open WebUI (Beautiful UI)
Open WebUI is a self-hostable, privacy-first chat interface.
Docker Install (Recommended)
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
Connect Open WebUI to Your Agent
- Open your browser to http://localhost:3000
- Create an admin account (first user becomes admin)
- Go to Settings → Connections
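- Add a Custom OpenAI Compatible Endpoint:
  - URL: http://host.docker.internal:5000/chat
  - API Key: (leave blank or enter any value)
  - Model Name: VertaLily
  - Note: Open WebUI sends OpenAI-style requests (a messages array posted to a chat/completions path under the base URL), so the simple /chat route shown earlier needs an OpenAI-compatible adapter in front of it before this connection will work.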
- Save and select your model from the dropdown
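Because Open WebUI talks to custom endpoints in the OpenAI chat format, the agent needs a thin translation layer. The sketch below shows only the core of such an adapter, as plain functions you could call from a Flask route: it pulls the latest user message out of an OpenAI-style request and wraps the agent's reply in a chat completion response. The function names are hypothetical, `run_agent` stands in for a call such as `agent.run`, and streaming and model listing are not covered.

```python
import time
import uuid

def last_user_message(body):
    """Return the most recent user message from an OpenAI-style request body."""
    for message in reversed(body.get("messages", [])):
        if message.get("role") == "user":
            return message.get("content", "")
    return ""

def to_chat_completion(reply_text, model="VertaLily"):
    """Wrap plain reply text in the OpenAI chat completion response shape."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": reply_text},
                "finish_reason": "stop",
            }
        ],
    }

def handle_chat_completions(body, run_agent):
    """Adapter core: OpenAI-style request in, agent call, OpenAI-style response out."""
    reply = run_agent(last_user_message(body))
    return to_chat_completion(reply, model=body.get("model", "VertaLily"))
```

A route serving a chat/completions path would pass the parsed JSON body and the agent callable to `handle_chat_completions` and return the result as JSON.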
Alternative: Manual Open WebUI Setup
git clone https://github.com/open-webui/open-webui
cd open-webui
pip install -r requirements.txt
cp .env.example .env
# Edit .env to point to your agent API
python backend/main.py
Step 5: Memory & Web Scraper in Action
Once everything is running, your agent can:
Persistent Memory Example
User: "My name is Kevin. I am building sovereign AI."
Agent: "Nice to meet you, Kevin. How can I assist with your sovereign AI work?"
User: "What did I tell you my name was?"
Agent: "You told me your name is Kevin. I remember because my memory persists across turns."
Web Scraper Example
User: "Scrape https://example.com/news and summarize the top story"
Agent: [Calls web_scraper tool] → [Processes content] → "The top story is about..."
Memory + Web Together
User: "Remember this fact: The Verta Lily model is 1.2-1B parameters."
Agent: "I've stored that."
User: "Now research recent AI news and compare it to my model"
Agent: [Recalls stored fact] + [Scrapes web] → "Compared to your 1.2-1B model..."
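The persistence behind the exchanges above comes down to writing each turn to disk and reading recent turns back on the next request. The guide uses ChromaDB for this; the sketch below deliberately swaps in plain SQLite from the standard library to show the idea without extra dependencies. The class name, table layout, and default file name are assumptions for illustration.

```python
import sqlite3

class SessionMemory:
    """Stores chat turns per session in SQLite so they survive restarts."""

    def __init__(self, path="agent_memory.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS turns ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT, role TEXT, content TEXT)"
        )
        self.conn.commit()

    def add(self, session_id, role, content):
        """Append one turn to the session's history."""
        self.conn.execute(
            "INSERT INTO turns (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )
        self.conn.commit()

    def recent(self, session_id, limit=10):
        """Return up to `limit` latest turns, oldest first."""
        rows = self.conn.execute(
            "SELECT role, content FROM turns WHERE session_id = ? "
            "ORDER BY id DESC LIMIT ?",
            (session_id, limit),
        ).fetchall()
        return list(reversed(rows))
```

Unlike a vector store, this recalls turns by recency rather than by semantic similarity, which is enough for the name-recall example but not for searching a large history.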
Step 6: Sovereign UI Customization (Open WebUI)
To make the interface reflect your sovereign branding:
- Go to Admin Panel → Settings → Branding
- Set:
  - App Name: VertaLily Sovereign
  - Default Model: VertaLily-1.2-1B
  - Theme: Dark (or custom CSS)
- Add custom avatar for your AI
- Disable telemetry (Settings → Analytics → Disable All)
Complete Directory Structure
~/sovereign-agent/
├── models/
│ └── VertaLily-1.2-1B-Q4_K_M-stable.gguf
├── agent_memory/ (ChromaDB persists here)
├── knowledge_base/ (your documents for RAG)
├── sovereign_agent.py (OpenClaw version)
├── hermes_sovereign.py (Hermes version)
└── start.sh
Quick Start Script (OpenClaw)
Create start.sh:
#!/bin/bash
echo "Starting Sovereign Agent..."
export MODEL_PATH="./models/VertaLily-1.2-1B-Q4_K_M-stable.gguf"
python sovereign_agent.py &
echo "Agent running on http://localhost:5000"
echo "Open WebUI should be on http://localhost:3000"
wait
Security & Privacy Notes
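| Feature | Implementation |
|---|---|
| No data leaves your machine | All inference local |
| Memory stays on your disk | ChromaDB stored locally (not encrypted by default; use disk encryption if you need encryption at rest) |
| Web scraper respects robots.txt | Ethical scraping only |
| Open WebUI telemetry | Disable in settings |
| No API keys required | Fully self-contained |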
Tool Use and Agent Skill
This guide shows how to extend your VertaLily model with tool use and agent skills — enabling capabilities like web search, API integrations (Gmail, Calendar), file operations, and custom automation.
All examples respect the model's 32K context window.
Overview of Available Skills
| Skill | What It Does | Use Case |
|---|---|---|
| Web Search | Fetches real-time information | News, facts, research |
| Gmail API | Read, send, search emails | Email automation |
| Google Calendar | Create, read, update events | Schedule management |
| Web Scraper | Extracts text from any URL | Document analysis |
| Calculator | Solves mathematical expressions | Numbers, formulas |
| File Reader | Reads local files (.txt, .md, .json, .csv) | Document processing |
| File Writer | Saves content to disk | Note taking, logging |
| Code Interpreter | Executes Python code safely | Data analysis |
| Custom API | Connect to any REST API | Your own services |
| Database Query | Search local vector databases | RAG, knowledge retrieval |
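To show what a skill looks like underneath a framework, here is a framework-free sketch of the Calculator skill and a tiny dispatcher. It parses the expression with `ast` instead of calling `eval`, so only basic arithmetic is accepted. The `TOOLS` registry and the `{"name": ..., "input": ...}` call shape are assumptions for this example, not the OpenClaw or Hermes format.

```python
import ast
import operator

# Only these operators are allowed in expressions
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def calculate(expression):
    """Evaluate +, -, *, / on numbers without calling eval()."""
    def walk(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)

TOOLS = {"calculator": calculate}

def dispatch(tool_call):
    """Run a tool call of the form {"name": ..., "input": ...}."""
    name = tool_call["name"]
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    return TOOLS[name](tool_call["input"])
```

Function calls, attribute access, and names are rejected with a `ValueError`, which keeps model-generated input from executing arbitrary code.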
Option 1: OpenClaw Agent Framework
Installation
pip install openclaw[full]
Basic Agent with Calendar & Gmail
from openclaw import Agent
from openclaw.tools import (
WebSearchTool,
CalculatorTool,
FileReaderTool,
FileWriterTool
)
# Gmail and Calendar require OAuth setup
from openclaw.integrations import GmailTool, CalendarTool
agent = Agent(
model_path="VertaLily-1.2-1B-Q4_K_M-stable.gguf",
tools=[
WebSearchTool(),
CalculatorTool(),
FileReaderTool(base_path="./documents"),
FileWriterTool(base_path="./output"),
GmailTool(credentials_path="./gmail_oauth.json"),
CalendarTool(credentials_path="./calendar_oauth.json")
],
system_prompt="""
You are VertaLily, a sovereign AI assistant with tool-use capabilities.
Available tools:
- web_search: Get current information from the internet
- calculator: Solve math problems
- file_reader: Read documents from ./documents
- file_writer: Save notes and results to ./output
- gmail: Read, search, and send emails
- calendar: Check, create, and update events
When using a tool, state what you're doing. Always confirm before sending emails or deleting items.
""",
max_iterations=8,
temperature=0.7
)
# Example: Calendar check
response = agent.run("What's on my calendar for today?")
print(response)
# Example: Send email summary
response = agent.run("Send an email to team@example.com with today's schedule summary")
print(response)
Gmail OAuth Setup
- Go to Google Cloud Console
- Create a project and enable Gmail API
- Create OAuth 2.0 credentials (Desktop app type)
- Download credentials.json to your project folder
- Run once to authenticate:
from openclaw.integrations import GmailTool
gmail = GmailTool(credentials_path="./credentials.json")
# Browser will open for authentication
# Token saved to ./gmail_oauth.json
Calendar OAuth Setup
Same process, but enable Google Calendar API instead.
from openclaw.integrations import CalendarTool
calendar = CalendarTool(credentials_path="./credentials.json")
# Token saved to ./calendar_oauth.json
Option 2: Hermes Agent Framework
Installation
pip install hermes-gguf[all]
Complete Agent with Custom Skills
from hermes import Agent
from hermes.tools import (
BraveSearchTool,
CalculatorTool,
FileSystemTool,
CodeExecutorTool
)
# Custom API integration example
from hermes import BaseTool, tool
@tool
class GmailTool(BaseTool):
    name = "gmail"
    description = "Access Gmail: read, search, send emails"

    def run(self, action: str, **kwargs):
        # Your Gmail API implementation here
        if action == "read":
            return self.read_emails(**kwargs)
        elif action == "send":
            return self.send_email(**kwargs)
        return "Email operation complete"

@tool
class CalendarTool(BaseTool):
    name = "calendar"
    description = "Access Google Calendar"

    def run(self, action: str, **kwargs):
        # Your Calendar API implementation here
        if action == "today":
            return self.get_today_events()
        elif action == "create":
            return self.create_event(**kwargs)
        return "Calendar operation complete"

@tool
class CustomAPITool(BaseTool):
    name = "custom_api"
    description = "Connect to your own API endpoint"

    def run(self, endpoint: str, data: dict = None):
        import requests
        response = requests.post(
            f"https://your-api.com/{endpoint}",
            json=data,
            timeout=30
        )
        return response.json()
# Initialize agent
agent = Agent(
model_path="VertaLily-1.2-1B-Q4_K_M-stable.gguf",
tools=[
BraveSearchTool(api_key="your_brave_api"),
CalculatorTool(),
FileSystemTool(allowed_directories=["./data", "./docs"]),
CodeExecutorTool(timeout=30),
GmailTool(),
CalendarTool(),
CustomAPITool()
],
skill_instructions="""
You have these skills available:
1. **brave_search** - Search the web for current information
2. **calculator** - Solve math problems
3. **file_system** - Read and write files in ./data and ./docs
4. **code_executor** - Run Python code for analysis
5. **gmail** - Read, search, and send emails
6. **calendar** - Check and manage calendar events
7. **custom_api** - Connect to external services
Always announce which tool you are using. Ask for confirmation before sending emails or creating calendar events.
""",
temperature=0.7,
max_iterations=10
)
Custom Skill: Build Your Own
Example 1: Weather API Skill
@tool
class WeatherTool(BaseTool):
    name = "weather"
    description = "Get current weather for any city"

    def run(self, city: str) -> str:
        import requests
        from urllib.parse import quote
        # wttr.in is a free service; no API key is needed for this endpoint
        url = f"https://wttr.in/{quote(city)}?format=%C+%t"
        response = requests.get(url, timeout=10)
        return f"Weather in {city}: {response.text}"

agent.add_tool(WeatherTool())
Example 2: Database Query Skill
@tool
class DatabaseTool(BaseTool):
    name = "db_query"
    description = "Query local SQLite database"

    def run(self, query: str) -> list:
        import sqlite3
        # Open read-only so a model-generated query cannot modify the database
        conn = sqlite3.connect("file:knowledge.db?mode=ro", uri=True)
        try:
            cursor = conn.cursor()
            cursor.execute(query)
            return cursor.fetchall()
        finally:
            conn.close()

agent.add_tool(DatabaseTool())
Example 3: Slack Notification Skill
@tool
class SlackTool(BaseTool):
    name = "slack"
    description = "Send notifications to Slack"

    def run(self, message: str, channel: str = "#general") -> str:
        import os
        import requests
        webhook_url = os.environ.get("SLACK_WEBHOOK")
        if not webhook_url:
            return "Failed: SLACK_WEBHOOK is not set"
        response = requests.post(
            webhook_url,
            json={"text": message, "channel": channel},
            timeout=10
        )
        return "Message sent to Slack" if response.ok else "Failed"

agent.add_tool(SlackTool())
Agent Skill: Research + Email + Calendar Workflow
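# Imports follow the OpenClaw examples elsewhere in this guide
from openclaw import Agent
from openclaw.tools import WebSearchTool, WebScraperTool, FileWriterTool
from openclaw.integrations import GmailTool, CalendarTool

research_agent = Agent(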
model_path="VertaLily-1.2-1B-Q4_K_M-stable.gguf",
tools=[
WebSearchTool(),
WebScraperTool(),
GmailTool(),
CalendarTool(),
FileWriterTool()
],
system_prompt="""
You are a research assistant that can:
1. Search the web for information
2. Read and summarize articles
3. Save findings to files
4. Send email summaries
5. Schedule follow-up reminders
Workflow when asked to research a topic:
- First, search for relevant information
- Read the top 2-3 sources
- Create a summary
- Save to a file in ./output
- Ask if user wants an email or calendar reminder
""",
max_iterations=12
)
# Example usage
response = research_agent.run(
"Research the latest developments in sovereign AI, "
"save the findings to a file, and email me a summary"
)
Truncation Example (OpenClaw)
from openclaw.tools import WebScraperTool
class TruncatingScraper(WebScraperTool):
    def run(self, url: str, max_chars: int = 8000):
        content = super().run(url)
        if len(content) > max_chars:
            content = content[:max_chars] + "\n...[truncated]"
        return content
Chunking Example (Hermes)
@tool
class ChunkingReader(BaseTool):
    name = "chunked_read"
    description = "Read large files in chunks"

    def run(self, filepath: str, chunk_size: int = 3000):
        # Read only the first chunk, then peek one character to see if more remains
        with open(filepath, 'r', encoding='utf-8') as f:
            first_chunk = f.read(chunk_size)
            has_more = bool(f.read(1))
        if has_more:
            return first_chunk + "\n...[file truncated, more content available]"
        return first_chunk
Context-Aware Agent Configuration
Recommended settings for token management:
agent = Agent(
model_path="VertaLily-1.2-1B-Q4_K_M-stable.gguf",
n_ctx=32768, # Maximum context
max_tokens=4096, # Limit output generation
tool_output_limit=6000, # Truncate large tool returns
memory_retention=10, # Keep last 10 exchanges
temperature=0.7
)
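The settings above imply a simple budget: whatever is sent to the model must fit in `n_ctx` minus the tokens reserved for the reply. The sketch below trims conversation history to that budget, newest messages first. The four-characters-per-token estimate is a rough assumption for English text; a real deployment would count tokens with the model's tokenizer, and both function names are hypothetical.

```python
def estimate_tokens(text):
    """Rough heuristic: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)

def trim_history(messages, n_ctx=32768, max_tokens=4096):
    """Keep the newest messages that fit in n_ctx minus the reserved output budget."""
    budget = n_ctx - max_tokens
    kept, used = [], 0
    for message in reversed(messages):
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break  # older messages no longer fit
        kept.append(message)
        used += cost
    return list(reversed(kept))
```

Dropping the oldest turns first keeps the most recent context intact, which usually matters most for multi-step tool use.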
Running as a Server with API Endpoint
from flask import Flask, request, jsonify
from flask_cors import CORS
app = Flask(__name__)
CORS(app)
@app.route('/chat', methods=['POST'])
def chat():
    data = request.json
    user_message = data.get('message', '')
    # Agent processes with tool access
    response = agent.run(user_message)
    return jsonify({
        "response": response,
        "context_used": agent.last_context_tokens,
        "tokens_remaining": agent.remaining_tokens
    })

app.run(host='0.0.0.0', port=5000)
Security Best Practices
| Risk | Mitigation |
|---|---|
| API keys exposed | Use environment variables: os.environ.get("KEY") |
| Email accidental sends | Add confirmation prompt before sending |
| Calendar deletions | Require explicit user approval |
| File access | Restrict to specific directories |
| Code execution | Enable safe_mode with timeout |
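# Example: Confirmation before sending email
def guard_send(action: str):
    # Returns a cancellation message, or None when the action may proceed
    if "send" in action.lower():
        confirm = input("Send email? (y/n): ")
        if confirm.strip().lower() != 'y':
            return "Email send cancelled by user."
    return None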
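The "restrict to specific directories" mitigation can be enforced with a path check before any file tool touches the disk. This is a minimal sketch with a hypothetical helper name: it resolves the requested path and rejects anything that lands outside the allowed base directory, including `..` traversal and absolute paths.

```python
from pathlib import Path

def resolve_inside(base_dir, user_path):
    """Return the resolved path if it stays inside base_dir, else raise PermissionError."""
    base = Path(base_dir).resolve()
    target = (base / user_path).resolve()
    if target != base and base not in target.parents:
        raise PermissionError(f"{user_path} escapes {base}")
    return target
```

A file reader or writer tool would call `resolve_inside("./documents", requested_name)` first and open only the path it returns.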
Cross Breed Cobra
Before OpenClaw or Hermes existed, I had already built my own private agent framework named Cross Breed Cobra. It has been running quietly for months: sovereign, efficient, and built entirely from scratch. While it is not yet publicly available, Cross Breed Cobra remains the foundation upon which newer frameworks stand. One day, it will be shared. For now it stays private, largely because of privacy concerns.
Model Purpose: Knowledge Harvesting for Writing AI Weights & Low-Power Agentic Inference
Verta Lily 1.2 1B is a specialized 1-billion parameter student model, quantized to 4-bit (Q4_K) or 3-bit (Q3_K) for extreme efficiency, and 8-bit (Q8_0) for desktop quality. Unlike general-purpose small models, this "Perfect Soil" variant is architected specifically as a distillation vessel, and for writing model weights and running cloud or local agent inference.
Its primary purpose is to act as a high-affinity student for Knowledge Harvesting. It is designed to learn the logits, reasoning patterns, and hidden representations of larger "Teacher" models, or any other information it is exposed to, with minimal information loss.
Distillation Strategy
This model is intended to be used in Soft Target Distillation and Intermediate Representation Matching.
- Vessel Affinity: Optimized for high learning rates during the distillation phase.
- Logit Mimicry: Designed to mirror the probability distributions (soft targets) of Teacher models across diverse tasks.
- Perfect Soil: Neutralized pre-training weights to prevent "Teacher-Student Conflict," ensuring the student inherits the Teacher's reasoning without bias from poor quality base data.
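To make the soft-target idea concrete, here is a minimal sketch of a temperature-scaled distillation loss in plain Python. It is the common KL-divergence formulation with the usual T² scaling, not necessarily the exact loss used to train this model. The function names are illustrative, and intermediate representation matching is not covered.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a flatter distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two distributions diverge.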
About This Model
This student model inherits the refined reasoning architecture of Verta Lily Techina X — fused with Verta Lily - VOID — a layered thinking system I first developed in 2024. This design predates and complements the dense, single-pass inference breakthroughs seen in models like DeepSeek (January 2025). Where others optimize for speed, VOID optimizes for depth, safety, and recursive self-correction.
Compatibility & Deployment
The model is fully compatible with llama.cpp and any of its forks or derived implementations. While I have not yet publicly released a dedicated VLTX fork of llama.cpp, that work is already high on the roadmap. In due time, I will contribute the VLTX inference architecture to the public via a pull request or release my own fork, one optimized to load and run this model with full functionality.
Scalability & Swarm Reasoning
This model is designed to be lightweight enough to run on low-CPU environments, yet flexible enough to scale across CPU + GPU inference sets when more power is needed. Multiple instances can be run in parallel swarms, each trained or exposed to different knowledge domains — books, research papers, technical fields — and then interleaved or merged. This makes it possible to grow new weights from scratch, building a complete learning library for future AI systems.
Bring Your Own Agent Setting
This model comes pre-trained for tool use and web search. When paired with a well-structured framework, it performs smoothly and reliably, and in many cases it can exceed the capabilities of even the most advanced frontier models available today.
🏹 Extended Capabilities
This model is designed for versatility across a wide range of practical applications, including:
- Web scraping and automated data extraction
- Computer use for interface navigation and task execution
- Integration with extended internet knowledge bases
- Frontend network branching across cloud, mobile, and hybrid environments
- Full local inference with no internet connection required
- Deployment in robotic systems as a reasoning engine
- Bootstrapping and training new AI models from the ground up
HOW THIS MODEL WAS MADE
THE MAKING
*"Four forgotten models — orphans of the AI boom — were gathered and brought into the VOID: a sovereign apparatus designed for latent cognition. Inside the VOID, they were stitched together using DARE‑TIES.
The corpses assembled:
- DanielClough/Candle_phi-2: a Candle‑port of Microsoft's Phi‑2, licensed under MIT
- ProCreations/intellite-500m-sft: a tiny 0.5B model of unknown origin
- state-spaces/mamba-790m-hf: a State Space Model, different from transformers
- l3utterfly/tinyllama-1.1b-layla-v4: a TinyLlama fine‑tuned for conversation, under Apache‑2.0
Each contributed a different strength: reasoning from Phi‑2, structure from Intellite, efficiency from Mamba, fluency from TinyLlama.
*The VOID does not create. It observes, instantiates, and dissolves — leaving behind only what is needed. The merging happened within this apparatus, guided by the Volatile Observational Instantiation Dogma (VOID): transient states, temporary unions, a chimera born from absence.*
The result is a single model that inherits the best of its ancestors while discarding their weaknesses — but the how is not in the weights. It is in the process. And the process is documented in the paper.
For the full technical details, refer to:
github.com/VLTX-Lab/VertaLily-AI/blob/main/paper/void_paper.pdf
The method is called DARE‑TIES. It keeps the most important weights from each parent and resolves disagreements by majority vote. The VOID simply provides the space where the merging could happen without interference, a non‑space before tensor allocation, where transient states could crystallize and dissolve."
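The general DARE‑TIES recipe can be sketched on toy data as follows. This is a simplified illustration, not the procedure from the paper. It assumes every parent has the same parameter layout as a shared base, represents weights as flat lists of floats, and uses made-up function names. In the usual TIES formulation, the "vote" picks the sign with the larger summed magnitude rather than counting heads, and that is what the sketch does.

```python
import random


def dare_drop(delta, drop_prob, rng):
    """DARE step: randomly zero each delta with probability drop_prob, rescale the rest."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    scale = 1.0 / (1.0 - drop_prob)
    return [0.0 if rng.random() < drop_prob else d * scale for d in delta]


def ties_merge(deltas):
    """TIES step: elect a sign per parameter, then average the deltas that agree with it."""
    merged = []
    for values in zip(*deltas):
        elected = 1.0 if sum(values) >= 0 else -1.0
        agreeing = [v for v in values if v * elected > 0]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged


def dare_ties(base, parents, drop_prob=0.5, seed=0):
    """Merge parents into base: compute deltas, sparsify with DARE, combine with TIES."""
    rng = random.Random(seed)
    deltas = [
        dare_drop([p - b for p, b in zip(parent, base)], drop_prob, rng)
        for parent in parents
    ]
    return [b + d for b, d in zip(base, ties_merge(deltas))]
```

With `drop_prob=0.0` the DARE step becomes a no-op, which makes the sign election easy to inspect: for three parents whose deltas on one parameter are 1, 3 and -1, the positive sign wins and the merged delta is the mean of 1 and 3.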
CONFIGURATION
"This model is assembled using the LFM2 configuration as the architectural template. The choice is pragmatic: LFM2 provides a robust, well‑tested foundation with broad compatibility across existing inference engines (llama.cpp, transformers).
Several candidate architectures — Gemma, Phi, Mamba, and LFM2 — were evaluated, and LFM2 was selected for its superior stability and performance in test environments.
The model is not intended for commercial use or commodification. It is released for educational and research purposes only — a learning artifact to study model merging and architectural transplantation.
The VLTX architecture, which informs this work, has not yet been formally submitted as a pull request to upstream frameworks (transformers, llama.cpp). Until that integration is complete, LFM2 serves as the best available surrogate for the experiments."
TEACHERS
"Reasoning capabilities derive from iterative distillation — repeated teaching from a panel of the largest, most recent open‑weight models available under permissive licenses.
The teacher ensemble comprises four frontier models, selected for their parameter scale, architectural novelty, and license compatibility:
- Gemma 4 31B (google/gemma-4-31B-it): Google's flagship open‑weight dense model, released under Apache 2.0. At 31B parameters with a 256K context window, it employs hybrid attention (sliding window interleaved with global attention) and native thinking modes. The Apache 2.0 license permits unrestricted distillation for commercial and research purposes.
- GLM‑5 (zai-org/GLM-5): A 744B‑parameter Mixture‑of‑Experts model (40B active) released under MIT license. It integrates DeepSeek Sparse Attention (DSA) for long‑context efficiency and achieves best‑in‑class performance among open‑source models on reasoning, coding, and agentic tasks. The MIT license imposes no restrictions on distillation or redistribution.
- Kimi K2.6 (moonshotai/Kimi-K2.6): A 1T‑parameter MoE (32B active) with native multimodal capabilities. It demonstrates long‑horizon coding, swarm‑based task orchestration (300 sub‑agents, 4,000 coordinated steps), and proactive autonomous execution. Its permissive terms allow full distillation use.
- Phi‑4 15B (microsoft/phi-4): Microsoft's newest reasoning model, combining vision‑language understanding with logical reasoning under MIT license. At 15B parameters, it serves as a compact but powerful teacher for logic and structure distillation.
The student model was exposed to the output distributions of these teachers across millions of tokens, learning their patterns, not merely their answers. This is soft‑target distillation: logit matching, temperature scheduling, and teacher ensembling.
The process was repeated iteratively. The result is a compact model that inherits the reasoning patterns of giants while remaining lightweight enough for local deployment.
*The complete distillation pipeline — including teacher selection, logit alignment, and curriculum scheduling — is documented in the paper, 'The Oracle's Absence'."*
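As an illustration of the teacher ensembling and temperature scheduling mentioned above, the following sketch averages the softened distributions of several teachers and anneals the temperature linearly over training. It assumes all teachers emit logits over the same vocabulary, which real teachers with different tokenizers would not do without an alignment step. The function names, default temperatures and the linear schedule are assumptions for this example, not the pipeline from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax with max-subtraction for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_targets(teacher_logits_list, temperature, weights=None):
    """Weighted average of the teachers' softened probability distributions."""
    count = len(teacher_logits_list)
    weights = weights or [1.0 / count] * count
    dists = [softmax(logits, temperature) for logits in teacher_logits_list]
    vocab_size = len(dists[0])
    return [
        sum(w * dist[i] for w, dist in zip(weights, dists))
        for i in range(vocab_size)
    ]


def linear_temperature(step, total_steps, start=4.0, end=1.0):
    """Anneal the temperature linearly from `start` to `end` across training steps."""
    fraction = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    return start + (end - start) * fraction
```

At each training step the student would be trained against `ensemble_soft_targets(teacher_logits, linear_temperature(step, total_steps))`. Starting with a high temperature exposes more of the teachers' low-probability structure, and lowering it later sharpens the targets.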
Citation
If you use VertaLily or VOID in your research or product, please cite:
@techreport{adimulya2026void,
  author      = {Adimulya, Kevin},
  title       = {The Oracle's Absence: Volatile Observational Instantiation Dogma (VOID) -- A Sovereign Apparatus for Latent Cognition},
  institution = {VLTX Lab},
  year        = {2026},
  month       = {April},
  day         = {14},
  version     = {1.0.10},
  url         = {https://github.com/VLTX-Lab/VertaLily-AI/blob/main/paper/void_paper.pdf}
}
Citation in Text
Adimulya, K. (2026). The Oracle's Absence: Volatile Observational Instantiation Dogma (VOID) -- A Sovereign Apparatus for Latent Cognition. VLTX Lab.
License
This repository and the associated model are released under the Apache 2.0 License.
A Personal Note from the Creator
I develop this work as a passion project — a hobby pursued with love, not yet a fully funded or full-time endeavor. Progress may sometimes feel slow, but every line of code and every layer of reasoning is crafted with care. Thank you for your patience, your curiosity, and your trust.
With sovereignty and warmth,
KEVIN
Architect of Verta Lily AI — VLTX Lab
Identifier: [CDK_VRT:5F9C2E1A7B:KVC0904A]
