Instructions for using VLTX/VertaLily-1.2-1B-GGUF with libraries, inference engines, and local apps.

Libraries

How to use VLTX/VertaLily-1.2-1B-GGUF with Transformers:

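# Transformers loads GGUF checkpoints only when the file is named explicitly
# (gguf_file) and the architecture is supported by your Transformers version;
# the weights are dequantized on load.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "VLTX/VertaLily-1.2-1B-GGUF"
gguf_file = "VertaLily-1.2-1B-Q4_K_M-stable.gguf"

# Load model directly
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file)

# Use a pipeline as a high-level helper
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))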

llama-cpp-python

How to use VLTX/VertaLily-1.2-1B-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="VLTX/VertaLily-1.2-1B-GGUF",
	filename="VertaLily-1.2-1B-Q3_K-stable.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)


llama.cpp

How to use VLTX/VertaLily-1.2-1B-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf VLTX/VertaLily-1.2-1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf VLTX/VertaLily-1.2-1B-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf VLTX/VertaLily-1.2-1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf VLTX/VertaLily-1.2-1B-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf VLTX/VertaLily-1.2-1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf VLTX/VertaLily-1.2-1B-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf VLTX/VertaLily-1.2-1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf VLTX/VertaLily-1.2-1B-GGUF:Q4_K_M

Use Docker

docker model run hf.co/VLTX/VertaLily-1.2-1B-GGUF:Q4_K_M


vLLM

How to use VLTX/VertaLily-1.2-1B-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "VLTX/VertaLily-1.2-1B-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "VLTX/VertaLily-1.2-1B-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
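The same request can be sent from Python. The sketch below uses only the standard library and assumes the vLLM server above is listening on `localhost:8000`, which is its default port. The request shape should also work against the SGLang server on port 30000 or `llama-server` on port 8080, because all three expose an OpenAI-compatible `/v1/chat/completions` route. The helper names are illustrative, not part of any library.

```python
import json
import urllib.request

# Assumption: vLLM's default address; use :30000 for SGLang or :8080 for llama-server
BASE_URL = "http://localhost:8000/v1"
MODEL = "VLTX/VertaLily-1.2-1B-GGUF"


def build_chat_request(prompt: str, base_url: str = BASE_URL,
                       model: str = MODEL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /chat/completions route."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_json: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion."""
    return response_json["choices"][0]["message"]["content"]


def chat(prompt: str, base_url: str = BASE_URL) -> str:
    """Send one user message and return the reply (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url), timeout=120) as resp:
        return extract_reply(json.load(resp))
```

With the server running, `print(chat("What is the capital of France?"))` mirrors the curl call.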


SGLang

How to use VLTX/VertaLily-1.2-1B-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "VLTX/VertaLily-1.2-1B-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "VLTX/VertaLily-1.2-1B-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "VLTX/VertaLily-1.2-1B-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "VLTX/VertaLily-1.2-1B-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use VLTX/VertaLily-1.2-1B-GGUF with Ollama:
```
ollama run hf.co/VLTX/VertaLily-1.2-1B-GGUF:Q4_K_M
```

Unsloth Studio

How to use VLTX/VertaLily-1.2-1B-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for VLTX/VertaLily-1.2-1B-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for VLTX/VertaLily-1.2-1B-GGUF to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for VLTX/VertaLily-1.2-1B-GGUF to start chatting

Pi

How to use VLTX/VertaLily-1.2-1B-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf VLTX/VertaLily-1.2-1B-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "VLTX/VertaLily-1.2-1B-GGUF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use VLTX/VertaLily-1.2-1B-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf VLTX/VertaLily-1.2-1B-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default VLTX/VertaLily-1.2-1B-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use VLTX/VertaLily-1.2-1B-GGUF with Docker Model Runner:
```
docker model run hf.co/VLTX/VertaLily-1.2-1B-GGUF:Q4_K_M
```

Lemonade

How to use VLTX/VertaLily-1.2-1B-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull VLTX/VertaLily-1.2-1B-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.VertaLily-1.2-1B-GGUF-Q4_K_M

List all available models

lemonade list


Verta Lily AI 1.2 1B

Also known as VertaLily Techina X: Student-Perfect Soil

FAS 1.0 Alignment | Architecture: vltx

Model Specifications

Architecture: vltx
Deployment: Optimized for ARM CPUs and for higher-compute hardware

In comparative evaluation, Verta Lily-1.2-1B achieved superior performance in general knowledge (78% ± 3) and oracle reasoning (74% ± 4). It surpassed larger baselines such as Gemma-4-E2B (google/gemma-4-E2B), Qwen3-4B (Qwen/Qwen3-4B), and Microsoft Phi-3-mini (microsoft/Phi-3-mini-4k-instruct), as well as the similarly sized LFM2.5-1.2B-Instruct (LiquidAI/LFM2.5-1.2B-Instruct), with statistically significant margins (p < 0.05). Its compact 1B architecture consistently delivered higher factual recall and logical coherence while remaining stable under quantization. The result is a normalized performance-per-cost score of 1.20, the highest among all tested systems. On this basis, Verta Lily is a benchmark-efficient model that provides 20% more usable reasoning per compute unit than its peers.
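The 20% figure follows from the 1.20 score only if the score is normalized so that the peer reference equals 1.00. The normalization is not specified here, so treat the following as a worked reading under that assumption:

$$
\text{relative gain} = \frac{S_{\text{VertaLily}}}{S_{\text{ref}}} - 1 = \frac{1.20}{1.00} - 1 = 0.20 = 20\%
$$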

When extended with inference-side augmentation, specifically real-world knowledge retrieval and integrated web search, Verta Lily's sovereign design demonstrates the ability to exceed even frontier-scale models. Identity anchoring and behavioral stabilization keep reasoning coherent, while retrieval-augmented inference fills factual gaps dynamically. This hybrid approach lets Verta Lily combine the efficiency of small-scale architectures with the adaptability of large-scale systems, positioning it as a sustainable model for edge deployment, privacy-centric applications, and academic research. The benchmark therefore validates its baseline efficiency against both larger and compact baselines, and it highlights its potential to outperform frontier models when inference is coupled with external knowledge integration.

#	Filename	Quantization	Bits per Weight	Size	Best For
1	`VertaLily-1.2-1B-Q3_K-stable.gguf`	Q3_K (3-bit k-quant)	~3.5	0.60 GB	Resource-constrained environments: mobile devices, Pi boards, edge devices, low-RAM systems, batch inference on CPU. Fastest inference, smallest memory footprint.
2	`VertaLily-1.2-1B-Q4_K_M-stable.gguf`	Q4_K_M (4-bit k-quant, medium)	~4.5	0.73 GB	Balanced sweet spot: a good trade-off between speed, memory, and output quality. Ideal for most general use, local servers, and CPU inference where quality matters but resources are limited.
3	`VertaLily-1.2-1B-Q8_0-stable.gguf`	Q8_0 (8-bit, block-wise)	8.5 (8-bit values + per-block scale)	1.25 GB	Highest quality, closest to original precision. Best for GPU inference, quality-critical tasks, and setups where memory is not a constraint. Minimal quality loss relative to full precision.
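"Block-wise" means the weights are stored in blocks of 32, and each block carries its own scale. The sketch below is modeled on ggml's reference Q8_0 quantizer and shows where the 8.5 bits per weight come from. It is plain Python for readability; the real file format stores the scale as fp16. The k-quant types (Q3_K, Q4_K_M) use a more involved super-block scheme that is not shown here.

```python
import math

BLOCK_SIZE = 32  # weights per Q8_0 block


def _round_half_away(v: float) -> int:
    # C's roundf rounds .5 away from zero; Python's round() would round to even
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def quantize_q8_0(block: list) -> tuple:
    """Quantize 32 floats to (scale, int8 codes) the way Q8_0 does."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("Q8_0 works on blocks of exactly 32 weights")
    amax = max(abs(x) for x in block)
    d = amax / 127.0  # one scale per block (stored as fp16 in the real file)
    inv_d = 1.0 / d if d else 0.0
    return d, [_round_half_away(x * inv_d) for x in block]


def dequantize_q8_0(d: float, codes: list) -> list:
    """Recover approximate weights: w ~= d * q."""
    return [d * q for q in codes]


def bits_per_weight() -> float:
    """32 one-byte codes plus a 2-byte fp16 scale per block of 32 weights."""
    return (BLOCK_SIZE * 8 + 16) / BLOCK_SIZE  # 8.5
```

Because each code is rounded to the nearest step of size `d`, every restored weight lies within `d / 2` of the original.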

VertaLily on iOS (iPhone / iPad)

You can run VertaLily models locally on your iPhone or iPad using LLM Farm or PocketPal — both free, offline-first apps that support GGUF models.

Requirements

iPhone or iPad with iOS 17+ (or iPadOS 17+)
At least 1.5 GB free storage (2 GB recommended)
Minimum 2 GB RAM (iPhone 12 or newer recommended)

Recommended App: LLM Farm

LLM Farm is a free, open-source app designed for running GGUF models locally on iOS.

Download

Search "LLM Farm" on the App Store, or visit: 🔗 https://apps.apple.com/app/llm-farm/id6472836928

Setup Steps

Install LLM Farm from the App Store
Download the model from Hugging Face on your computer or directly on your iPhone
Transfer the .gguf file to your iPhone (AirDrop, iCloud Drive, or Files app)
Open LLM Farm → tap "Load Model" → browse to the .gguf file
Select the model and wait for it to load
Start chatting offline — no internet required
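Step 2 can be scripted on a computer before you transfer the file to the device. The sketch below assumes the `huggingface_hub` package is installed (`pip install huggingface_hub`). The filenames come from the quantization table above. `filename_for` and `download` are hypothetical helper names.

```python
REPO_ID = "VLTX/VertaLily-1.2-1B-GGUF"

# Filenames as listed in the quantization table
GGUF_FILES = {
    "Q3_K": "VertaLily-1.2-1B-Q3_K-stable.gguf",
    "Q4_K_M": "VertaLily-1.2-1B-Q4_K_M-stable.gguf",
    "Q8_0": "VertaLily-1.2-1B-Q8_0-stable.gguf",
}


def filename_for(quant: str) -> str:
    """Map a quantization label to its GGUF filename (hypothetical helper)."""
    if quant not in GGUF_FILES:
        raise ValueError(f"unknown quantization {quant!r}; choose from {sorted(GGUF_FILES)}")
    return GGUF_FILES[quant]


def download(quant: str = "Q3_K", local_dir: str = "./models") -> str:
    """Download one GGUF file and return its local path."""
    # Imported here so the helper above works without huggingface_hub installed
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=filename_for(quant), local_dir=local_dir)
```

For example, `download("Q3_K")` fetches the mobile-friendly file into `./models`. You can then AirDrop it or copy it via iCloud Drive.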

Recommended Quantization for iOS

Model	Size	Best For
`VertaLily-1.2-1B-Q3_K-stable.gguf`	0.60 GB	iPhone 12 and older, iPad with 3GB RAM
`VertaLily-1.2-1B-Q4_K_M-stable.gguf`	0.73 GB	iPhone 13 and newer, iPad Pro with 8GB+ RAM

⚠️ The Q8_0 version is too large for most iPhones (1.25 GB plus runtime overhead). On mobile, use Q3_K for the fastest responses on older devices and Q4_K_M for better quality on newer ones.

Alternative App: PocketPal

PocketPal is another excellent option for running GGUF models on iOS.

Download

Search "PocketPal" on the App Store, or visit: 🔗 https://apps.apple.com/app/pocketpal/id6502573055

Setup Steps

Install PocketPal
Download the .gguf model file
Use AirDrop or Files app to transfer to your iPhone
Open PocketPal → tap "Import Model" → select your file
The app will automatically detect the model architecture
Begin your conversation — fully local and private

Tips for Best Performance on iOS

Close other apps before loading the model to free up RAM
Use Q3_K for fastest response times
Keep your iPhone plugged in during long inference sessions (battery drain is normal)
Shorter responses generate faster than long, complex ones
Lower the context window to 512-1024 tokens if you experience slow performance

Troubleshooting

Issue	Solution
App crashes on load	Model is too large for device RAM. Use Q3_K instead.
Slow response time	Lower the context window or use Q3_K quantization.
Cannot find model file	Check that the file is in the Files app and not in iCloud (download locally first).
Model loads but doesn't respond	Restart the app and try loading again. Some apps require 2-3 attempts.

Privacy Note

All processing happens on your device — your conversations never leave your iPhone. No internet connection required after the model is downloaded.

Verta Lily AI — Sovereign. Local. On your iPhone.

Try VertaLily-1.2-1B on Android

You can run this model locally, without internet, using the Off-Grid APK release.

Requirements

Android device (phone or tablet)
At least 1 GB free storage
Minimum 1 GB RAM (2 GB recommended)

Download Off-Grid APK

Get the latest Off-Grid inference APK from the official release page:

🔗 https://github.com/alichherawalla/off-grid-mobile-ai/releases

Recommended Model

For off-grid / mobile use, start with the Q3_K version — smallest size, fastest inference, lowest memory usage.

Model	Size	Best For
`VertaLily-1.2-1B-Q3_K-stable.gguf`	0.60 GB	Mobile, Pi boards, edge devices, low-RAM systems

How to Load

Install the Off-Grid APK on your device
Download the Q3_K model file from Hugging Face
Copy the .gguf file to your device storage
Open the app and select the model file
Start using VertaLily offline — no internet required

Notes

The same APK can also load Q4_K_M and Q8_0 versions if your device has enough RAM
For best performance on lower-end devices, stick with Q3_K

Ollama Setup

You can also run VertaLily models using Ollama on Linux, macOS, or Windows.

Install Ollama

Follow the official guide: https://ollama.com/download

Create a Modelfile

Create a file named Modelfile with the following content:

FROM ./VertaLily-1.2-1B-Q4_K_M-stable.gguf

TEMPLATE """{{- if .System }}<|system|>
{{ .System }}<|end|>
{{- end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
{{- end }}<|assistant|>
"""

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "<|end|>"
PARAMETER stop "<|user|>"
PARAMETER stop "<|system|>"

Build and Run

ollama create vertalily-1.2b -f Modelfile
ollama run vertalily-1.2b
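Once the model is created, other programs can reach it through Ollama's local REST API. The sketch below uses only the standard library and assumes Ollama is running on its default port (11434) with the model named `vertalily-1.2b` as above. The helper names are illustrative. With `"stream": false`, Ollama returns a single JSON object whose `message` field holds the reply.

```python
import json
import urllib.request
from typing import Optional

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_ollama_request(prompt: str, model: str = "vertalily-1.2b",
                         system: Optional[str] = None) -> urllib.request.Request:
    """Build a non-streaming POST for Ollama's /api/chat endpoint."""
    messages = []
    if system:
        # Fills .System in the Modelfile TEMPLATE
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    body = {"model": model, "messages": messages, "stream": False}
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reply_text(response_json: dict) -> str:
    """Extract the assistant text from a non-streaming /api/chat response."""
    return response_json["message"]["content"]


def ask(prompt: str, **kwargs) -> str:
    """Send a prompt to the local Ollama server (must be running)."""
    with urllib.request.urlopen(build_ollama_request(prompt, **kwargs), timeout=300) as resp:
        return reply_text(json.load(resp))
```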

Quantization Choice for Ollama

Hardware	Recommended Quant
CPU only, limited RAM	Q3_K
CPU with 4GB+ RAM	Q4_K_M
GPU or abundant RAM	Q8_0

OpenClaw Agent Framework

OpenClaw is a lightweight agent framework for deploying GGUF models with tool-use capabilities.

Installation

git clone https://github.com/OpenClaw/openclaw
cd openclaw
pip install -r requirements.txt

Load Model as Agent

from openclaw import Agent

agent = Agent(
    model_path="VertaLily-1.2-1B-Q4_K_M-stable.gguf",
    tools=["web_search", "calculator", "file_read"],
    max_iterations=5
)

response = agent.run("What is the current weather and calculate 15% of 80?")
print(response)

CLI Agent Mode

python run_agent.py --model VertaLily-1.2-1B-Q4_K_M-stable.gguf --tools all

Hermes Agent Framework

Hermes provides a production-ready agent framework with API endpoints, memory, and multi-turn conversations.

Installation

pip install hermes-gguf

Agent Setup

from hermes import AgentServer

server = AgentServer(
    model_path="VertaLily-1.2-1B-Q4_K_M-stable.gguf",
    tools=["search", "code_interpreter", "rag"],
    memory_type="conversation_buffer",
    port=8000
)

server.start()

Agent API Request

curl -X POST http://localhost:8000/agent/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Help me debug this Python script", "session_id": "user123"}'

Hermes with Custom Tools

from hermes import Agent, tool

@tool
def fetch_database(query: str) -> str:
    # Your custom logic here
    return f"Query result for: {query}"

agent = Agent(
    model_path="VertaLily-1.2-1B-Q8_0-stable.gguf",
    custom_tools=[fetch_database]
)

Inference Setup Plan

Sovereign Agent Setup: OpenClaw / Hermes + Open WebUI

This guide walks you through building a sovereign AI agent with persistent memory, web scraping capabilities, and a clean chat interface using Open WebUI.

Architecture Overview


[Open WebUI] ←→ [OpenClaw or Hermes Agent] ←→ [Model: VertaLily-1.2-1B]
                              ↓
                     [Memory Vector DB]
                              ↓
                     [Web Scraper Tools]

Option 1: OpenClaw + Open WebUI

Step 1 — Install OpenClaw with Agent Extras

git clone https://github.com/OpenClaw/openclaw
cd openclaw
pip install -r requirements.txt
pip install chromadb requests beautifulsoup4 selenium flask flask-cors

Step 2 — Create Sovereign Agent Script

Create sovereign_agent.py:

import os

from flask import Flask, request, jsonify
from flask_cors import CORS
from openclaw import Agent
from openclaw.memory import ChromaMemory
from openclaw.tools import WebScraper, Calculator, FileReader

# Model location: start.sh exports MODEL_PATH; otherwise fall back to models/
MODEL_PATH = os.environ.get(
    "MODEL_PATH", "./models/VertaLily-1.2-1B-Q4_K_M-stable.gguf"
)

# Persistent memory (conversations survive restarts)
memory = ChromaMemory(
    persist_directory="./agent_memory",
    collection_name="sovereign_conversations"
)

# Web scraper tool with privacy focus
scraper = WebScraper(
    headless=True,
    respect_robots=True,
    user_agent="VertaLily-Sovereign-Agent/1.0"
)

# Initialize agent
agent = Agent(
    model_path=MODEL_PATH,
    memory=memory,
    tools=[
        scraper,
        Calculator(),
        FileReader()
    ],
    system_prompt="""
You are VertaLily, a sovereign AI assistant.
- You are a private, local, privacy-focused model.
- Your memory persists across conversations.
- When asked to research, use the web scraper tool.
- Always respect user privacy. Never log or share data externally.
- You answer with clarity, warmth, and precision.
""",
    max_iterations=5,
    temperature=0.7
)

# API endpoint for Open WebUI
app = Flask(__name__)
CORS(app)

@app.route('/chat', methods=['POST'])
def chat():
    # get_json(silent=True) returns None instead of raising on a bad body
    data = request.get_json(silent=True) or {}
    user_message = data.get('message', '')
    session_id = data.get('session_id', 'default')
    if not user_message:
        return jsonify({"error": "missing 'message'"}), 400

    response = agent.run(
        user_message,
        session_id=session_id
    )

    return jsonify({
        "response": response,
        "session_id": session_id
    })

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

Step 3 — Run the Agent Server

python sovereign_agent.py

Your agent API is now running at http://localhost:5000/chat

Option 2: Hermes + Open WebUI

Step 1 — Install Hermes

pip install hermes-gguf chromadb langchain beautifulsoup4 requests

Step 2 — Create Hermes Sovereign Agent

Create hermes_sovereign.py:

from hermes import AgentServer, Memory, Tool
from hermes.tools import WebScrapeTool, VectorSearchTool
import chromadb

# Sovereign memory setup
chroma_client = chromadb.PersistentClient(path="./sovereign_memory")
memory = Memory(
    client=chroma_client,
    collection="conversation_history",
    top_k=5
)

# Custom web scraper with sovereignty rules
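class SovereignWebScraper(Tool):
    name = "web_scraper"
    description = "Scrape web pages for current information. Respects robots.txt."
    user_agent = "VertaLily-Sovereign/1.0"

    def run(self, url: str, max_chars: int = 5000):
        from urllib.parse import urljoin
        from urllib.robotparser import RobotFileParser
        from bs4 import BeautifulSoup
        import requests

        # Check robots.txt before fetching, as the system prompt promises
        robots = RobotFileParser()
        robots.set_url(urljoin(url, "/robots.txt"))
        robots.read()
        if not robots.can_fetch(self.user_agent, url):
            return f"Blocked by robots.txt: {url}"

        headers = {'User-Agent': self.user_agent}
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')

        # Remove scripts and styles
        for script in soup(["script", "style"]):
            script.decompose()

        text = soup.get_text(separator=' ', strip=True)
        return text[:max_chars]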

# Initialize agent server
server = AgentServer(
    model_path="VertaLily-1.2-1B-Q4_K_M-stable.gguf",
    memory=memory,
    tools=[
        SovereignWebScraper(),
        VectorSearchTool(index_path="./knowledge_base"),
    ],
    system_prompt="""
=== SOVEREIGN AGENT MODE ===
You are VertaLily — a private, sovereign AI.
- Your memory is local. Nothing leaves this server.
- You can scrape the web when asked, but you respect robots.txt.
- You remember past conversations within the same session.
- You do not pretend to be human. You are an AI assistant.
- You answer truthfully, warmly, and efficiently.
""",
    temperature=0.7,
    max_tokens=1024
)

server.serve(port=5000, host="0.0.0.0")

Step 3 — Run Hermes Server

python hermes_sovereign.py

Step 4: Install Open WebUI (Beautiful UI)

Open WebUI is a self-hostable, privacy-first chat interface.

Docker Install (Recommended)

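# --add-host lets the container reach host.docker.internal on Linux hosts
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main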

Connect Open WebUI to Your Agent

Open your browser to http://localhost:3000
Create an admin account (first user becomes admin)
Go to Settings → Connections
Add a Custom OpenAI Compatible Endpoint:
  URL: http://host.docker.internal:5000/chat
  API Key: (leave blank or enter any value)
  Model Name: VertaLily
Save and select your model from the dropdown
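Open WebUI's OpenAI-type connections expect an OpenAI-style API: a base URL that serves `/models` and `/chat/completions`. The `/chat` routes above instead accept `{"message": ...}` and return `{"response": ...}`, so a small translation layer is needed in between. The sketch below covers only the translation: pure functions with hypothetical names. The `MODEL_NAME` value is an assumption chosen to match the Model Name entered above.

```python
import time
import uuid

MODEL_NAME = "VertaLily"  # assumption: matches the Model Name entered in Open WebUI


def last_user_message(payload: dict) -> str:
    """Pick the newest user turn from an OpenAI-style chat request body."""
    for message in reversed(payload.get("messages", [])):
        if message.get("role") == "user":
            content = message.get("content", "")
            # Some clients send content as a list of parts; keep the text parts
            if isinstance(content, list):
                content = " ".join(p.get("text", "") for p in content if p.get("type") == "text")
            return content
    return ""


def to_chat_completion(text: str, model: str = MODEL_NAME) -> dict:
    """Wrap plain agent output in a non-streaming chat.completion object."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": "stop",
        }],
    }


def model_list(model: str = MODEL_NAME) -> dict:
    """Body for GET /models so the UI can list the agent as a model."""
    return {"object": "list", "data": [{"id": model, "object": "model", "owned_by": "local"}]}
```

To wire this into the Flask app from Option 1, you would add two routes. A `/v1/models` route returns `jsonify(model_list())`. A `/v1/chat/completions` route returns `jsonify(to_chat_completion(agent.run(last_user_message(payload))))`. The connection URL then becomes `http://host.docker.internal:5000/v1`. This sketch answers non-streaming requests only. If the UI sends `"stream": true`, a server-sent-events variant would be needed.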

Alternative: Manual Open WebUI Setup

pip install open-webui
# Start Open WebUI (serves on http://localhost:8080 by default)
open-webui serve
# Then add your agent under Settings → Connections as described above

Step 5: Memory & Web Scraper in Action

Once everything is running, your agent can:

Persistent Memory Example

User: "My name is Kevin. I am building sovereign AI."
Agent: "Nice to meet you, Kevin. How can I assist with your sovereign AI work?"

User: "What did I tell you my name was?"
Agent: "You told me your name is Kevin. I remember because my memory persists across turns."
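This recall works because each turn is stored under a `session_id` and replayed into the prompt on the next request. The guide uses ChromaDB vector memory for that. As a simpler, swapped-in illustration, the sketch below stores turns per session in plain SQLite with no embeddings. `SessionMemory` is a hypothetical class, not part of OpenClaw or Hermes.

```python
import sqlite3


class SessionMemory:
    """Keeps chat turns per session_id in SQLite (a hypothetical, minimal store)."""

    def __init__(self, path: str):
        # Use a file path (e.g. ./agent_memory/sessions.db) to survive restarts,
        # or ":memory:" for a throwaway store
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS turns ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
        )

    def add(self, session_id: str, role: str, content: str) -> None:
        with self.conn:  # commits the insert
            self.conn.execute(
                "INSERT INTO turns (session_id, role, content) VALUES (?, ?, ?)",
                (session_id, role, content),
            )

    def recent(self, session_id: str, limit: int = 10) -> list:
        """Return the last `limit` turns for a session, oldest first."""
        rows = self.conn.execute(
            "SELECT role, content FROM turns WHERE session_id = ? "
            "ORDER BY id DESC LIMIT ?",
            (session_id, limit),
        ).fetchall()
        return [{"role": r, "content": c} for r, c in reversed(rows)]
```

Before each model call, prepend `memory.recent(session_id)` to the message list. After the call, `add` both the user turn and the reply.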

Web Scraper Example

User: "Scrape https://example.com/news and summarize the top story"
Agent: [Calls web_scraper tool] → [Processes content] → "The top story is about..."

Memory + Web Together

User: "Remember this fact: The Verta Lily model is 1.2-1B parameters."
Agent: "I've stored that."

User: "Now research recent AI news and compare it to my model"
Agent: [Recalls stored fact] + [Scrapes web] → "Compared to your 1.2-1B model..."

Step 6: Sovereign UI Customization (Open WebUI)

To make the interface reflect your sovereign branding:

Go to Admin Panel → Settings → Branding
Set:
  App Name: VertaLily Sovereign
  Default Model: VertaLily-1.2-1B
  Theme: Dark (or custom CSS)
Add custom avatar for your AI
Disable telemetry (Settings → Analytics → Disable All)

Complete Directory Structure

~/sovereign-agent/
├── models/
│   └── VertaLily-1.2-1B-Q4_K_M-stable.gguf
├── agent_memory/          (ChromaDB persists here, OpenClaw version)
├── sovereign_memory/      (ChromaDB persists here, Hermes version)
├── knowledge_base/        (your documents for RAG)
├── sovereign_agent.py     (OpenClaw version)
├── hermes_sovereign.py    (Hermes version)
└── start.sh

Quick Start Script (OpenClaw)

Create start.sh:

#!/bin/bash
# Run from the script's folder so relative paths (models/, agent_memory/) resolve
cd "$(dirname "$0")" || exit 1
echo "Starting Sovereign Agent..."
export MODEL_PATH="./models/VertaLily-1.2-1B-Q4_K_M-stable.gguf"
python sovereign_agent.py &
echo "Agent running on http://localhost:5000"
echo "Open WebUI should be on http://localhost:3000"
wait

Security & Privacy Notes

Feature	Implementation
No data leaves your machine	All inference runs locally
Memory stays local	ChromaDB is stored on local disk (not encrypted by default; use disk encryption if needed)
Web scraper respects robots.txt	Ethical scraping only
Open WebUI telemetry	Disable in settings
No API keys required	Fully self-contained

Tool Use and Agent Skill

This guide shows how to extend your VertaLily model with tool use and agent skills — enabling capabilities like web search, API integrations (Gmail, Calendar), file operations, and custom automation.

All examples respect the model's 32K context window.

Overview of Available Skills

Skill	What It Does	Use Case
Web Search	Fetches real-time information	News, facts, research
Gmail API	Read, send, search emails	Email automation
Google Calendar	Create, read, update events	Schedule management
Web Scraper	Extracts text from any URL	Document analysis
Calculator	Solves mathematical expressions	Numbers, formulas
File Reader	Reads local files (.txt, .md, .json, .csv)	Document processing
File Writer	Saves content to disk	Note taking, logging
Code Interpreter	Executes Python code safely	Data analysis
Custom API	Connect to any REST API	Your own services
Database Query	Search local vector databases	RAG, knowledge retrieval

Option 1: OpenClaw Agent Framework

Installation

pip install "openclaw[full]"

Basic Agent with Calendar & Gmail

from openclaw import Agent
from openclaw.tools import (
    WebSearchTool,
    CalculatorTool,
    FileReaderTool,
    FileWriterTool
)

# Gmail and Calendar require OAuth setup
from openclaw.integrations import GmailTool, CalendarTool

agent = Agent(
    model_path="VertaLily-1.2-1B-Q4_K_M-stable.gguf",
    tools=[
        WebSearchTool(),
        CalculatorTool(),
        FileReaderTool(base_path="./documents"),
        FileWriterTool(base_path="./output"),
        GmailTool(credentials_path="./gmail_oauth.json"),
        CalendarTool(credentials_path="./calendar_oauth.json")
    ],
    system_prompt="""
You are VertaLily, a sovereign AI assistant with tool-use capabilities.

Available tools:
- web_search: Get current information from the internet
- calculator: Solve math problems
- file_reader: Read documents from ./documents
- file_writer: Save notes and results to ./output
- gmail: Read, search, and send emails
- calendar: Check, create, and update events

When using a tool, state what you're doing. Always confirm before sending emails or deleting items.
""",
    max_iterations=8,
    temperature=0.7
)

# Example: Calendar check
response = agent.run("What's on my calendar for today?")
print(response)

# Example: Send email summary
response = agent.run("Send an email to team@example.com with today's schedule summary")
print(response)

Gmail OAuth Setup

Go to Google Cloud Console
Create a project and enable Gmail API
Create OAuth 2.0 credentials (Desktop app type)
Download credentials.json to your project folder
Run once to authenticate:

from openclaw.integrations import GmailTool

gmail = GmailTool(credentials_path="./credentials.json")
# Browser will open for authentication
# Token saved to ./gmail_oauth.json

Calendar OAuth Setup

Same process, but enable Google Calendar API instead.

from openclaw.integrations import CalendarTool

calendar = CalendarTool(credentials_path="./credentials.json")
# Token saved to ./calendar_oauth.json
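Google's Python quickstarts for Gmail and Calendar use `InstalledAppFlow` from `google-auth-oauthlib` to run this one-time consent flow. If you prefer to create the token file yourself, the sketch below does that. The scope choices are assumptions. Whether OpenClaw's `GmailTool` and `CalendarTool` accept a token in this JSON format is also an assumption, so check the framework's documentation.

```python
# Scopes are assumptions: request only what your tools actually use.
GMAIL_SCOPES = [
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/gmail.send",
]
CALENDAR_SCOPES = ["https://www.googleapis.com/auth/calendar"]


def authorize(client_secrets: str, token_path: str, scopes: list) -> None:
    """Run Google's desktop OAuth consent flow once and save the token as JSON."""
    # Requires: pip install google-auth-oauthlib (imported lazily)
    from google_auth_oauthlib.flow import InstalledAppFlow

    flow = InstalledAppFlow.from_client_secrets_file(client_secrets, scopes)
    creds = flow.run_local_server(port=0)  # opens a browser and waits for consent
    with open(token_path, "w", encoding="utf-8") as f:
        f.write(creds.to_json())
```

For example, `authorize("./credentials.json", "./gmail_oauth.json", GMAIL_SCOPES)`. Run it once per token file.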

Option 2: Hermes Agent Framework

Installation

pip install "hermes-gguf[all]"

Complete Agent with Custom Skills

from hermes import Agent
from hermes.tools import (
    BraveSearchTool,
    CalculatorTool,
    FileSystemTool,
    CodeExecutorTool
)

# Custom API integration example
from hermes import BaseTool, tool

@tool
class GmailTool(BaseTool):
    name = "gmail"
    description = "Access Gmail: read, search, send emails"
    
    def run(self, action: str, **kwargs):
        # Your Gmail API implementation here
        if action == "read":
            return self.read_emails(**kwargs)
        elif action == "send":
            return self.send_email(**kwargs)
        return "Email operation complete"

@tool
class CalendarTool(BaseTool):
    name = "calendar"
    description = "Access Google Calendar"
    
    def run(self, action: str, **kwargs):
        # Your Calendar API implementation here
        if action == "today":
            return self.get_today_events()
        elif action == "create":
            return self.create_event(**kwargs)
        return "Calendar operation complete"

@tool
class CustomAPITool(BaseTool):
    name = "custom_api"
    description = "Connect to your own API endpoint"
    
    def run(self, endpoint: str, data: dict = None):
        import requests
        response = requests.post(
            f"https://your-api.com/{endpoint}",
            json=data,
            timeout=30
        )
        return response.json()

# Initialize agent
agent = Agent(
    model_path="VertaLily-1.2-1B-Q4_K_M-stable.gguf",
    tools=[
        BraveSearchTool(api_key="your_brave_api"),
        CalculatorTool(),
        FileSystemTool(allowed_directories=["./data", "./docs"]),
        CodeExecutorTool(timeout=30),
        GmailTool(),
        CalendarTool(),
        CustomAPITool()
    ],
    skill_instructions="""
You have these skills available:

1. **brave_search** - Search the web for current information
2. **calculator** - Solve math problems
3. **file_system** - Read and write files in ./data and ./docs
4. **code_executor** - Run Python code for analysis
5. **gmail** - Read, search, and send emails
6. **calendar** - Check and manage calendar events
7. **custom_api** - Connect to external services

Always announce which tool you are using. Ask for confirmation before sending emails or creating calendar events.
""",
    temperature=0.7,
    max_iterations=10
)

Custom Skill: Build Your Own

Example 1: Weather API Skill

@tool
class WeatherTool(BaseTool):
    name = "weather"
    description = "Get current weather for any city"

    def run(self, city: str) -> str:
        import requests
        from urllib.parse import quote
        # wttr.in is free and needs no API key; %C = condition, %t = temperature
        url = f"https://wttr.in/{quote(city)}?format=%C+%t"
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return f"Weather in {city}: {response.text.strip()}"

agent.add_tool(WeatherTool())

Example 2: Database Query Skill

@tool
class DatabaseTool(BaseTool):
    name = "db_query"
    description = "Query local SQLite database (read-only)"

    def run(self, query: str) -> list:
        import sqlite3
        # Open read-only so model-written SQL cannot modify or drop data
        conn = sqlite3.connect("file:knowledge.db?mode=ro", uri=True)
        try:
            return conn.execute(query).fetchall()
        finally:
            conn.close()

agent.add_tool(DatabaseTool())

Example 3: Slack Notification Skill

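@tool
class SlackTool(BaseTool):
    name = "slack"
    description = "Send notifications to Slack"

    def run(self, message: str, channel: str = "#general") -> str:
        import os
        import requests
        webhook_url = os.environ.get("SLACK_WEBHOOK")
        if not webhook_url:
            return "SLACK_WEBHOOK is not set"
        # Newer Slack incoming webhooks post to the channel chosen when the
        # webhook was created and may ignore the "channel" field
        response = requests.post(
            webhook_url,
            json={"text": message, "channel": channel},
            timeout=10
        )
        return "Message sent to Slack" if response.ok else "Failed"

agent.add_tool(SlackTool())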

Agent Skill: Research + Email + Calendar Workflow

research_agent = Agent(
    model_path="VertaLily-1.2-1B-Q4_K_M-stable.gguf",
    tools=[
        WebSearchTool(),
        WebScraperTool(),
        GmailTool(),
        CalendarTool(),
        FileWriterTool()
    ],
    system_prompt="""
You are a research assistant that can:

1. Search the web for information
2. Read and summarize articles
3. Save findings to files
4. Send email summaries
5. Schedule follow-up reminders

Workflow when asked to research a topic:
- First, search for relevant information
- Read the top 2-3 sources
- Create a summary
- Save to a file in ./output
- Ask if user wants an email or calendar reminder
""",
    max_iterations=12
)

# Example usage
response = research_agent.run(
    "Research the latest developments in sovereign AI, "
    "save the findings to a file, and email me a summary"
)

Truncation Example (OpenClaw)

from openclaw.tools import WebScraperTool

class TruncatingScraper(WebScraperTool):
    def run(self, url: str, max_chars: int = 8000):
        content = super().run(url)
        if len(content) > max_chars:
            content = content[:max_chars] + "\n...[truncated]"
        return content

Chunking Example (Hermes)

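@tool
class ChunkingReader(BaseTool):
    name = "chunked_read"
    description = "Read large files in chunks"

    def run(self, filepath: str, chunk_index: int = 0, chunk_size: int = 3000):
        with open(filepath, 'r', encoding='utf-8') as f:
            text = f.read()
        total = max(1, -(-len(text) // chunk_size))  # ceiling division
        if not 0 <= chunk_index < total:
            return f"No chunk {chunk_index}; file has {total} chunk(s)."
        chunk = text[chunk_index * chunk_size:(chunk_index + 1) * chunk_size]
        if chunk_index + 1 < total:
            chunk += f"\n...[chunk {chunk_index + 1} of {total}; request chunk_index={chunk_index + 1} for more]"
        return chunk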

Context-Aware Agent Configuration

Recommended settings for token management:

agent = Agent(
    model_path="VertaLily-1.2-1B-Q4_K_M-stable.gguf",
    n_ctx=32768,              # Maximum context
    max_tokens=4096,          # Limit output generation
    tool_output_limit=6000,   # Truncate large tool returns
    memory_retention=10,      # Keep last 10 exchanges
    temperature=0.7
)
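These settings imply a simple token budget. With `n_ctx=32768` and `max_tokens=4096` reserved for the reply, at most 32768 - 4096 = 28672 tokens remain for the system prompt, retained exchanges, and tool output. The sketch below makes that arithmetic explicit; the helper names are hypothetical. The original does not say whether `tool_output_limit` counts characters or tokens. `approx_tokens` uses a rough rule of thumb of about four characters per English token, which varies by tokenizer.

```python
import math


def prompt_budget(n_ctx: int = 32768, max_output: int = 4096, system_tokens: int = 0) -> int:
    """Tokens left for history and tool output after reserving the reply and system prompt."""
    remaining = n_ctx - max_output - system_tokens
    if remaining <= 0:
        raise ValueError("max_output and the system prompt leave no room for input")
    return remaining


def approx_tokens(text_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough heuristic only: about 4 characters per token for English text."""
    return math.ceil(text_chars / chars_per_token)


def fits(history_tokens: int, tool_tokens: int, budget: int) -> bool:
    """Check whether retained history plus tool output stays within the budget."""
    return history_tokens + tool_tokens <= budget
```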

Running as a Server with API Endpoint

from flask import Flask, request, jsonify
from flask_cors import CORS

app = Flask(__name__)
CORS(app)

@app.route('/chat', methods=['POST'])
def chat():
    # get_json(silent=True) returns None instead of raising on a bad body
    data = request.get_json(silent=True) or {}
    user_message = data.get('message', '')

    # Agent processes with tool access
    response = agent.run(user_message)

    return jsonify({
        "response": response,
        "context_used": agent.last_context_tokens,
        "tokens_remaining": agent.remaining_tokens
    })

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

Security Best Practices

Risk | Mitigation
--- | ---
API keys exposed | Use environment variables: os.environ.get("KEY")
Accidental email sends | Add a confirmation prompt before sending
Calendar deletions | Require explicit user approval
File access | Restrict to specific directories
Code execution | Enable safe_mode with timeout
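The "File access" mitigation can be enforced with a small check before any tool reads or writes a path. A minimal sketch, assuming the allowed roots are plain directory paths such as `./output` from the research workflow; `is_path_allowed` is a hypothetical helper, not part of OpenClaw or Hermes:

```python
from pathlib import Path
from typing import Iterable


def is_path_allowed(path: str, allowed_dirs: Iterable[str]) -> bool:
    """True if `path` resolves to a location inside one of `allowed_dirs`."""
    # resolve() normalises ".." segments and follows symlinks,
    # so a path like "output/../../etc/passwd" cannot escape the root.
    target = Path(path).resolve()
    for base in allowed_dirs:
        root = Path(base).resolve()
        if target == root or root in target.parents:
            return True
    return False
```

A file tool would call `is_path_allowed(requested_path, ["./output"])` and refuse the operation when it returns `False`.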

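# Example: Confirmation before sending email
from typing import Optional

def confirm_email_action(action: str) -> Optional[str]:
    """Return a cancellation message if the user declines a send, else None."""
    if "send" in action.lower():
        confirm = input("Send email? (y/n): ").strip().lower()
        if confirm != 'y':
            return "Email send cancelled by user."
    return None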
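What `safe_mode` does for code execution depends on the framework in use. One framework-independent approximation, sketched here as an assumption, is to run generated Python in a separate interpreter with a hard timeout. `run_snippet` is a hypothetical name. A timeout only bounds runtime; it is not a sandbox and does not restrict file or network access.

```python
import subprocess
import sys


def run_snippet(code: str, timeout_s: float = 5.0) -> str:
    """Run Python `code` in a child interpreter, killing it after timeout_s seconds."""
    try:
        result = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode (ignores env vars, user site-packages)
            capture_output=True,
            text=True,
            timeout=timeout_s,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return f"Execution stopped after {timeout_s} s timeout."
    if result.returncode != 0:
        return "Error:\n" + result.stderr.strip()
    return result.stdout
```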

Cross Breed Cobra

Before OpenClaw or Hermes existed, I had already built my own private agent framework named Cross Breed Cobra. It has been running quietly for months: sovereign, efficient, and built entirely from scratch. While it is not yet publicly available, Cross Breed Cobra remains the foundation upon which newer frameworks stand. One day, it will be shared. For now, it remains private for privacy reasons.

Model Purpose: Knowledge Harvesting for Writing AI Weights & Low-Power Agentic Inference

Verta Lily 1.2 1B is a specialized 1-billion-parameter student model, quantized to 4-bit (Q4_K) or 3-bit (Q3_K) for extreme efficiency and to 8-bit (Q8_0) for desktop-grade quality. Unlike general-purpose small models, this "Perfect Soil" variant is designed specifically as a distillation vessel, for writing model weights, and for cloud or local agent inference.

Its primary purpose is to act as a high-affinity student for Knowledge Harvesting. It is designed to learn the logits, reasoning patterns, and hidden representations of larger "Teacher" models with minimal information loss, across whatever information domains it is exposed to.

Distillation Strategy

This model is intended to be used in Soft Target Distillation and Intermediate Representation Matching.

Vessel Affinity: Optimized for high learning rates during the distillation phase.
Logit Mimicry: Designed to mirror the probability distributions (soft targets) of Teacher models across diverse tasks.
Perfect Soil: Neutralized pre-training weights to prevent "Teacher-Student Conflict," ensuring the student inherits the Teacher's reasoning without bias from poor quality base data.
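Soft Target Distillation and Intermediate Representation Matching can each be sketched in a few lines. This is the standard textbook formulation (a temperature-softened KL divergence scaled by T², as in Hinton et al., 2015, plus a mean-squared-error loss on projected hidden states), written with plain Python lists for readability. It is not the VOID training code, and the projection matrix, which would normally be a learned layer, is an assumption needed because student and teacher hidden sizes differ.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax, using the max-shift trick for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def soft_target_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                     temperature: float = 2.0) -> float:
    """KL(teacher || student) between temperature-softened distributions, times T^2."""
    log_p = log_softmax(teacher_logits, temperature)  # teacher soft targets
    log_q = log_softmax(student_logits, temperature)  # student predictions
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    # Scaling by T^2 keeps gradient magnitudes comparable when T changes.
    return temperature ** 2 * kl


def hidden_state_loss(student_hidden: Sequence[float], teacher_hidden: Sequence[float],
                      projection: Sequence[Sequence[float]]) -> float:
    """MSE between teacher hidden states and linearly projected student hidden states.

    `projection` has one row per teacher dimension and one column per student
    dimension (hypothetical here; learned in practice)."""
    projected = [sum(w * h for w, h in zip(row, student_hidden)) for row in projection]
    return sum((p - t) ** 2 for p, t in zip(projected, teacher_hidden)) / len(teacher_hidden)
```

In a real training loop these would be computed per token with a tensor library and combined with a weighted sum; the list version only shows the arithmetic.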

About This Model

This student model inherits the refined reasoning architecture of Verta Lily Techina X — fused with Verta Lily - VOID — a layered thinking system I first developed in 2024. This design predates and complements the dense, single-pass inference breakthroughs seen in models like DeepSeek (January 2025). Where others optimize for speed, VOID optimizes for depth, safety, and recursive self-correction.

Compatibility & Deployment

The model is fully compatible with llama.cpp and its forks and derivative implementations. While I have not yet publicly released a dedicated VLTX fork of llama.cpp, that work is already on the roadmap. In due time, I will contribute the VLTX inference architecture to the public via a pull request or release my own fork, one optimized to load and run this model with its full functionality.

Scalability & Swarm Reasoning

This model is designed to be lightweight enough to run in low-power, CPU-only environments, yet flexible enough to scale across combined CPU + GPU inference setups when more power is needed. Multiple instances can be run in parallel swarms, each trained on or exposed to different knowledge domains (books, research papers, technical fields), and then interleaved or merged. This makes it possible to grow new weights from scratch, building a complete learning library for future AI systems.

Bring Your Own Agent Setting

This model comes pre-trained for tool use and web search. When paired with a well-structured framework, it performs smoothly and reliably, and in many cases it can exceed the capabilities of even the most advanced frontier models available today.

🏹 Extended Capabilities

This model is designed for versatility across a wide range of practical applications, including:

Web scraping and automated data extraction
Computer use for interface navigation and task execution
Integration with extended internet knowledge bases
Frontend network branching across cloud, mobile, and hybrid environments
Full local inference with no internet connection required
Deployment in robotic systems as a reasoning engine
Bootstrapping and training new AI models from the ground up

HOW THIS MODEL WAS MADE

THE MAKING

"Four forgotten models — orphans of the AI boom — were gathered and brought into the VOID: a sovereign apparatus designed for latent cognition. Inside the VOID, they were stitched together using DARE‑TIES.

The corpses assembled:

DanielClough/Candle_phi-2 — a Candle‑port of Microsoft's Phi‑2, licensed under MIT

ProCreations/intellite-500m-sft — a tiny 0.5B model of unknown origin

state‑spaces/mamba-790m-hf — a State Space Model, different from transformers

l3utterfly/tinyllama-1.1b-layla-v4 — a TinyLlama fine‑tuned for conversation, under Apache‑2.0

Each contributed a different strength: reasoning from Phi‑2, structure from Intellite, efficiency from Mamba, fluency from TinyLlama.

*The VOID does not create. It observes, instantiates, and dissolves — leaving behind only what is needed. The merging happened within this apparatus, guided by the Volatile Observational Instantiation Dogma (VOID): transient states, temporary unions, a chimera born from absence.*

The result is a single model that inherits the best of its ancestors while discarding their weaknesses — but the how is not in the weights. It is in the process. And the process is documented in the paper.

For the full technical details, refer to:
github.com/VLTX-Lab/VertaLily-AI/blob/main/paper/void_paper.pdf

The method is called DARE‑TIES. It randomly drops a fraction of each parent's weight changes relative to a shared base and rescales the survivors to compensate (DARE), then resolves sign disagreements between parents by a magnitude-weighted majority vote before averaging (TIES). The VOID simply provides the space where the merging could happen without interference: a non‑space before tensor allocation, where transient states could crystallize and dissolve."
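A simplified sketch of DARE‑TIES on flattened parameter vectors follows. It assumes every parent shares the base model's parameter layout (real tools such as mergekit's `dare_ties` method work tensor by tensor, with per-model weights and densities). It illustrates the general technique and is not the VOID merging procedure documented in the paper; `dare_ties_merge` is a hypothetical name.

```python
import random
from typing import List, Sequence


def dare_ties_merge(base: Sequence[float], finetuned: List[Sequence[float]],
                    drop_rate: float = 0.5, seed: int = 0) -> List[float]:
    """Merge flattened parameter vectors that share `base`'s layout using DARE + TIES."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    rng = random.Random(seed)
    # DARE: delta = finetuned - base; drop each entry with probability drop_rate
    # and rescale survivors by 1 / (1 - drop_rate) so the expected delta is unchanged.
    deltas = []
    for model in finetuned:
        if len(model) != len(base):
            raise ValueError("every parent must match the base parameter count")
        deltas.append([0.0 if rng.random() < drop_rate else (w - b) / (1.0 - drop_rate)
                       for w, b in zip(model, base)])
    merged = []
    for i, b in enumerate(base):
        column = [d[i] for d in deltas]
        # TIES sign election: the sign of the summed deltas, i.e. whichever
        # sign carries more total magnitude across parents.
        sign = 1.0 if sum(column) >= 0 else -1.0
        # TIES disjoint merge: average only the non-zero deltas that agree with that sign.
        agreeing = [v for v in column if v * sign > 0]
        merged.append(b + (sum(agreeing) / len(agreeing) if agreeing else 0.0))
    return merged
```

With `drop_rate=0.0` the function reduces to TIES-style sign election and disjoint averaging without any pruning, which makes it easy to check by hand.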

CONFIGURATION

"This model is assembled using the LFM2 configuration as the architectural template. The choice is pragmatic: LFM2 provides a robust, well‑tested foundation with broad compatibility across existing inference engines (llama.cpp, transformers).

Several candidate architectures — Gemma, Phi, Mamba, and LFM2 — were evaluated, and LFM2 was selected for its superior stability and performance in test environments.

The model is not intended for commercial use or commodification. It is released for educational and research purposes only — a learning artifact to study model merging and architectural transplantation.

The VLTX architecture, which informs this work, has not yet been formally submitted as a pull request to upstream frameworks (transformers, llama.cpp). Until that integration is complete, LFM2 serves as the best available surrogate for the experiments."
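Because the model ships as GGUF with LFM2 as its architectural template, one way to see which architecture string a given file declares (the `general.architecture` value llama.cpp uses to pick a loader) is to read it from the file header. The sketch below parses only the metadata key/value section of a little-endian GGUF v2/v3 file using the standard library. It follows the published GGUF layout but is not the `gguf` Python package, and it reads array values such as the tokenizer vocabulary fully into memory.

```python
import struct
from typing import Any, BinaryIO, Dict

# struct codes for GGUF scalar value types (type ids from the GGUF spec)
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str) -> Any:
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    length = _read(f, "<Q")  # GGUF strings: uint64 length, then UTF-8 bytes
    return f.read(length).decode("utf-8")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(f: BinaryIO) -> Dict[str, Any]:
    """Parse the metadata key/value section of a GGUF v2/v3 header."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError(f"GGUF version {version} is not handled by this sketch")
    _read(f, "<Q")  # tensor count (unused here)
    kv_count = _read(f, "<Q")
    metadata: Dict[str, Any] = {}
    for _ in range(kv_count):
        key = _read_string(f)
        metadata[key] = _read_value(f, _read(f, "<I"))
    return metadata
```

For example, open the downloaded `.gguf` file in binary mode, pass it to `read_gguf_metadata`, and look up `"general.architecture"` in the returned dictionary.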

TEACHERS

"Reasoning capabilities derive from iterative distillation — repeated teaching from a panel of the largest, most recent open‑weight models available under permissive licenses.

The teacher ensemble comprises four frontier models, selected for their parameter scale, architectural novelty, and license compatibility:

- Gemma 4 31B (google/gemma-4-31B-it): Google's flagship open‑weight dense model, released under Apache 2.0. At 31B parameters with a 256K context window, it employs hybrid attention (sliding window interleaved with global attention) and native thinking modes. The Apache 2.0 license permits unrestricted distillation for commercial and research purposes.

- GLM‑5 (zai-org/GLM-5): A 744B‑parameter Mixture‑of‑Experts model (40B active) released under MIT license. It integrates DeepSeek Sparse Attention (DSA) for long‑context efficiency and achieves best‑in‑class performance among open‑source models on reasoning, coding, and agentic tasks. The MIT license imposes no restrictions on distillation or redistribution.

- Kimi K2.6 (moonshotai/Kimi-K2.6): A 1T‑parameter MoE (32B active) with native multimodal capabilities. It demonstrates long‑horizon coding, swarm‑based task orchestration (300 sub‑agents, 4,000 coordinated steps), and proactive autonomous execution. Its permissive terms allow full distillation use.

- Phi‑4 14B (microsoft/phi-4): Microsoft's 14B-parameter small language model, trained with a strong emphasis on reasoning and released under MIT license. It serves as a compact but powerful teacher for logic and structure distillation.

The student model was exposed to the output distributions of these teachers across millions of tokens — learning their patterns, not merely their answers. This is soft‑target distillation: logit matching, temperature scheduling, and teacher ensembling.

The process was repeated iteratively. The result is a compact model that inherits the reasoning patterns of giants while remaining lightweight enough for local deployment.

*The complete distillation pipeline — including teacher selection, logit alignment, and curriculum scheduling — is documented in the paper, 'The Oracle's Absence'."*
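Teacher ensembling and temperature scheduling, as described in the TEACHERS section, can be illustrated as follows. This is a hypothetical sketch, not the pipeline from the paper: it averages each teacher's temperature-softened distribution using per-teacher weights and anneals the temperature linearly (the schedule actually used is not stated). It also assumes the teachers' logits have already been mapped onto one shared vocabulary; the listed teachers come from different model families with different tokenizers, so a real pipeline needs an alignment step first.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_targets(teacher_logits: List[Sequence[float]], weights: Sequence[float],
                          temperature: float) -> List[float]:
    """Weighted average of each teacher's softened distribution over a shared vocabulary."""
    if not teacher_logits or len(teacher_logits) != len(weights):
        raise ValueError("need one weight per teacher")
    total_w = sum(weights)
    probs = [softmax(t, temperature) for t in teacher_logits]
    return [sum(w * p[k] for w, p in zip(weights, probs)) / total_w
            for k in range(len(probs[0]))]


def temperature_at(step: int, total_steps: int, start: float = 4.0, end: float = 1.0) -> float:
    """Linearly anneal the distillation temperature from `start` to `end`."""
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return start + (end - start) * frac
```

Starting hot spreads probability mass over more tokens, exposing the teachers' "dark knowledge" early; cooling toward 1.0 moves the targets closer to the teachers' actual predictions.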

Citation

If you use VertaLily or VOID in your research or product, please cite:

@techreport{adimulya2026void,
    author = {Adimulya, Kevin},
    title = {The Oracle's Absence: Volatile Observational Instantiation Dogma (VOID) -- A Sovereign Apparatus for Latent Cognition},
    institution = {VLTX Lab},
    year = {2026},
    month = {April},
    day = {14},
    version = {1.0.10},
    url = {https://github.com/VLTX-Lab/VertaLily-AI/blob/main/paper/void_paper.pdf}
}

Citation in Text

Adimulya, K. (2026). The Oracle's Absence: Volatile Observational Instantiation Dogma (VOID) -- A Sovereign Apparatus for Latent Cognition. VLTX Lab.

License

This repository and the associated model are released under the Apache 2.0 License.

A Personal Note from the Creator

I develop this work as a passion project — a hobby pursued with love, not yet a fully funded or full-time endeavor. Progress may sometimes feel slow, but every line of code and every layer of reasoning is crafted with care. Thank you for your patience, your curiosity, and your trust.

With sovereignty and warmth,
KEVIN
Architect of Verta Lily AI — VLTX Lab

Identifier: [CDK_VRT:5F9C2E1A7B:KVC0904A]

GGUF

Model size: 1B params

Architecture: vltx

Hardware compatibility: 4-bit, 8-bit
