Instructions to use Inserloft/NaNo with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

Transformers

How to use Inserloft/NaNo with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Inserloft/NaNo")

# Load model directly (with a causal LM head, matching the text-generation pipeline above)
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Inserloft/NaNo", dtype="auto")

llama-cpp-python

How to use Inserloft/NaNo with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Inserloft/NaNo",
	filename="NaNo-V3.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)
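
The `output` printed above is a completion dictionary rather than plain text. As a minimal sketch, assuming it follows the OpenAI-style layout that llama-cpp-python returns for completions (a `choices` list whose entries carry a `text` field), the generated string could be pulled out as follows. The helper name `completion_text` and the sample dictionary are illustrative only and are not part of the library.

```python
def completion_text(output: dict) -> str:
    """Return the text of the first choice in a completion dictionary."""
    choices = output.get("choices") or []
    if not choices:
        raise ValueError("completion contains no choices")
    return choices[0]["text"]


# Hand-written stand-in for what `llm(...)` returns; real outputs also
# carry fields such as "id", "model", and "usage".
sample_output = {
    "object": "text_completion",
    "choices": [
        {
            "text": "Once upon a time, there was a model.",
            "index": 0,
            "finish_reason": "length",
        }
    ],
}

print(completion_text(sample_output))
```

Because the call above passes `echo=True`, the returned text is expected to begin with the prompt itself.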

Notebooks

Google Colab
Kaggle

Local Apps

llama.cpp

How to use Inserloft/NaNo with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Inserloft/NaNo
# Run inference directly in the terminal:
llama-cli -hf Inserloft/NaNo

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Inserloft/NaNo
# Run inference directly in the terminal:
llama-cli -hf Inserloft/NaNo

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Inserloft/NaNo
# Run inference directly in the terminal:
./llama-cli -hf Inserloft/NaNo

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Inserloft/NaNo
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Inserloft/NaNo

Use Docker

docker model run hf.co/Inserloft/NaNo

LM Studio
Jan

vLLM

How to use Inserloft/NaNo with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Inserloft/NaNo"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Inserloft/NaNo",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
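
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above. The helper names `build_completion_request` and `complete` are hypothetical, and the response parsing assumes the OpenAI-style `choices[0].text` layout.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, model, prompt, **kwargs):
    """Send the request and return the first choice's text (needs a running server)."""
    request = build_completion_request(base_url, model, prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server from the previous step running, `complete("http://localhost:8000", "Inserloft/NaNo", "Once upon a time,")` should return the generated text. The base URL is a parameter, so the helper is not tied to one port.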

Use Docker

docker model run hf.co/Inserloft/NaNo

SGLang

How to use Inserloft/NaNo with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Inserloft/NaNo" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Inserloft/NaNo",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Inserloft/NaNo" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Inserloft/NaNo",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Ollama

How to use Inserloft/NaNo with Ollama:

```
ollama run hf.co/Inserloft/NaNo
```

Unsloth Studio

How to use Inserloft/NaNo with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Inserloft/NaNo to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Inserloft/NaNo to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Inserloft/NaNo to start chatting

Docker Model Runner

How to use Inserloft/NaNo with Docker Model Runner:

```
docker model run hf.co/Inserloft/NaNo
```

Lemonade

How to use Inserloft/NaNo with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Inserloft/NaNo

Run and chat with the model

lemonade run user.NaNo-{{QUANT_TAG}}

List all available models

lemonade list

Inserloft committed about 20 hours ago

Commit 8d59df9 (verified)

1 parent: 714219d

Update README.md

Files changed (1)

README.md +167 -21

README.md CHANGED

@@ -1,36 +1,182 @@
 ---
 language:
-- es
 - en
 license: mit
 tags:
-- gpt2
 - code
-- bilingual
 - inserloft
-model_name: Cleo Nano v3.1 Bilingual
 ---
-# Cleo Nano v3.1 (Bilingual Optimization)
-Cleo Nano is a decoder-only Transformer model developed by **Inserloft** under the vision of **Jesus Heriberto Corona**. This version (v3.1) features surgical fine-tuning for bilingual stability (English/Spanish) and hallucination control.
-## Model Details
-- **Architecture:** Decoder-Only GPT (Custom)
-- **Layers:** 8
-- **Embedding Dim:** 384
-- **Attention Heads:** 12
-- **Context Window:** 256 tokens
-- **Parameters:** ~15M
-- **Training Data:** Mix of Wikipedia, Python Code (CodeFeedback), and Identity Anchoring.
-## Usage
-To use this model, you need the custom `CleoNanoV3` architecture defined in PyTorch. The weights can be loaded using `torch.load()` or via the Hugging Face `from_pretrained` if using the provided mapping logic.
-### Capabilities
-1. **Bilingual Chat:** Responds to general queries in both Spanish and English.
-2. **Code Generation:** Specialized in Python snippets (Sum, Loops, Classes).
-3. **Identity Preservation:** Strong grounding on its origin and creator.
 ---
-Developed by [Inserloft](https://inserloft.dev/)

 ---
 language:
 - en
 license: mit
 tags:
+- ai
+- llm
+- edge-ai
+- mobile-ai
 - code
+- programming
+- lightweight
 - inserloft
+- nano
+pipeline_tag: text-generation
+library_name: transformers
+model_name: NaNo 3.1
 ---
+# NaNo 3.1
+NaNo 3.1 is a lightweight AI language model developed by Inserloft, designed primarily for programming, edge AI, mobile inference, and efficient local deployment.
+Unlike large-scale general-purpose models, NaNo focuses on delivering strong technical and coding-oriented capabilities while maintaining low resource consumption and fast inference speeds.
+NaNo is part of the broader Inserloft AI ecosystem alongside larger and more advanced models such as Kyro.
+---
+# Overview
+NaNo was built around a simple philosophy:
+> Efficient AI models should be capable, fast, lightweight, and deployable almost anywhere.
+NaNo 3.1 introduces major improvements in:
+- Context handling
+- Technical reasoning
+- Programming capabilities
+- Conversational stability
+- Inference optimization
+- Deployment efficiency
+This version also represents the largest scaling upgrade in the model family so far.
+---
+# What's New in NaNo 3.1
+## Major Parameter Scaling
+NaNo 3.1 scales from:
+- **22M → 52M parameters**
+This significant increase improves:
+- Code understanding
+- Response coherence
+- Technical reasoning
+- Long-context retention
+- Structured generation quality
+while preserving NaNo's lightweight deployment philosophy.
+---
+# Core Focus Areas
+## Programming
+NaNo is heavily optimized for:
+- Code generation
+- Function completion
+- Technical assistance
+- Refactoring
+- Automation workflows
+- Structured programming tasks
+---
+## Edge AI
+NaNo is designed for modern edge computing environments:
+- Lightweight servers
+- Embedded systems
+- Local AI applications
+- Edge devices
+- Efficient hardware deployment
+---
+## Mobile AI
+NaNo prioritizes:
+- Fast inference
+- Lower memory usage
+- Mobile compatibility
+- On-device execution
+- Offline AI experiences
+---
+# Model Details
+| Category | Value |
+|---|---|
+| Architecture | Decoder-Only Transformer |
+| Model Family | NaNo |
+| Version | 3.1 |
+| Parameters | ~52M |
+| Primary Focus | Programming & Edge AI |
+| Deployment Target | Mobile, Local, Edge |
+| License | MIT |
 ---
+# Technical Improvements
+NaNo 3.1 includes improvements across:
+- Attention stability
+- Context retention
+- Technical instruction following
+- Code consistency
+- Generation quality
+- Inference optimization
+The model is specifically optimized for technical and programming-oriented workflows rather than broad educational or general-purpose assistant behavior.
+---
+# Inserloft AI Ecosystem
+NaNo is part of the AI ecosystem developed by Inserloft.
+Current model ecosystem:
+- **NaNo** → Lightweight programming and edge AI
+- **Kyro** → Advanced large-scale reasoning and intelligence
+This specialization allows each model family to focus on specific real-world use cases.
+---
+# Intended Use Cases
+NaNo is intended for:
+- Coding assistants
+- Local AI tools
+- Mobile AI systems
+- Edge AI applications
+- Lightweight inference environments
+- Embedded AI workflows
+---
+# Future Development
+Future NaNo versions are expected to include:
+- Longer context windows
+- Better multilingual support
+- Improved reasoning
+- Faster inference
+- Better code generation
+- Mobile-specific optimizations
+- More efficient architectures
+---
+# Disclaimer
+NaNo is an actively evolving experimental AI model.
+Outputs may still contain inaccuracies, hallucinations, or unstable generations depending on prompts, deployment environments, and inference configurations.
+---
+# Links
+- Website: https://inserloft.dev
+- Hugging Face Organization: https://huggingface.co/Inserloft
+---
+Developed by Inserloft.
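
As a rough illustration of what the 22M → 52M scaling described in the README means for deployment, the sketch below estimates weight storage at a few common precisions. It assumes the approximate parameter counts quoted in the README and counts weights only (no activations, KV cache, or runtime overhead). The helper `weight_footprint_mb` is hypothetical, and the listed precisions are generic examples, not formats the README says NaNo ships in.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_footprint_mb(num_params: int, dtype: str) -> float:
    """Approximate weight storage in megabytes (1 MB = 1e6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


OLD_PARAMS = 22_000_000  # approximate, from "22M" in the README
NEW_PARAMS = 52_000_000  # approximate, from "~52M" in the README

print(f"scale factor: {NEW_PARAMS / OLD_PARAMS:.2f}x")
for dtype in BYTES_PER_PARAM:
    old_mb = weight_footprint_mb(OLD_PARAMS, dtype)
    new_mb = weight_footprint_mb(NEW_PARAMS, dtype)
    print(f"{dtype}: {old_mb:.0f} MB -> {new_mb:.0f} MB")
```

Under these assumptions the weights stay in the tens to low hundreds of megabytes, which is consistent with the README's stated mobile and edge targets.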