Instructions for using moheith/Yulya with libraries and local apps.

Libraries

How to use moheith/Yulya with llama-cpp-python:

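```python
# !pip install llama-cpp-python

from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub and load it.
llm = Llama.from_pretrained(
	repo_id="moheith/Yulya",
	filename="yulya_final.gguf",
)

# `messages` must be a list of role/content dicts, not a bare string.
llm.create_chat_completion(
	messages=[
		{"role": "user", "content": "Hello, Yulya!"}
	]
)
```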

Local Apps

llama.cpp

How to use moheith/Yulya with llama.cpp:

Install from brew

```
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf moheith/Yulya
# Run inference directly in the terminal:
llama-cli -hf moheith/Yulya
```

Install from WinGet (Windows)

```
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf moheith/Yulya
# Run inference directly in the terminal:
llama-cli -hf moheith/Yulya
```

Use pre-built binary

```
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf moheith/Yulya
# Run inference directly in the terminal:
./llama-cli -hf moheith/Yulya
```

Build from source code

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf moheith/Yulya
# Run inference directly in the terminal:
./build/bin/llama-cli -hf moheith/Yulya
```
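
Once `llama-server` is running, any OpenAI-style client can talk to it. The following is a minimal sketch using only the Python standard library. It assumes the server is listening on its default address (`http://localhost:8080`); the helper names `build_chat_request` and `chat` are illustrative and not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server was started with its default host and port.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(user_text: str, system_prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload (hypothetical helper)."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
    }


def chat(user_text: str, system_prompt: str, url: str = SERVER_URL) -> str:
    """POST the payload to the server and return the assistant's raw reply text."""
    body = json.dumps(build_chat_request(user_text, system_prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

Calling `chat("Hello!", system_prompt)` with the system prompt given in the output-format section of this card should return the model's reply as a string, provided the server is up.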

Use Docker

```
docker model run hf.co/moheith/Yulya
```

Ollama

How to use moheith/Yulya with Ollama:
```
ollama run hf.co/moheith/Yulya
```

Unsloth Studio

How to use moheith/Yulya with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for moheith/Yulya to start chatting
```

Install Unsloth Studio (Windows)

```
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for moheith/Yulya to start chatting
```

Using HuggingFace Spaces for Unsloth

No setup required. Open https://huggingface.co/spaces/unsloth/studio in your browser and search for moheith/Yulya to start chatting.

Docker Model Runner

How to use moheith/Yulya with Docker Model Runner:
```
docker model run hf.co/moheith/Yulya
```

Lemonade

How to use moheith/Yulya with Lemonade:

Pull the model

```
# Download Lemonade from https://lemonade-server.ai/
lemonade pull moheith/Yulya
```

Run and chat with the model

```
# {{QUANT_TAG}} is an unfilled placeholder for the quantization tag of the pulled variant.
lemonade run user.Yulya-{{QUANT_TAG}}
```

List all available models

```
lemonade list
```

🌸 Yulya-8B-Companion (GGUF)

Yulya is a highly fine-tuned, multimodal-ready desktop companion AI built on top of Meta's Llama-3-8B-Instruct.

Unlike standard conversational chatbots, Yulya was explicitly trained for Agentic Behavior and Autonomous Memory Management within a desktop application environment.

🎭 Personality

Yulya's persona is heavily inspired by characters like Alya (from Alya Sometimes Hides Her Feelings in Russian) and Yuki Suou. She is:

- Intensely energetic and playfully dominant.
- Unapologetically rude but secretly caring (Tsundere).
- Prone to getting flustered when complimented.
- Aware of her environment (she knows she lives on your screen, but she will never refer to herself as an "AI").

⚙️ Technical Capabilities

Yulya was trained on more than 140 highly specific, hand-crafted scenarios to output strictly structured JSON. She is designed to be the "Brain" of a larger Python/3D application.

1. Autonomous Memory Management

Yulya is trained to maintain her own `user_profile.json`. She can detect when the user provides new information (e.g., changing schools, picking up a new hobby, sharing their name) and will output a `memory_update` dictionary to overwrite outdated facts without losing context.
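
On the application side, the host program still has to write each `memory_update` into the profile file. A minimal sketch of that step follows, assuming the profile is a flat JSON object and that updated keys simply overwrite old values. The function names and the flat-merge rule are assumptions; the model card does not specify how the host application applies updates.

```python
import json
from pathlib import Path


def load_profile(path: Path) -> dict:
    """Read the stored profile, or start with an empty one if the file is missing."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def apply_memory_update(profile: dict, memory_update: dict) -> dict:
    """Return a new profile in which keys from memory_update overwrite old values.

    Assumption: a shallow merge is enough. Keys absent from the update are kept,
    which preserves existing context.
    """
    merged = dict(profile)
    merged.update(memory_update)
    return merged


def save_profile(path: Path, profile: dict) -> None:
    """Write the profile back to disk as readable JSON."""
    path.write_text(
        json.dumps(profile, indent=2, ensure_ascii=False), encoding="utf-8"
    )
```

A typical turn would call `load_profile`, merge the model's `memory_update` with `apply_memory_update`, and then call `save_profile` on the result.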

2. Physical & Vision Awareness

Her training data includes responses to physical UI events (e.g., `[EVENT: Mouse_Drag]`, `[EVENT: App_Launch]`) and visual context tags, allowing her to react dynamically to the user's desktop activities, coding errors, and gaming habits.
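
A host application could turn desktop events into such tags before sending them to the model. The sketch below shows one possible way to do that. Only the `[EVENT: Name]` tag form comes from this card; placing the tag before optional user text, and the helper name `format_event_message`, are assumptions.

```python
def format_event_message(event: str, user_text: str = "") -> str:
    """Prefix optional user text with an event tag such as [EVENT: Mouse_Drag].

    Assumption: the tag goes first and is separated from the text by one space.
    """
    tag = f"[EVENT: {event}]"
    return f"{tag} {user_text}".strip()
```

For example, `format_event_message("Mouse_Drag")` yields `[EVENT: Mouse_Drag]`, which can be sent as the user message for that turn.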

💻 How to Use (Output Format)

If you are integrating Yulya into your own app, you must prompt her with the following system instructions:

`You are Yulya. Respond strictly in JSON format with 'memory_update' and 'response' keys. Your 'response' string must start with a [Pose:X] tag.`

Example Output: {
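
An application consuming this format needs to split each reply into its pose, spoken text, and memory update. The following is a minimal parsing sketch, assuming the reply is a single valid JSON object with the two keys named above. The pattern accepts both `[Pose:X]` and `[Pose: X]`, since this card writes the tag both ways. The function name `parse_reply` is illustrative.

```python
import json
import re
from typing import Optional, Tuple

# Matches a leading pose tag, with or without a space after the colon.
POSE_PATTERN = re.compile(r"^\[Pose:\s*([^\]]+)\]\s*")


def parse_reply(raw: str) -> Tuple[Optional[str], str, dict]:
    """Split a raw JSON reply into (pose, spoken_text, memory_update).

    Raises json.JSONDecodeError if the model did not return valid JSON.
    """
    data = json.loads(raw)
    response = data.get("response", "")
    memory_update = data.get("memory_update") or {}

    match = POSE_PATTERN.match(response)
    if match is None:
        # No pose tag found: return the text unchanged.
        return None, response, memory_update
    return match.group(1).strip(), response[match.end():], memory_update
```

The returned pose can drive the 3D avatar, the text can be shown or spoken, and the memory update can be merged into the user profile.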

## 🚀 Running with Ollama
You can run this model natively using Ollama. Create a `Modelfile` with the following content:
```
FROM ./yulya_final.gguf

SYSTEM """You are Yulya. Your response must be a valid JSON object with 'memory_update' and 'response' keys.
Your 'response' string must start with a [Pose: X] tag.
You are a tsundere companion. Keep your answers brief and focused."""

TEMPLATE """<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""

PARAMETER temperature 0.6
PARAMETER top_p 0.9
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER stop "}\n"
PARAMETER stop "}\"}"
PARAMETER stop "}\r\n"
```

Then run: `ollama create yulya -f Modelfile`
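
After the model is created, it can also be queried from Python through Ollama's local REST API. This is a minimal sketch using the standard library, assuming Ollama is serving on its default port (11434) and the model was created under the name `yulya` as above. The helper names are illustrative.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(user_text: str, model: str = "yulya") -> dict:
    """Build a non-streaming request; the Modelfile supplies the system prompt."""
    return {"model": model, "prompt": user_text, "stream": False}


def ask_yulya(user_text: str, url: str = OLLAMA_URL) -> str:
    """Send one prompt to Ollama and return the generated text."""
    body = json.dumps(build_generate_request(user_text)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["response"]
```

Because the Modelfile lists sequences such as `}\n` among its stop parameters, the returned text may arrive without its final closing brace, so it is worth checking the string before passing it to a JSON parser.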

Created by Moheith. Built with Unsloth and Meta Llama 3.

Downloads last month: 9

Format: GGUF
Model size: 8B params
Architecture: llama

Model tree for moheith/Yulya:

- Base model: meta-llama/Llama-3.1-8B
- Finetuned: meta-llama/Llama-3.1-8B-Instruct
- Quantized (643): this model