AhiskaAI 65m IT v0.1 (Instruction Tuned)
AhiskaAI 65m IT v0.1 is a highly efficient, custom-aligned Small Language Model (SLM) for the Turkish language ecosystem.
This model was not fine-tuned on top of generic open-source weights. Instead, it was instruction-tuned directly on our proprietary foundation model, AhıskaAI 65m Base v0.1, which was pre-trained from scratch for one full epoch on a 5.3 GB Turkish corpus. For this alignment phase (SFT), we used a strictly filtered and curated Turkish Alpaca dataset to improve procedural logic, formatting accuracy, and structural fluency while removing noisy samples.
🧬 The Pipeline: From Scratch to Instruction
Our research lab follows a strict vertical integration philosophy:
- Phase 1 (Base Model): Initialized `LlamaForCausalLM` from scratch, with no pre-trained weights. Pre-trained on 5.3 GB of clean Turkish text to lock down grammar, token-nesting patterns, and core semantics (AhıskaAI 65m Base v0.1).
- Phase 2 (Instruction Tuning): Supervised Fine-Tuning (SFT) over the base checkpoint using our custom-filtered Alpaca instructions. This phase added formatting discipline, listing mechanics (`1. 2. 3.`), and multi-turn response compliance.
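As a rough illustration of Phase 2, the sketch below renders one Alpaca-style record into a single SFT training string. The field names (`instruction`, `input`, `output`) follow the public Alpaca convention, and the chat markers are copied from the Quickstart prompt further down. Both are assumptions about the training data, which this card does not publish, and `format_sft_example` is a hypothetical helper rather than the lab's actual preprocessing code.

```python
# Hypothetical helper: the actual SFT preprocessing is not published in this card.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def format_sft_example(record: dict) -> str:
    """Render one Alpaca-style record as a user/assistant training string."""
    instruction = record["instruction"].strip()
    # Alpaca records may carry an optional "input" field with extra context.
    extra_input = record.get("input", "").strip()
    user_turn = f"{instruction}\n\n{extra_input}" if extra_input else instruction
    answer = record["output"].strip()
    return (
        f"{IM_START}user\n{user_turn}{IM_END}\n"
        f"{IM_START}assistant\n{answer}{IM_END}\n"
    )


# Illustrative record written for this sketch, not taken from the real dataset.
example = {
    "instruction": "Sağlıklı yaşamak için 3 ipucu ver",
    "input": "",
    "output": "1. Dengeli beslenin.\n2. Düzenli egzersiz yapın.\n3. Yeterince uyuyun.",
}
print(format_sft_example(example))
```

Keeping the training string and the inference prompt on the same markers is the point of the sketch: the model only sees the assistant turn filled in during training, and is asked to complete it at inference time.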
📊 Technical Architecture & Hyperparameters
Directly extracted from the native config.json, the model utilizes a pure modern LLaMA layout optimized for fast local compute:
- Architecture: `LlamaForCausalLM`
- Parameters: ~65 Million
- Context Length (`max_position_embeddings`): 1024 tokens (Double the capacity of legacy GPT-2 baselines)
- Vocabulary Size: 32,000 tokens (Custom BPE trained for Turkish root-suffix morphology)
- Hidden Dimension (`hidden_size`): 512
- Intermediate Layer Dimension (`intermediate_size`): 1376
- Hidden Layers (`num_hidden_layers`): 12
- Attention Heads: 8 (`num_attention_heads` / `num_key_value_heads`)
- Activation Function: SiLU (`silu`)
- Normalization EPS: `rms_norm_eps: 1e-06` (RMSNorm architecture)
- Positional Embeddings: RoPE (`rope_type: default, theta: 10000.0`)
- Data Precision: `float32`
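The parameter count can be estimated from the listed values with plain arithmetic. The sketch below assumes a standard bias-free LLaMA block (four attention projections, a three-matrix gated MLP, two RMSNorm vectors per layer). Whether the input embeddings and output head share weights is not stated in this card, so both cases are computed; `llama_param_count` is a name chosen for this example.

```python
# Values copied from the hyperparameter list above.
VOCAB_SIZE = 32_000
HIDDEN_SIZE = 512
INTERMEDIATE_SIZE = 1376
NUM_LAYERS = 12


def llama_param_count(tie_embeddings: bool) -> int:
    """Estimate parameters for a bias-free LLaMA-style decoder (an assumption)."""
    # q, k, v, o projections; heads == kv heads, so each is hidden x hidden.
    attention = 4 * HIDDEN_SIZE * HIDDEN_SIZE
    # gate, up, and down projections of the SiLU-gated MLP.
    mlp = 3 * HIDDEN_SIZE * INTERMEDIATE_SIZE
    # Two RMSNorm weight vectors per layer.
    norms = 2 * HIDDEN_SIZE
    per_layer = attention + mlp + norms
    embeddings = VOCAB_SIZE * HIDDEN_SIZE
    # An untied output head adds a second vocab x hidden matrix.
    lm_head = 0 if tie_embeddings else embeddings
    final_norm = HIDDEN_SIZE
    return NUM_LAYERS * per_layer + embeddings + lm_head + final_norm


tied = llama_param_count(tie_embeddings=True)
untied = llama_param_count(tie_embeddings=False)
print(f"tied: {tied:,}  untied: {untied:,}")
# prints "tied: 54,342,144  untied: 70,726,144"
```

Under these assumptions the two estimates bracket the ~65 Million figure quoted above; the exact count reported by the checkpoint may differ.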
💻 Hardware Efficiency & "Build in Public"
- Training & Alignment Hardware: NVIDIA GeForce RTX 4050 Laptop GPU (6GB VRAM)
- Inference Footprint: Only ~202 MB in size. The model runs quickly even on free Hugging Face CPU Spaces, so no expensive cloud GPU hosting is needed.
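Throughput depends heavily on the host, so it is worth measuring on the target machine. The helper below is a minimal sketch: `tokens_per_second` is a hypothetical name, and it assumes the caller supplies a `generate` callable that runs one generation and returns how many new tokens it produced.

```python
import time
from typing import Callable


def tokens_per_second(generate: Callable[[], int], runs: int = 3) -> float:
    """Average throughput over `runs` calls; `generate` returns the new-token count."""
    total_tokens = 0
    start = time.perf_counter()
    for _ in range(runs):
        total_tokens += generate()
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed
```

With the Quickstart model, `generate` could wrap a `model.generate` call and return `outputs.shape[1] - inputs["input_ids"].shape[1]`, which is the number of tokens produced beyond the prompt.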
🛠️ Quickstart Usage (Alpaca Format)
To interact with the instruction-tuned layer smoothly, invoke the model with the exact token structure it was aligned with:
from transformers import LlamaForCausalLM, AutoTokenizer
import torch

model_name = "AhiskaAI/AhiskaAI-65m-IT-v0.1"

# Load the custom-built architecture and vocabulary
device = "cuda" if torch.cuda.is_available() else "cpu"
model = LlamaForCausalLM.from_pretrained(model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def ask_ahiska_it(instruction):
    # Prompt template (user/assistant chat markers)
    prompt = f"<|im_start|>user\n{instruction}<|im_end|>\n<|im_start|>assistant\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=250,  # Counts prompt tokens plus generated tokens
            do_sample=True,
            top_k=40,
            top_p=0.92,
            temperature=0.55,  # Low temperature keeps the 65M-parameter model focused
            repetition_penalty=1.18
        )
    # Decode only the newly generated tokens so the prompt is not echoed back
    prompt_length = inputs["input_ids"].shape[1]
    response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
    return response.strip()

# Run a test inference
print(ask_ahiska_it("Sağlıklı yaşamak için 3 ipucu ver"))