Codette Model Downloads
All production models and adapters are available on HuggingFace: https://huggingface.co/Raiff1982
Quick Download
Option 1: Auto-Download (Recommended)
pip install huggingface-hub
# Download directly
huggingface-cli download Raiff1982/Meta-Llama-3.1-8B-Instruct-Q4 \
--local-dir models/base/
huggingface-cli download Raiff1982/Llama-3.2-1B-Instruct-Q8 \
--local-dir models/base/
# Download adapters
huggingface-cli download Raiff1982/Codette-Adapters \
--local-dir adapters/
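The same three downloads can be scripted from Python via huggingface_hub's snapshot_download. A minimal sketch; the `fetch` parameter is a hypothetical hook added here so the routine can be dry-run without network access:

```python
def download_all(models=None, fetch=None):
    """Download each HuggingFace repo into its target directory.

    fetch defaults to huggingface_hub.snapshot_download; it is
    injectable (a convenience of this sketch) for offline dry runs.
    """
    if models is None:
        models = {
            "Raiff1982/Meta-Llama-3.1-8B-Instruct-Q4": "models/base/",
            "Raiff1982/Llama-3.2-1B-Instruct-Q8": "models/base/",
            "Raiff1982/Codette-Adapters": "adapters/",
        }
    if fetch is None:
        from huggingface_hub import snapshot_download  # pip install huggingface-hub
        fetch = snapshot_download
    return [fetch(repo_id=repo_id, local_dir=local_dir)
            for repo_id, local_dir in models.items()]
```

Calling `download_all()` with no arguments fetches all three repos into the layout the server expects.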
Option 2: Manual Download
- Visit: https://huggingface.co/Raiff1982
- Select model repository
- Click "Files and versions"
- Download .gguf files to models/base/
- Download adapters to adapters/
Option 3: Using Git-LFS
git clone https://huggingface.co/Raiff1982/Meta-Llama-3.1-8B-Instruct-Q4
cd Meta-Llama-3.1-8B-Instruct-Q4
git lfs pull
Available Models
All models are distributed in GGUF format (optimized for llama.cpp and compatible runtimes):
| Model | Size | Location | Type |
|---|---|---|---|
| Llama 3.1 8B Q4 | 4.6 GB | Raiff1982/Meta-Llama-3.1-8B-Instruct-Q4 | Default (recommended) |
| Llama 3.1 8B F16 | 3.4 GB | Raiff1982/Meta-Llama-3.1-8B-Instruct-F16 | High quality |
| Llama 3.2 1B Q8 | 1.3 GB | Raiff1982/Llama-3.2-1B-Instruct-Q8 | Lightweight/CPU |
| Codette Adapters | 224 MB | Raiff1982/Codette-Adapters | 8 LoRA adapters |
Setup Instructions
Step 1: Clone Repository
git clone https://github.com/Raiff1982/Codette-Reasoning.git
cd Codette-Reasoning
Step 2: Install Dependencies
pip install -r requirements.txt
Step 3: Download Models
# Quick method using huggingface-cli
huggingface-cli download Raiff1982/Meta-Llama-3.1-8B-Instruct-Q4 \
--local-dir models/base/
huggingface-cli download Raiff1982/Llama-3.2-1B-Instruct-Q8 \
--local-dir models/base/
huggingface-cli download Raiff1982/Codette-Adapters \
--local-dir adapters/
Step 4: Verify Setup
ls -lh models/base/ # Should list the downloaded GGUF models (2 if you ran the commands above)
ls adapters/*.gguf # Should show 8 adapter files
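The two ls checks above can also be done programmatically. A minimal sketch; the directory names follow the layout above, and the expected counts are parameters:

```python
from pathlib import Path

def verify_setup(base_dir="models/base", adapter_dir="adapters",
                 min_base=2, min_adapters=8):
    """Count GGUF files in each directory and report whether the
    layout matches what the download steps should have produced."""
    base = sorted(Path(base_dir).glob("*.gguf"))
    adapters = sorted(Path(adapter_dir).glob("*.gguf"))
    ok = len(base) >= min_base and len(adapters) >= min_adapters
    return ok, f"{len(base)} base model(s), {len(adapters)} adapter(s)"
```

Run it from the repository root before starting the server; a False result pinpoints which directory is incomplete.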
Step 5: Start Server
python inference/codette_server.py
# Visit http://localhost:7860
HuggingFace Profile
All models hosted at: https://huggingface.co/Raiff1982
Models include:
- Complete documentation
- Model cards with specifications
- License information
- Version history
Offline Setup
If you have models downloaded locally:
# Just copy files to correct location
cp /path/to/models/*.gguf models/base/
cp /path/to/adapters/*.gguf adapters/
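When scripting an offline install, the same copy can be done in Python. A sketch assuming the source directories already hold the .gguf files:

```python
import shutil
from pathlib import Path

def install_offline(src_models, src_adapters,
                    dest_models="models/base", dest_adapters="adapters"):
    """Copy pre-downloaded GGUF files into the repository layout."""
    copied = []
    for src, dest in ((src_models, dest_models), (src_adapters, dest_adapters)):
        Path(dest).mkdir(parents=True, exist_ok=True)  # create layout if missing
        for f in Path(src).glob("*.gguf"):
            copied.append(shutil.copy2(f, Path(dest) / f.name))
    return copied
```

Unlike the raw `cp`, this creates the destination directories if they do not exist yet.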
Troubleshooting Downloads
Issue: "Connection timeout"
# Raise the per-request timeout and resume the interrupted download
HF_HUB_DOWNLOAD_TIMEOUT=60 huggingface-cli download Raiff1982/Meta-Llama-3.1-8B-Instruct-Q4 \
--local-dir models/base/ \
--resume-download
Issue: "Disk space full"
Each model needs:
- Llama 3.1 8B Q4: 4.6 GB
- Llama 3.1 8B F16: 3.4 GB
- Llama 3.2 1B: 1.3 GB
- Adapters: ~1 GB
- Total: ~10 GB minimum
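The requirements above can be checked before downloading. A small sketch using the sizes from the list; the 1 GB headroom default is an assumption of this sketch:

```python
import shutil

MODEL_SIZES_GB = {          # sizes as listed above
    "Llama 3.1 8B Q4": 4.6,
    "Llama 3.1 8B F16": 3.4,
    "Llama 3.2 1B": 1.3,
    "Adapters": 1.0,
}

def has_disk_space(path=".", sizes=MODEL_SIZES_GB, headroom_gb=1.0):
    """Compare free space at `path` against the combined model sizes."""
    needed_gb = sum(sizes.values()) + headroom_gb
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= needed_gb, needed_gb
```

Pass the drive that will hold models/base/ as `path`; the second return value is the total requirement in GB.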
Issue: "HuggingFace token required"
huggingface-cli login
# Paste token from: https://huggingface.co/settings/tokens
Bandwidth & Speed
Typical download times:
- Llama 3.1 8B Q4: 5-15 minutes (100 Mbps connection)
- Llama 3.2 1B: 2-5 minutes
- Adapters: 1-2 minutes
- Total: 8-22 minutes (first-time setup)
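The estimates above follow from simple arithmetic (gigabytes converted to megabits, divided by link speed). A sketch, assuming decimal units and ignoring protocol overhead:

```python
def estimated_minutes(size_gb, link_mbps=100):
    """Rough transfer time in minutes for size_gb gigabytes over a
    link_mbps link; 1 GB = 8000 megabits (decimal units)."""
    return size_gb * 8000 / link_mbps / 60
```

For the 4.6 GB Q4 model on a 100 Mbps link this gives roughly 6 minutes, the low end of the quoted range; real downloads are slower when the link is shared or the server throttles.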
Attribution
Models:
- Llama: Meta AI (Llama Community License)
- GGUF quantization: llama.cpp (ggerganov)
- Adapters: Jonathan Harrison (Raiff1982)
License: See individual model cards on HuggingFace
Once downloaded, follow DEPLOYMENT.md for production setup.
For questions, visit: https://huggingface.co/Raiff1982