A newer version of this model is available: brandonbaek/Bori-2-135M-Base

🌾 Bori-1 0.6B Base (Checkpoint 1000)

🚀 Newer Version Available: The Bori project has evolved! Please see Bori-2 135M Base for the completed Phase 2 pre-training pipeline, or check out the Bori GitHub repository for the latest Bori-3 developments.

Bori-1 is the very first experimental proof-of-concept for the Bori project, aimed at exploring the feasibility of training bilingual (Korean-English) Small Language Models (SLMs) under extreme compute constraints.

⚠️ Status: This was a preliminary exploratory run paused early at Checkpoint 1000. It is published solely for historical tracking and to serve as a baseline for the architectural shifts made in Bori-2 and Bori-3.

🤖 Model Details

  • Base Architecture: Qwen2
  • Parameter Count: ~600M
  • Languages: Korean, English

💻 Hardware & Compute

  • Hardware: Trained on Kaggle Notebooks using 2x NVIDIA T4 GPUs (16GB VRAM each).
  • Constraints: Fitting a ~600M-parameter model on 16GB GPUs without advanced quantization required aggressive gradient accumulation and small batch sizes. This led to the decision to pivot to the more efficient ~135M architecture for Bori-2, which leaves room for more robust pre-training experimentation.
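
To make the memory pressure concrete, here is a rough back-of-the-envelope sketch. It is not the actual Bori-1 training configuration: it assumes full FP32 training with an Adam-style optimizer (two extra states per parameter), and the micro-batch size and accumulation step count are hypothetical values chosen only for illustration. Activations and framework overhead are ignored, so real usage would be higher.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes per GiB


@dataclass
class TrainingMemoryEstimate:
    """Static training memory per GPU, excluding activations."""
    weights_gib: float
    gradients_gib: float
    optimizer_gib: float

    @property
    def total_gib(self) -> float:
        return self.weights_gib + self.gradients_gib + self.optimizer_gib


def estimate_training_memory(
    num_params: int,
    bytes_per_value: int = 4,             # assumption: FP32 everywhere
    optimizer_states_per_param: int = 2,  # assumption: Adam-style (two moments)
) -> TrainingMemoryEstimate:
    """Estimate memory for weights, gradients, and optimizer states."""
    per_copy_gib = num_params * bytes_per_value / GIB
    return TrainingMemoryEstimate(
        weights_gib=per_copy_gib,
        gradients_gib=per_copy_gib,
        optimizer_gib=per_copy_gib * optimizer_states_per_param,
    )


def effective_batch_size(micro_batch_size: int, accumulation_steps: int, num_gpus: int) -> int:
    """Sequences contributing to one optimizer step under data parallelism."""
    return micro_batch_size * accumulation_steps * num_gpus


if __name__ == "__main__":
    estimate = estimate_training_memory(600_000_000)  # ~600M parameters
    print(f"Static memory per GPU: {estimate.total_gib:.1f} GiB of 16 GiB")

    # Hypothetical values, not taken from the Bori-1 run.
    print("Effective batch size:", effective_batch_size(1, 32, 2))
```

Under these assumptions, weights, gradients, and optimizer states alone take roughly 9 GiB of each 16GB T4, which leaves little headroom for activations. That is why a small per-device batch combined with many accumulation steps is the usual workaround, and why a ~135M model is far more comfortable on the same hardware.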

⚠️ Limitations & Intended Use

This model was paused very early in its training lifecycle. It is significantly undertrained and exhibits poor coherence. It should not be used for text generation, fine-tuning, or deployment. Its primary value is as a historical artifact demonstrating the early stages of the Bori project's development.

Safetensors · Model size: 0.7B params · Tensor type: F32