Bhineka-GPT-500M
Bhineka-GPT-500M is a bilingual Indonesian-English causal language model built with a Llama-style decoder-only architecture. The final pretraining checkpoint contains 556.3M trainable parameters and was validated on 53.7M held-out tokens across English web, Indonesian web, code, and math domains, reaching an overall validation loss of 2.5355 and perplexity of 12.62.
The project is designed as an end-to-end training pipeline, covering dataset download, text cleaning, language filtering, deduplication, tokenizer training, shard building, curriculum sampling, pretraining, supervised fine-tuning, direct preference optimization, and final Hugging Face export.
The model is intended for Indonesian and English text generation tasks such as question answering, summarization, rewriting, translation, technical drafting, code assistance, document understanding, and structured Markdown or JSON-style responses.
Model Details
- Model type: decoder-only causal language model
- Architecture: Llama-style Transformer
- Languages: Indonesian and English
- Parameters: 556,269,696 in the validated final pretraining checkpoint
- Context length: 2048 tokens
- Tokenizer: BPE tokenizer trained for this project
- Vocabulary size: 64,000 tokens in the current pipeline config
- Hidden size: 1152
- Layers: 28
- Attention heads: 16
- Key/value heads: 8, using grouped-query attention
- Feed-forward size: 3072, using SwiGLU-style activation
- Positional encoding: RoPE
- Normalization: RMSNorm
- Precision target: bfloat16
- Validation checkpoint: checkpoints/pretrain/final
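The listed dimensions can be checked against the reported parameter count. The sketch below is an illustration, not the project's modeling code. It relies on assumptions the card does not state: no bias terms, a head dimension of 1152 / 16 = 72, RMSNorm layers that have weights only, and an output head that is not tied to the input embedding. Under those assumptions it reproduces 556,269,696 exactly.

```python
# Parameter-count sketch for the configuration listed above.
# Assumptions (not confirmed by the model card): no bias terms,
# head_dim = hidden_size / num_heads, untied input embedding and LM head,
# and two RMSNorm weight vectors per layer plus one final norm.

def count_parameters(
    vocab_size: int = 64_000,
    hidden_size: int = 1152,
    num_layers: int = 28,
    num_heads: int = 16,
    num_kv_heads: int = 8,
    ffn_size: int = 3072,
    tie_embeddings: bool = False,
) -> int:
    head_dim = hidden_size // num_heads   # 72
    kv_dim = num_kv_heads * head_dim      # 576 with grouped-query attention

    # Attention: full-width Q and O projections, narrower K and V projections.
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # SwiGLU feed-forward: gate, up, and down projections.
    mlp = 3 * hidden_size * ffn_size
    # RMSNorm weights before attention and before the feed-forward block.
    norms = 2 * hidden_size
    per_layer = attention + mlp + norms

    embeddings = vocab_size * hidden_size
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    final_norm = hidden_size
    return num_layers * per_layer + embeddings + lm_head + final_norm


print(f"{count_parameters():,}")                     # 556,269,696
print(f"{count_parameters(tie_embeddings=True):,}")  # 482,541,696
```

With tied embeddings the same configuration would come to 482,541,696 parameters, so the reported total is consistent with (though it does not prove) separate input and output embedding matrices.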
Training Pipeline
Bhineka-GPT-500M is produced through the following stages:
- Dataset download from public Hugging Face datasets
- Rule-based cleaning and quality filtering
- Exact and MinHash deduplication
- BPE tokenizer training
- Binary shard creation
- Domain-weighted curriculum sampling
- Causal language model pretraining
- Supervised fine-tuning on instruction/chat datasets
- Direct Preference Optimization
- Export to Hugging Face format with safetensors
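The exact-plus-MinHash deduplication stage can be illustrated with the following stdlib-only sketch. It is not the project's implementation: the 5-word shingles, 128 hash functions, 0.8 similarity threshold, and whitespace-normalized SHA-256 key for exact matches are all assumptions chosen for illustration.

```python
# Minimal exact + MinHash near-duplicate filter (stdlib only).
# Shingle size, number of hash functions, and threshold are assumptions.
import hashlib
import re

NUM_PERM = 128  # assumed number of hash functions in each signature
SHINGLE = 5     # assumed word n-gram size


def shingles(text: str, n: int = SHINGLE) -> set[str]:
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def minhash(text: str, num_perm: int = NUM_PERM) -> list[int]:
    # One salted hash function per signature slot; keep the minimum per slot.
    grams = shingles(text)
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(g.encode(), digest_size=8, salt=salt).digest(),
                "little",
            )
            for g in grams
        ))
    return signature


def estimated_jaccard(a: list[int], b: list[int]) -> float:
    # The fraction of matching slots estimates the Jaccard similarity.
    return sum(x == y for x, y in zip(a, b)) / len(a)


def deduplicate(docs: list[str], threshold: float = 0.8) -> list[str]:
    seen_exact: set[str] = set()
    kept: list[tuple[str, list[int]]] = []
    for doc in docs:
        # Exact deduplication on whitespace-normalized text.
        key = hashlib.sha256(" ".join(doc.split()).encode()).hexdigest()
        if key in seen_exact:
            continue
        seen_exact.add(key)
        # Near-duplicate check against every kept document (quadratic).
        sig = minhash(doc)
        if any(estimated_jaccard(sig, s) >= threshold for _, s in kept):
            continue
        kept.append((doc, sig))
    return [doc for doc, _ in kept]
```

Comparing each signature against every kept signature is quadratic in corpus size; at pretraining scale, a locality-sensitive hashing (LSH) banding index is the usual way to find candidate pairs before computing similarities.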
Training Data
The current project configuration targets a bilingual and technical mixture with approximately 12.5B total pretraining tokens:
| Domain | Approximate Target Tokens | Purpose |
|---|---|---|
| English high-quality web | 8.7B | General knowledge, reasoning, writing |
| Indonesian high-quality web | 2.15B | Indonesian language coverage and local text style |
| Code | 1.05B | Python, JavaScript, Go, SQL, and technical generation |
| Math / academic | 600M | Mathematical and academic text exposure |
Main pretraining sources include FineWeb, FineWeb-Edu, CulturaX Indonesian, mC4 Indonesian, Indonesian Wikipedia, GitHub code subsets, and OpenWebMath.
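Domain-weighted sampling from the target budgets in the table can be sketched by drawing each training sequence's domain with probability proportional to its target token count. This is an assumption for illustration: the domain keys are hypothetical labels, and the project's actual curriculum may change the weights over the course of training.

```python
# Sketch of domain-weighted sampling from the target token budgets above.
# Assumption: domains are drawn independently in proportion to their targets;
# the domain keys are hypothetical labels, not the project's config names.
import random

TARGET_TOKENS = {
    "english_web": 8.7e9,
    "indonesian_web": 2.15e9,
    "code": 1.05e9,
    "math_academic": 0.6e9,
}


def domain_weights(targets: dict[str, float]) -> dict[str, float]:
    # Normalize target token counts into sampling probabilities.
    total = sum(targets.values())
    return {name: tokens / total for name, tokens in targets.items()}


def sample_domains(targets: dict[str, float], n: int, seed: int = 0) -> list[str]:
    # Draw n domain labels with probability proportional to target tokens.
    rng = random.Random(seed)
    names = list(targets)
    return rng.choices(names, weights=[targets[name] for name in names], k=n)


for name, weight in domain_weights(TARGET_TOKENS).items():
    print(f"{name:15s} {weight:.3f}")
```

The resulting weights are 69.6% English web, 17.2% Indonesian web, 8.4% code, and 4.8% math/academic, matching the 12.5B-token total in the table.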
Instruction tuning data configured in the project includes Alpaca-style and chat-style datasets such as Alpaca Cleaned, Dolly 15k, Alpaca Indonesian, Alpaca GPT-4 Indonesian, OpenHermes 2.5, and SlimOrca.
Intended Uses
This model is intended for research, experimentation, and application prototyping in Indonesian-English language tasks, including:
- General chat and instruction following
- Indonesian and English question answering
- Indonesian-English translation
- Summarization and rewriting
- Technical explanation and drafting
- Python, JavaScript, Go, and SQL code assistance
- Markdown and structured response generation
Out-of-Scope Uses
This model should not be used as the sole source of truth for high-stakes decisions, including medical, legal, financial, safety-critical, or emergency contexts. It should also not be used to generate harmful instructions, impersonation, spam, fraud, or privacy-invasive content.
Limitations
- The model may hallucinate facts, citations, code behavior, or numerical details.
- Performance may vary across Indonesian dialects, informal registers, and domain-specific terminology.
- The model can reflect biases and quality issues present in public web, code, math, and instruction datasets.
- Smaller language models may struggle with long reasoning chains, complex tool use, and strict factuality.
- The reported validation-loss results cover language-modeling loss only; broader instruction-following, safety, factuality, and downstream task evaluations are still recommended before production use.
Evaluation
Validation loss was measured with scripts/run_validation_loss.py on the final pretraining checkpoint:
- Checkpoint: checkpoints/pretrain/final
- Evaluation date: 2026-05-31
- Device: CUDA
- Evaluation dtype: float32
- Context length: 2048 tokens
- Batch size: 4
- Tokens evaluated: 53,678,481
- Batches evaluated: 6,558
- Tokenizer vocabulary: 64,000
- Model vocabulary: 64,000
- Random-loss baseline: 11.0666 (ln 64,000, the loss of a uniform prediction over the vocabulary)
- Parameter check: 556,269,696 trainable parameters, no non-finite values reported
| Domain | Loss | Perplexity | Tokens | Batches |
|---|---|---|---|---|
| Overall | 2.5355 | 12.6227 | 53,678,481 | 6,558 |
| Code | 1.4304 | 4.1804 | 15,121,189 | 1,847 |
| English high-quality web | 3.1551 | 23.4543 | 20,635,807 | 2,521 |
| Indonesian high-quality web | 2.7256 | 15.2651 | 11,481,623 | 1,403 |
| Math | 2.8062 | 16.5462 | 6,439,862 | 787 |
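The derived numbers in the table can be checked with a few lines of arithmetic. Perplexity is exp(loss), and the overall loss is consistent with a token-weighted mean of the per-domain losses. The random-loss baseline can be checked the same way. The sketch uses only the reported figures. Small differences in the last digit of the perplexities come from the losses being rounded to four decimals.

```python
# Reproduce the derived columns of the validation table from reported figures.
import math

# domain: (mean loss in nats per token, tokens evaluated), copied from the table
DOMAINS = {
    "Code": (1.4304, 15_121_189),
    "English high-quality web": (3.1551, 20_635_807),
    "Indonesian high-quality web": (2.7256, 11_481_623),
    "Math": (2.8062, 6_439_862),
}
VOCAB_SIZE = 64_000


def perplexity(loss: float) -> float:
    """Perplexity is the exponential of the mean cross-entropy loss (nats)."""
    return math.exp(loss)


def token_weighted_loss(domains: dict[str, tuple[float, int]]) -> float:
    """Mean loss over all tokens, weighting each domain by its token count."""
    total = sum(tokens for _, tokens in domains.values())
    return sum(loss * tokens for loss, tokens in domains.values()) / total


# Loss of a model that predicts every token uniformly at random.
random_baseline = math.log(VOCAB_SIZE)
overall = token_weighted_loss(DOMAINS)

for name, (loss, _) in DOMAINS.items():
    print(f"{name:28s} loss={loss:.4f} ppl={perplexity(loss):.1f}")
print(f"overall loss={overall:.4f} ppl={perplexity(overall):.2f}")
print(f"random baseline={random_baseline:.4f}")
```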
Benchmark Comparison
The following benchmark table compares Bhineka-GPT with several small open-weight language models of roughly 0.4B to 1.1B parameters. These numbers should be read as an orientation benchmark rather than a strictly fair leaderboard comparison, because evaluation harness settings, shot count, prompt format, checkpoint type, tokenizer, and instruction-tuning status may differ across sources.
For Bhineka-GPT, ARC, HellaSwag, and WinoGrande were evaluated in 0-shot mode, while GSM8K used 5-shot evaluation.
| Model | Params | ARC | HellaSwag | WinoGrande | GSM8K | Notes |
|---|---|---|---|---|---|---|
| Bhineka-GPT | 556M | 24.83 | 31.58 | 48.86 | 1.90 | 0-shot except GSM8K 5-shot |
| Pythia-410M-deduped | ~410M | 27.90 | 40.04 | 52.09 | 0.00 | Open LLM Leaderboard-style evaluation, mostly few-shot [1] |
| Pythia-1B-deduped | 1B | 29.10 | 49.65 | 53.59 | 1.14 | Larger model, trained with substantially more compute and data [2] |
| TinyLlama-1.1B Chat | 1.1B | 36.09 | 61.10 | 61.25 | — | Pretrained on approximately 3T tokens; target training setup reported as 16×A100 for about 90 days [3] |
| TinyLlama 1.1B variant | 1.1B | 30.29 | 55.12 | 55.80 | 0.53 | Fine-tuned variant, Open LLM Leaderboard-style evaluation [4] |
| Qwen2-0.5B | ~0.5B | 61.10 | 49.30 | 74.40 | 36.50 | Much more mature model family; not a fair direct comparison against a from-scratch, sub-$100 training experiment [5] |
Interpretation:
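- Bhineka-GPT scores below every listed model on ARC, HellaSwag, and WinoGrande, and its ARC and WinoGrande results are close to chance level for those tasks (about 25% for four-option ARC questions and 50% for two-option WinoGrande). Given its limited training budget, it is best treated as a research baseline for a from-scratch 556M bilingual Indonesian-English model.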
- Larger or more mature models such as TinyLlama, Pythia-1B, and Qwen2-0.5B benefit from more training tokens, more mature infrastructure, and/or larger-scale optimization.
- The comparison is most useful for positioning Bhineka-GPT as a lightweight experimental bilingual model, not as a claim of state-of-the-art performance.
The validation-loss results measure next-token prediction quality on held-out data, and the benchmark scores cover only a few multiple-choice and math tasks. Recommended additional evaluations before release include Indonesian and English instruction-following benchmarks, translation quality checks, summarization and factuality tests, code generation tests, and safety testing.
Usage
After export and upload, the model can be loaded with Transformers. Because this project defines a custom bhineka model architecture, the model repository may need to include the custom modeling files and be loaded with trust_remote_code=True.
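```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "BhinekaIntiLabs/bhineka-gpt"

# trust_remote_code=True lets Transformers load the custom "bhineka"
# modeling code shipped in the model repository.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",  # load weights in the checkpoint's dtype
    device_map="auto",   # requires the `accelerate` package
    trust_remote_code=True,
)

# Chat-style prompt using the project's special tokens.
# Indonesian: "Explain what data deduplication is in language-model training."
prompt = "<|user|> Jelaskan apa itu deduplikasi data dalam pelatihan model bahasa.<|sep|><|assistant|>"
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    add_special_tokens=False,  # the prompt already contains the special tokens
).to(model.device)

# Fall back to the EOS token for padding if the tokenizer defines no pad token.
pad_token_id = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    repetition_penalty=1.1,
    use_cache=False,  # disables the KV cache, so generation is slower
    pad_token_id=pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, skipping the prompt.
completion = outputs[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(completion, skip_special_tokens=True))
```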
The exporter saves the model in Hugging Face format with safetensors weights, tokenizer files, config files, and a generation config.
License
This model card declares the apache-2.0 license in the Hugging Face metadata. Please ensure that all training data usage, code dependencies, and released model artifacts are compatible with this license before publishing.
Citation
If you use this model or pipeline, cite the project repository:
```bibtex
@software{bhineka_llm_500m,
  title = {Bhineka-GPT-500M},
  author = {Bhineka-GPT contributors},
  year = {2026},
  note = {Bilingual Indonesian-English language model training pipeline}
}
```