How to use BhinekaIntiLabs/bhineka-gpt with libraries and local apps.

Libraries

How to use BhinekaIntiLabs/bhineka-gpt with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="BhinekaIntiLabs/bhineka-gpt")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("BhinekaIntiLabs/bhineka-gpt")
model = AutoModelForCausalLM.from_pretrained("BhinekaIntiLabs/bhineka-gpt")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

llama-cpp-python

How to use BhinekaIntiLabs/bhineka-gpt with llama-cpp-python:

```
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="BhinekaIntiLabs/bhineka-gpt",
    filename="bhineka-gpt.gguf",
)

llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)
```

Local Apps

llama.cpp

How to use BhinekaIntiLabs/bhineka-gpt with llama.cpp:

Install from brew

```
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf BhinekaIntiLabs/bhineka-gpt
# Run inference directly in the terminal:
llama-cli -hf BhinekaIntiLabs/bhineka-gpt
```

Install from WinGet (Windows)

```
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf BhinekaIntiLabs/bhineka-gpt
# Run inference directly in the terminal:
llama-cli -hf BhinekaIntiLabs/bhineka-gpt
```

Use pre-built binary

```
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf BhinekaIntiLabs/bhineka-gpt
# Run inference directly in the terminal:
./llama-cli -hf BhinekaIntiLabs/bhineka-gpt
```

Build from source code

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf BhinekaIntiLabs/bhineka-gpt
# Run inference directly in the terminal:
./build/bin/llama-cli -hf BhinekaIntiLabs/bhineka-gpt
```

Use Docker

```
docker model run hf.co/BhinekaIntiLabs/bhineka-gpt
```


vLLM

How to use BhinekaIntiLabs/bhineka-gpt with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "BhinekaIntiLabs/bhineka-gpt"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "BhinekaIntiLabs/bhineka-gpt",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker

```
docker model run hf.co/BhinekaIntiLabs/bhineka-gpt
```

SGLang

How to use BhinekaIntiLabs/bhineka-gpt with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "BhinekaIntiLabs/bhineka-gpt" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "BhinekaIntiLabs/bhineka-gpt",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "BhinekaIntiLabs/bhineka-gpt" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "BhinekaIntiLabs/bhineka-gpt",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
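
The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and it assumes a server is already listening on one of the ports shown above (30000 for SGLang, 8000 for vLLM) and returns the usual OpenAI-style `choices[0].message.content` field.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_message):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the text of the first returned choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:30000",  # SGLang port from the example; the vLLM example uses 8000
    "BhinekaIntiLabs/bhineka-gpt",
    "What is the capital of France?",
)
# Uncomment once a server is running locally:
# print(send_chat_request(request))
```

Building the request separately from sending it makes the payload easy to inspect before any server is started.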

Ollama
How to use BhinekaIntiLabs/bhineka-gpt with Ollama:
```
ollama run hf.co/BhinekaIntiLabs/bhineka-gpt
```

Unsloth Studio

How to use BhinekaIntiLabs/bhineka-gpt with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for BhinekaIntiLabs/bhineka-gpt to start chatting
```

Install Unsloth Studio (Windows)

```
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for BhinekaIntiLabs/bhineka-gpt to start chatting
```

Using HuggingFace Spaces for Unsloth

```
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for BhinekaIntiLabs/bhineka-gpt to start chatting
```

Docker Model Runner
How to use BhinekaIntiLabs/bhineka-gpt with Docker Model Runner:
```
docker model run hf.co/BhinekaIntiLabs/bhineka-gpt
```

Lemonade

How to use BhinekaIntiLabs/bhineka-gpt with Lemonade:

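Pull the model

```
# Download Lemonade from https://lemonade-server.ai/
lemonade pull BhinekaIntiLabs/bhineka-gpt
```

Run and chat with the model

```
# {{QUANT_TAG}} is an unfilled template placeholder; replace it with the quantization tag of the variant you pulled
lemonade run user.bhineka-gpt-{{QUANT_TAG}}
```

List all available models

```
lemonade list
```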

Bhineka-GPT-500M

Bhineka-GPT-500M is a bilingual Indonesian-English causal language model built with a Llama-style decoder-only architecture. The final pretraining checkpoint contains 556.3M trainable parameters and was validated on 53.7M held-out tokens across English web, Indonesian web, code, and math domains, reaching an overall validation loss of 2.5355 and perplexity of 12.62.

The project is designed as an end-to-end training pipeline, covering dataset download, text cleaning, language filtering, deduplication, tokenizer training, shard building, curriculum sampling, pretraining, supervised fine-tuning, direct preference optimization, and final Hugging Face export.

The model is intended for Indonesian and English text generation tasks such as question answering, summarization, rewriting, translation, technical drafting, code assistance, document understanding, and structured Markdown or JSON-style responses.

Model Details

Model type: decoder-only causal language model
Architecture: Llama-style Transformer
Languages: Indonesian and English
Parameters: 556,269,696 in the validated final pretraining checkpoint
Context length: 2048 tokens
Tokenizer: BPE tokenizer trained for this project
Vocabulary size: 64,000 tokens in the current pipeline config
Hidden size: 1152
Layers: 28
Attention heads: 16
Key/value heads: 8, using grouped-query attention
Feed-forward size: 3072, using SwiGLU-style activation
Positional encoding: RoPE
Normalization: RMSNorm
Precision target: bfloat16
Validation checkpoint: checkpoints/pretrain/final
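
As a sanity check, the parameter count can be reproduced from the dimensions listed above. This is a rough sketch, not the project's own counting code: the function name `llama_param_count` is hypothetical, and it assumes a standard Llama layout with no bias terms, two RMSNorm weights per layer plus a final norm, and untied input and output embeddings.

```python
def llama_param_count(vocab_size, hidden_size, num_layers, num_heads,
                      num_kv_heads, ffn_size, tie_embeddings=False):
    """Count weights of a Llama-style decoder (assumes no biases anywhere)."""
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim  # grouped-query attention shrinks K and V

    # Q and O projections are hidden x hidden; K and V are hidden x kv_dim
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # SwiGLU uses three matrices: gate, up, and down
    mlp = 3 * hidden_size * ffn_size
    # Two RMSNorm weight vectors per layer
    norms = 2 * hidden_size

    per_layer = attention + mlp + norms
    embeddings = vocab_size * hidden_size
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    final_norm = hidden_size
    return embeddings + num_layers * per_layer + final_norm + lm_head


total = llama_param_count(
    vocab_size=64_000, hidden_size=1152, num_layers=28,
    num_heads=16, num_kv_heads=8, ffn_size=3072,
)
print(f"{total:,}")  # 556,269,696
```

Under these assumptions the result matches the reported 556,269,696 parameters; with tied embeddings the same formula gives 482,541,696, which suggests the checkpoint keeps a separate output projection.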

Training Pipeline

Bhineka-GPT-500M is produced through the following stages:

Dataset download from public Hugging Face datasets
Rule-based cleaning and quality filtering
Exact and MinHash deduplication
BPE tokenizer training
Binary shard creation
Domain-weighted curriculum sampling
Causal language model pretraining
Supervised fine-tuning on instruction/chat datasets
Direct Preference Optimization
Export to Hugging Face format with safetensors
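
The exact and MinHash deduplication stage can be illustrated with a small standard-library sketch. It is not the project's implementation: the function names, the 3-word shingles, the 128 hash functions, and the 0.7 similarity threshold are all assumptions chosen for illustration, and a real pipeline would use locality-sensitive hashing rather than the pairwise comparison shown here.

```python
import hashlib
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def shingles(text, n=3):
    """Return the set of word n-grams of the normalized text."""
    words = normalize(text).split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}


def minhash_signature(items, num_perm=128):
    """For each seeded hash function, keep the minimum hash value over all items."""
    signature = []
    for seed in range(num_perm):
        signature.append(min(
            int.from_bytes(hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()[:8], "big")
            for item in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching positions estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def deduplicate(docs, threshold=0.7):
    """Drop exact duplicates by hash, then near-duplicates by MinHash similarity."""
    seen_hashes, kept, kept_signatures = set(), [], []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate after normalization
        signature = minhash_signature(shingles(doc))
        if any(estimated_jaccard(signature, s) >= threshold for s in kept_signatures):
            continue  # near-duplicate of a document that was already kept
        seen_hashes.add(digest)
        kept.append(doc)
        kept_signatures.append(signature)
    return kept


base = ("Deduplication removes repeated documents from a training corpus so that "
        "the model does not memorize the same text many times during pretraining")
docs = [
    base,
    "  " + base.upper() + "  ",                 # exact duplicate after normalization
    base.replace("pretraining", "training"),   # near-duplicate: one word changed
    "Jakarta is the capital city of Indonesia",
]
kept = deduplicate(docs)
print(len(docs), "->", len(kept))
```

The exact pass is cheap and removes byte-level copies first, so the more expensive signature comparison only runs on documents that survive it.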

Training Data

The current project configuration targets a bilingual and technical mixture with approximately 12.5B total pretraining tokens:

Domain	Approximate Target Tokens	Purpose
English high-quality web	8.7B	General knowledge, reasoning, writing
Indonesian high-quality web	2.15B	Indonesian language coverage and local text style
Code	1.05B	Python, JavaScript, Go, SQL, and technical generation
Math / academic	600M	Mathematical and academic text exposure
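
The token targets in the table translate directly into sampling proportions. The sketch below assumes a static sampler that draws each shard's domain in proportion to its target token count; the actual curriculum sampler may vary the weights over training, and the dictionary keys are hypothetical labels.

```python
import random

# Approximate target tokens per domain, taken from the table above
target_tokens = {
    "english_web": 8_700_000_000,
    "indonesian_web": 2_150_000_000,
    "code": 1_050_000_000,
    "math_academic": 600_000_000,
}

total = sum(target_tokens.values())
weights = {domain: tokens / total for domain, tokens in target_tokens.items()}

for domain, weight in weights.items():
    print(f"{domain:15s} {weight:6.1%}")
print(f"total tokens: {total:,}")

# Draw a few domains in proportion to their weights (seeded for repeatability)
rng = random.Random(0)
sample = rng.choices(list(weights), weights=list(weights.values()), k=8)
print(sample)
```

The four targets sum to 12.5B tokens, with English web at 69.6%, Indonesian web at 17.2%, code at 8.4%, and math/academic at 4.8%.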

Main pretraining sources include FineWeb, FineWeb-Edu, CulturaX Indonesian, mC4 Indonesian, Indonesian Wikipedia, GitHub code subsets, and OpenWebMath.

Instruction tuning data configured in the project includes Alpaca-style and chat-style datasets such as Alpaca Cleaned, Dolly 15k, Alpaca Indonesian, Alpaca GPT-4 Indonesian, OpenHermes 2.5, and SlimOrca.

Intended Uses

This model is intended for research, experimentation, and application prototyping in Indonesian-English language tasks, including:

General chat and instruction following
Indonesian and English question answering
Indonesian-English translation
Summarization and rewriting
Technical explanation and drafting
Python, JavaScript, Go, and SQL code assistance
Markdown and structured response generation

Out-of-Scope Uses

This model should not be used as the sole source of truth for high-stakes decisions, including medical, legal, financial, safety-critical, or emergency contexts. It should also not be used to generate harmful instructions, impersonation, spam, fraud, or privacy-invasive content.

Limitations

The model may hallucinate facts, citations, code behavior, or numerical details.
Performance may vary across Indonesian dialects, informal registers, and domain-specific terminology.
The model can reflect biases and quality issues present in public web, code, math, and instruction datasets.
Smaller language models may struggle with long reasoning chains, complex tool use, and strict factuality.
The reported validation-loss results cover language-modeling loss only; broader instruction-following, safety, factuality, and downstream task evaluations are still recommended before production use.

Evaluation

Validation loss was measured with scripts/run_validation_loss.py on the final pretraining checkpoint:

Checkpoint: checkpoints/pretrain/final
Evaluation date: 2026-05-31
Device: CUDA
Evaluation dtype: float32
Context length: 2048 tokens
Batch size: 4
Tokens evaluated: 53,678,481
Batches evaluated: 6,558
Tokenizer vocabulary: 64,000
Model vocabulary: 64,000
Random-loss baseline: 11.0666
Parameter check: 556,269,696 trainable parameters, no non-finite values reported

Domain	Loss	Perplexity	Tokens	Batches
Overall	2.5355	12.6227	53,678,481	6,558
Code	1.4304	4.1804	15,121,189	1,847
English high-quality web	3.1551	23.4543	20,635,807	2,521
Indonesian high-quality web	2.7256	15.2651	11,481,623	1,403
Math	2.8062	16.5462	6,439,862	787
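
The figures in this section can be cross-checked with a few lines of arithmetic. The sketch assumes that perplexity is the exponential of the mean loss, that the random-loss baseline is the natural log of the vocabulary size, and that the overall loss is the token-weighted mean of the per-domain losses; these assumptions reproduce the reported numbers up to rounding, but the validation script itself is not shown here.

```python
import math

vocab_size = 64_000
random_baseline = math.log(vocab_size)  # loss of a uniform guess over the vocabulary

# (loss, tokens) per domain, copied from the table above
domains = {
    "code": (1.4304, 15_121_189),
    "english_web": (3.1551, 20_635_807),
    "indonesian_web": (2.7256, 11_481_623),
    "math": (2.8062, 6_439_862),
}

total_tokens = sum(tokens for _, tokens in domains.values())
overall_loss = sum(loss * tokens for loss, tokens in domains.values()) / total_tokens
overall_ppl = math.exp(overall_loss)

print(f"random baseline: {random_baseline:.4f}")
print(f"total tokens:    {total_tokens:,}")
print(f"overall loss:    {overall_loss:.4f}")
print(f"overall ppl:     {overall_ppl:.2f}")
for name, (loss, _) in domains.items():
    print(f"{name:15s} ppl = {math.exp(loss):.4f}")
```

Small differences in the last decimal place of the perplexities are expected, because the table reports losses rounded to four decimals.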

Benchmark Comparison

The following benchmark table compares Bhineka-GPT with several small open-weight language models in the same approximate parameter range. These numbers should be read as an orientation benchmark rather than a perfectly fair leaderboard comparison, because evaluation harness settings, shot count, prompt format, checkpoint type, tokenizer, and instruction tuning status may differ across sources.

For Bhineka-GPT, ARC, HellaSwag, and WinoGrande were evaluated in 0-shot mode, while GSM8K used 5-shot evaluation.

Model	Params	ARC	HellaSwag	WinoGrande	GSM8K	Notes
Bhineka-GPT	556M	24.83	31.58	48.86	1.90	0-shot except GSM8K 5-shot
Pythia-410M-deduped	~410M / 0.5B	27.90	40.04	52.09	0.00	Open LLM Leaderboard-style evaluation, mostly few-shot [1]
Pythia-1B-deduped	1B	29.10	49.65	53.59	1.14	Larger model, trained with substantially more compute and data [2]
TinyLlama-1.1B Chat	1.1B	36.09	61.10	61.25	—	Pretrained on approximately 3T tokens; target training setup reported as 16×A100 for about 90 days [3]
TinyLlama 1.1B variant	1.1B	30.29	55.12	55.80	0.53	Fine-tuned variant, Open LLM Leaderboard-style evaluation [4]
Qwen2-0.5B	~0.5B non-embedding	61.10	49.30	74.40	36.50	Much more mature model family; not a fair direct comparison against a from-scratch sub-$100 training experiment [5]

Interpretation:

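Bhineka-GPT trails every listed model on ARC, HellaSwag, and WinoGrande, while its 5-shot GSM8K score (1.90) is above the Pythia and TinyLlama variant entries that report one. It is best treated as a research baseline for a from-scratch 556M bilingual Indonesian-English model trained on a limited budget.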
Larger or more mature models such as TinyLlama, Pythia-1B, and Qwen2-0.5B benefit from more training tokens, more mature infrastructure, and/or larger-scale optimization.
The comparison is most useful for positioning Bhineka-GPT as a lightweight experimental bilingual model, not as a claim of state-of-the-art performance.

The validation-loss results in the Evaluation section measure only next-token prediction quality on held-out data, and the benchmarks in this section are English-language tasks. Recommended additional evaluations before release include Indonesian and English instruction-following benchmarks, translation quality checks, summarization and factuality tests, code generation tests, and safety testing.

Usage

After export and upload, the model can be loaded with Transformers. Because this project defines a custom bhineka model architecture, the model repository may need to include the custom modeling files and be loaded with trust_remote_code=True.

```
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "BhinekaIntiLabs/bhineka-gpt"

# trust_remote_code=True lets Transformers run the custom modeling files from the repo
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

# The prompt already contains the chat special tokens, so none are added by the tokenizer
prompt = "<|user|> Jelaskan apa itu deduplikasi data dalam pelatihan model bahasa.<|sep|><|assistant|>"
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    add_special_tokens=False,
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    repetition_penalty=1.1,
    use_cache=False,  # generation runs without the KV cache
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

# Keep only the newly generated tokens, dropping the prompt
completion = outputs[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(completion, skip_special_tokens=True))
```

The exporter saves the model in Hugging Face format with safetensors, tokenizer files, config files, and generation config.
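
The usage example builds its prompt by hand from the `<|user|>`, `<|sep|>`, and `<|assistant|>` tokens. A small helper can keep that format in one place. This is a sketch only: `build_single_turn_prompt` is a hypothetical name, and the single-turn layout is the only one shown in this card, so multi-turn formatting is deliberately left out.

```python
def build_single_turn_prompt(user_message: str) -> str:
    """Wrap one user message in the single-turn format used in the usage example."""
    return f"<|user|> {user_message.strip()}<|sep|><|assistant|>"


prompt = build_single_turn_prompt(
    "Jelaskan apa itu deduplikasi data dalam pelatihan model bahasa."
)
print(prompt)
```

The resulting string can be passed to the tokenizer with `add_special_tokens=False`, as in the usage example.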

License

This model card declares the apache-2.0 license in the Hugging Face metadata. Please ensure that all training data usage, code dependencies, and released model artifacts are compatible with this license before publishing.

Citation

If you use this model or pipeline, cite the project repository:

```
@software{bhineka_llm_500m,
  title = {Bhineka-GPT-500M},
  author = {Bhineka-GPT contributors},
  year = {2026},
  note = {Bilingual Indonesian-English language model training pipeline}
}
```

Safetensors

Model size: 0.6B params
Tensor type: BF16