Instructions for using poomphasindev/auan-llm-928m-base-preview with libraries and local apps.

Libraries

How to use poomphasindev/auan-llm-928m-base-preview with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="poomphasindev/auan-llm-928m-base-preview")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("poomphasindev/auan-llm-928m-base-preview")
model = AutoModelForCausalLM.from_pretrained("poomphasindev/auan-llm-928m-base-preview")

Local Apps

vLLM

How to use poomphasindev/auan-llm-928m-base-preview with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "poomphasindev/auan-llm-928m-base-preview"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "poomphasindev/auan-llm-928m-base-preview",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
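
The same request can be sent from Python. The sketch below is a minimal client that uses only the standard library. It assumes the vLLM server started above is listening on localhost:8000, and the helper names `build_completion_request` and `complete` are hypothetical, not part of vLLM. Because this is a base model, it targets the plain `/v1/completions` route rather than a chat route.

```python
import json
import urllib.request

MODEL_ID = "poomphasindev/auan-llm-928m-base-preview"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style completion responses put the text under choices[0].text.
    return body["choices"][0]["text"]
```

With a server running, `print(complete("Once upon a time,"))` prints the continuation. For the SGLang server described in the next section, pass `base_url="http://localhost:30000"`.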

Use Docker

docker model run hf.co/poomphasindev/auan-llm-928m-base-preview

SGLang

How to use poomphasindev/auan-llm-928m-base-preview with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "poomphasindev/auan-llm-928m-base-preview" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "poomphasindev/auan-llm-928m-base-preview",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "poomphasindev/auan-llm-928m-base-preview" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "poomphasindev/auan-llm-928m-base-preview",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use poomphasindev/auan-llm-928m-base-preview with Docker Model Runner:
```
docker model run hf.co/poomphasindev/auan-llm-928m-base-preview
```

Thai LLM 928M Base Preview

This is a Thai-focused Qwen2-style causal language model trained from scratch.

It is a base model, not an instruction-tuned/chat model. It is intended for continued pretraining, evaluation, research, and downstream supervised fine-tuning.

Model Details

Architecture: Qwen2ForCausalLM
Parameters: ~928M
Initialization: random initialization, trained from scratch
Vocabulary: 32,000-token Thai byte-level BPE tokenizer
Context length: 2,048 tokens
Hidden size: 2,048
Layers: 18
Attention heads: 16
KV heads: 4
Intermediate size: 5,504
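
As a sanity check, the figures above can be turned into a parameter count. The sketch below assumes the standard Qwen2 layout (biases on the Q/K/V projections only, a gated MLP with three weight matrices, two RMSNorm vectors per layer) and untied input/output embeddings; neither assumption is stated in this card, and the function name `qwen2_param_count` is hypothetical.

```python
def qwen2_param_count(vocab, hidden, layers, heads, kv_heads, intermediate,
                      tie_embeddings=False):
    """Count parameters for a Qwen2-style decoder (assumed standard layout)."""
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim

    # Attention: Q/K/V projections carry biases, the output projection does not.
    attention = (
        (hidden * hidden + hidden)          # q_proj
        + 2 * (hidden * kv_dim + kv_dim)    # k_proj and v_proj
        + hidden * hidden                   # o_proj
    )
    # Gated MLP: gate, up, and down projections, no biases.
    mlp = 3 * hidden * intermediate
    # Two RMSNorm weight vectors per layer.
    norms = 2 * hidden

    embeddings = vocab * hidden
    lm_head = 0 if tie_embeddings else vocab * hidden
    final_norm = hidden
    return layers * (attention + mlp + norms) + embeddings + final_norm + lm_head


total = qwen2_param_count(vocab=32_000, hidden=2_048, layers=18, heads=16,
                          kv_heads=4, intermediate=5_504)
print(f"{total:,} parameters ({total / 1e6:.1f}M)")
```

Under these assumptions the count is 928,645,120, which agrees with the stated ~928M. With tied embeddings it would be 863,109,120, so the checkpoint appears to use a separate output projection.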

Training Snapshot

Checkpoint: best validation checkpoint
Step: 5,500
Validation loss: 2.51767520904541
Training objective: next-token prediction
Precision during training: bf16
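
For a more familiar scale, the validation loss can be converted to perplexity. This short sketch assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for next-token training but is not stated explicitly here.

```python
import math

val_loss = 2.51767520904541  # validation loss reported above

# Assumption: the loss is mean per-token cross-entropy in nats.
perplexity = math.exp(val_loss)
# The same quantity expressed in bits per token.
bits_per_token = val_loss / math.log(2)

print(f"perplexity ~ {perplexity:.2f}, bits/token ~ {bits_per_token:.2f}")
```

That works out to a perplexity of about 12.4, or roughly 3.63 bits per token. Both figures depend on the tokenizer, so they are not directly comparable with models that use a different vocabulary.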

Training Data

The training configuration included Thai-focused corpora such as:

SPAISS6F1/slm-pretrain-corpus
SPAISS6F1/Finance
pythainlp/thai-wiki-dataset-v3
pythainlp/thailaw-v1.0
pythainlp/thai-constitution-corpus
pythainlp/thai-financial-dataset

Please audit dataset licenses and suitability before commercial use.

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "poomphasindev/auan-llm-928m-base-preview"

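# Use a GPU when available; fall back to float32 on CPU, where float16 is slow.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

tokenizer = AutoTokenizer.from_pretrained(model_id)
# On older Transformers releases this argument is named `torch_dtype`.
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=dtype).to(device)
model.eval()

prompt = "ประเทศไทยมี"  # Thai for "Thailand has"
inputs = tokenizer(prompt, return_tensors="pt").to(device)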

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=80,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))

Limitations

This is an early base checkpoint, not a chat assistant.
Generation quality may be unstable or repetitive.
The model has not been aligned for safety.
The model may hallucinate or produce inappropriate text.
It should not be used for high-stakes decisions without further evaluation and alignment.

Safetensors

Model size: 0.9B params
Tensor type: F32
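
The model size and tensor type translate into a rough memory budget. The sketch below is a back-of-the-envelope estimate, not a measurement: it uses the approximate ~928M parameter figure, assumes the head dimension equals hidden size divided by attention heads, and ignores activations and framework overhead. The helper names are hypothetical.

```python
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_bytes(num_params, dtype):
    """Memory needed for the weights alone at the given precision."""
    return num_params * BYTES_PER_ELEMENT[dtype]


def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, dtype, batch=1):
    """Key and value cache for one full-length sequence per batch item."""
    # The factor 2 covers keys plus values.
    return (2 * layers * kv_heads * head_dim * seq_len * batch
            * BYTES_PER_ELEMENT[dtype])


num_params = 928_000_000   # "~928M" from Model Details (approximate)
head_dim = 2_048 // 16     # assumed: hidden size / attention heads

for dtype in ("float32", "float16"):
    weights_gib = weight_bytes(num_params, dtype) / 2**30
    cache_mib = kv_cache_bytes(18, 4, head_dim, 2_048, dtype) / 2**20
    print(f"{dtype}: weights ~ {weights_gib:.2f} GiB, KV cache ~ {cache_mib:.0f} MiB")
```

By this estimate the weights take about 3.5 GiB as stored in F32 and about 1.7 GiB when loaded in float16, and a full 2,048-token context adds about 72 MiB of KV cache in float16 because only 4 KV heads are cached per layer.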