ks_byte_lm SpaceByte — Transformers-compatible release
This repo is the easier-to-load Hugging Face Transformers-style package for the trained Kashmiri byte-level ks_byte_lm SpaceByte model.
The model is a custom SpaceByte-style byte-level Transformer causal LM. Because the architecture is custom, load it with trust_remote_code=True.
Recommended checkpoint: model.safetensors converted from the original best.pt checkpoint.
Quick install
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Minimal installation (without requirements.txt)
pip install torch transformers safetensors regex
Google Colab note
Some Colab images ship a mismatched torchvision package. This model does not
use vision at all, but recent transformers imports can still touch
torchvision and fail with:
RuntimeError: operator torchvision::nms does not exist
ModuleNotFoundError: Could not import module 'PreTrainedModel'
If you see either error, run this in a fresh Colab runtime before loading:
!pip uninstall -y torchvision torchaudio
!pip install -U transformers safetensors regex
Then restart the runtime and load the model again. Authentication is optional
for this public repo; HF_TOKEN warnings only affect rate limits.
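You may want to check whether the runtime has torchvision installed at all before you load the model. You can ask Python's import system without actually importing the package. The sketch below uses only the standard library. `has_module` is a hypothetical helper name and is not part of this repo.

```python
import importlib.util

def has_module(name: str) -> bool:
    """Return True if `name` is importable in this runtime, without importing it."""
    return importlib.util.find_spec(name) is not None

# Hypothetical pre-flight check before loading the model in Colab.
if has_module("torchvision"):
    print("torchvision is installed; if loading fails with torchvision::nms, "
          "uninstall it as described above and restart the runtime.")
```

`find_spec` only locates the top-level module, so the check itself cannot trigger the `torchvision::nms` error.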
Quick generation
from transformers import AutoModelForCausalLM, AutoTokenizer
repo = "Omarrran/ks-byte-lm-spacebyte-transformers"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
inputs = tokenizer("کشمیر", return_tensors="pt")
out = model.generate(
**inputs,
max_new_tokens=100,  # one token is one byte, so this is a byte budget, not a character count
do_sample=True,
temperature=0.8,
top_k=50,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
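The vocabulary is 256 byte values plus three special tokens, so every generated token is a single byte. In UTF-8, Perso-Arabic letters in the U+0600 to U+07FF range take 2 bytes each. That means `max_new_tokens=100` yields roughly 50 such characters. The sketch below assumes the prompt word uses the common Persian/Urdu code points (U+06A9, U+0634, U+0645, U+06CC, U+0631). `utf8_len` and `approx_chars` are hypothetical helpers.

```python
def utf8_len(text: str) -> int:
    """Number of byte tokens a string occupies, ignoring any special tokens."""
    return len(text.encode("utf-8"))

def approx_chars(max_new_tokens: int, bytes_per_char: int = 2) -> int:
    """Rough number of characters a byte budget can hold (hypothetical helper)."""
    return max_new_tokens // bytes_per_char

# A Kashmiri word written with explicit escapes so the code points are unambiguous.
prompt = "\u06a9\u0634\u0645\u06cc\u0631"
print(len(prompt), utf8_len(prompt))  # 5 10
print(approx_chars(100))              # 50
```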
Recommended generation helper
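The repo also includes a small helper module, generation_ksbyte, that uses the original byte-LM generation loop. The import below only works when that module is on your Python path, for example when running from a local clone of this repo: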
from generation_ksbyte import generate_text
print(generate_text(
"Omarrran/ks-byte-lm-spacebyte-transformers",
prompt="کشمیر",
max_new_tokens=200,
temperature=0.8,
top_k=50,
))
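Both examples above pass `temperature` and `top_k`. The sketch below shows what temperature plus top-k sampling does to one step's logits (259 entries for this vocabulary). It is a generic illustration written with PyTorch and is not the code inside generation_ksbyte. `sample_next_byte` is a hypothetical name, and so is its greedy fallback for `temperature <= 0`.

```python
from typing import Optional

import torch

def sample_next_byte(logits: torch.Tensor, temperature: float = 0.8, top_k: int = 50,
                     generator: Optional[torch.Generator] = None) -> int:
    """Sample one token id from a 1-D logits vector with temperature + top-k.

    Generic sketch; not the implementation in generation_ksbyte.
    """
    if temperature <= 0:
        # Hypothetical convention: non-positive temperature means greedy decoding.
        return int(torch.argmax(logits))
    scaled = logits / temperature                  # <1 sharpens, >1 flattens the distribution
    k = min(top_k, scaled.numel())
    top_vals, top_idx = torch.topk(scaled, k)      # keep only the k most likely ids
    probs = torch.softmax(top_vals, dim=-1)        # renormalise over the kept ids
    choice = torch.multinomial(probs, num_samples=1, generator=generator)
    return int(top_idx[choice])                    # map back to the original id
```

With `top_k=1` this reduces to greedy decoding. Larger `top_k` and higher `temperature` trade coherence for variety.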
Local usage after cloning/downloading
git clone https://huggingface.co/Omarrran/ks-byte-lm-spacebyte-transformers
cd ks-byte-lm-spacebyte-transformers
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python - <<'PY'
from transformers import AutoModelForCausalLM, AutoTokenizer
path = "."
tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True)
inputs = tok("کشمیر", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=80, do_sample=True, temperature=0.8, top_k=50)
print(tok.decode(out[0], skip_special_tokens=True))
PY
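If you would rather not use git, `huggingface_hub.snapshot_download` can fetch the repo files into the local cache and return their path. `huggingface_hub` is already a dependency of transformers. The sketch also puts that path on `sys.path` so the generation_ksbyte helper can be imported. It assumes the helper needs nothing beyond the other files in the repo. `fetch_repo` is a hypothetical helper name.

```python
import sys

REPO_ID = "Omarrran/ks-byte-lm-spacebyte-transformers"

def fetch_repo(repo_id: str = REPO_ID) -> str:
    """Download a snapshot of the repo and make its .py helpers importable."""
    from huggingface_hub import snapshot_download  # imported lazily; ships with transformers

    local_dir = snapshot_download(repo_id=repo_id)  # path to the cached snapshot
    if local_dir not in sys.path:
        sys.path.insert(0, local_dir)  # lets `import generation_ksbyte` find the file
    return local_dir
```

After `path = fetch_repo()`, you can pass `path` to `AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True)` exactly as `"."` is used above.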
What changed from the original release?
Original release:
- custom project checkpoint: checkpoints/best.pt
- loaded with ksbyte.generate
- not directly loadable by AutoModelForCausalLM
This release:
- root config.json
- root model.safetensors
- custom configuration_ksbyte.py
- custom modeling_ksbyte.py
- custom tokenization_ksbyte.py
- loadable with AutoModelForCausalLM.from_pretrained(..., trust_remote_code=True)
- loadable with AutoTokenizer.from_pretrained(..., trust_remote_code=True)
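Loading with `trust_remote_code=True` from a local directory depends on the root files listed for this release. A quick sanity check on a clone or download can catch a missing file before the loader fails. The file list comes straight from the list above. `missing_files` is a hypothetical helper.

```python
from pathlib import Path

# Root files that ship with the Transformers-compatible release (from the list above).
REQUIRED_FILES = (
    "config.json",
    "model.safetensors",
    "configuration_ksbyte.py",
    "modeling_ksbyte.py",
    "tokenization_ksbyte.py",
)

def missing_files(path: str = ".") -> list[str]:
    """Return the release files that are absent from a local directory."""
    root = Path(path)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]

print(missing_files("."))  # [] inside a complete clone
```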
Metrics
Validation/evaluation artifacts from the source run:
- Best validation BPB: 0.9593
- Final validation BPB: 0.9862
- Final validation cross entropy: 0.6836
- Validation next-byte top-1 accuracy with best checkpoint: 76.42%
- Training byte tokens: 45,362,173
- Validation byte tokens: 1,622,371
- Test byte tokens: 3,074,698
- Model parameters: 15,837,440
- Original training stopped at step 4,751 of 5,000 due to early stopping
Note: 76.42% is byte-token top-1 accuracy, not word-level accuracy.
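Every token is one byte, so bits-per-byte should equal the per-byte cross entropy divided by ln 2. This assumes the cross entropy is measured in nats, the natural-log convention of `torch.nn.functional.cross_entropy`. The final figures agree: 0.6836 / ln 2 ≈ 0.9862. The sketch below shows that conversion. It also shows how a byte-level top-1 accuracy is counted, which is why 76.42% is not a word-level number. Both function names are hypothetical.

```python
import math

def bits_per_byte(cross_entropy_nats: float) -> float:
    """Convert mean per-byte cross entropy (nats) to bits per byte."""
    return cross_entropy_nats / math.log(2)

def top1_accuracy(predicted_ids: list[int], target_ids: list[int]) -> float:
    """Fraction of byte positions where the most likely next byte equals the target."""
    if not target_ids or len(predicted_ids) != len(target_ids):
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == t for p, t in zip(predicted_ids, target_ids))
    return hits / len(target_ids)

print(f"{bits_per_byte(0.6836):.4f}")  # 0.9862
```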
Architecture
- task: byte-level causal language modeling
- variant: SpaceByte
- vocab size: 259 = 256 byte values + BOS/EOS/PAD
- hidden size: 384
- layers: 2 local input + 6 global + 2 local output
- attention heads: 6
- KV heads: 2
- context length: 2048 byte tokens
- parameters: 15.84M
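Six attention heads sharing two KV heads suggests grouped-query attention, in which each KV head serves 6 / 2 = 3 query heads. The sketch assumes the standard layout with `head_dim = hidden size / attention heads = 384 / 6 = 64`, which may not match modeling_ksbyte.py exactly. It shows how the shapes line up when the KV heads are repeated across query groups. It needs PyTorch 2.0 or newer for `scaled_dot_product_attention`. `expand_kv` is a hypothetical helper.

```python
import torch
import torch.nn.functional as F

hidden_size, n_heads, n_kv_heads = 384, 6, 2  # values from the list above
head_dim = hidden_size // n_heads             # 64, assuming head_dim = hidden / heads
group = n_heads // n_kv_heads                 # 3 query heads per KV head

def expand_kv(kv: torch.Tensor, group: int) -> torch.Tensor:
    """(batch, n_kv_heads, seq, head_dim) -> (batch, n_kv_heads * group, seq, head_dim)."""
    return kv.repeat_interleave(group, dim=1)

batch, seq = 1, 16
q = torch.randn(batch, n_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# Causal attention, as in a causal LM: each byte attends only to earlier positions.
out = F.scaled_dot_product_attention(q, expand_kv(k, group), expand_kv(v, group), is_causal=True)
print(out.shape)  # torch.Size([1, 6, 16, 64])
```

Storing K and V for only 2 heads cuts their projection and cache size to a third of full multi-head attention.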
Caveats
- This is a custom architecture, so trust_remote_code=True is required.
- It is a byte-level LM; outputs are decoded from UTF-8 bytes.
- Generations can be semantically weak or incomplete; have a human review outputs before relying on them.
- This is not a built-in GPT-2/LLaMA/Mistral architecture, but it is Transformers-compatible via custom code.
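Output is decoded from UTF-8 bytes, so a generation cut off by `max_new_tokens` can end in the middle of a multi-byte character. The sketch below assumes a hypothetical id layout: ids 0 to 255 are raw bytes and 256 to 258 are the special tokens. The real mapping is defined in tokenization_ksbyte.py and may differ. `ids_to_text` is a hypothetical helper.

```python
# Hypothetical id layout: 0-255 are raw bytes, 256-258 are BOS/EOS/PAD.
NUM_BYTE_IDS = 256

def ids_to_text(ids: list[int]) -> str:
    """Drop special ids, then decode the remaining bytes as UTF-8.

    errors="replace" turns an incomplete trailing character into U+FFFD
    instead of raising UnicodeDecodeError.
    """
    raw = bytes(i for i in ids if i < NUM_BYTE_IDS)
    return raw.decode("utf-8", errors="replace")

word = "\u06a9\u0634\u0645"                   # three 2-byte Perso-Arabic letters
truncated = list(word.encode("utf-8"))[:-1]   # cut off mid-character
print(ids_to_text(truncated + [257]))         # last letter becomes U+FFFD
```

Use `errors="ignore"` instead if you would rather silently drop the partial character.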
Evaluation results (self-reported, Kashmiri pretraining corpus, validation set)
- Best validation bits-per-byte: 0.959
- Final validation bits-per-byte: 0.986
- Validation next-byte top-1 accuracy: 0.764
- Final validation cross entropy: 0.684