
# jamba-H1024_L12-v0.07-fineweb-1M-med


## mid-training checkpoint

- arch: jamba (see the base model card for kernels/usage notes)
- tokenizer: the claude3 tokenizer, wrapped as a Hugging Face GPT2 tokenizer
- size: ~888M params (safetensors; F32/BF16 tensors)
- context: has only seen sequences up to 2048 tokens thus far
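A minimal loading sketch (not from the original card): it assumes the checkpoint loads through the standard `transformers` auto classes, with `trust_remote_code=True` and float32 matching the eval settings quoted below. The prompt and generation settings are illustrative, not tuned.

```python
# hedged sketch: load + generate with the standard transformers auto classes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pszemraj/jamba-H1024_L12-v0.07-fineweb-1M-med"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # the jamba arch ships custom modeling code
    torch_dtype=torch.float32,  # the evals below used dtype=float
)

# the checkpoint has only seen 2048-token context, so stay under that
prompt = "The most important thing to know about deep learning is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```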

## numbers

Zero-shot evals for this checkpoint, from lm-evaluation-harness:

`hf (pretrained=pszemraj/jamba-H1024_L12-v0.07-fineweb-1M-med,trust_remote_code=True,dtype=float)`, gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8

| Tasks          | Version | Filter | n-shot | Metric     |    Value |   | Stderr |
|----------------|--------:|--------|-------:|------------|---------:|---|-------:|
| winogrande     |       1 | none   |      0 | acc        |   0.4972 | ± | 0.0141 |
| piqa           |       1 | none   |      0 | acc        |   0.6072 | ± | 0.0114 |
|                |         | none   |      0 | acc_norm   |   0.6034 | ± | 0.0114 |
| openbookqa     |       1 | none   |      0 | acc        |   0.1660 | ± | 0.0167 |
|                |         | none   |      0 | acc_norm   |   0.2800 | ± | 0.0201 |
| lambada_openai |       1 | none   |      0 | perplexity | 157.6757 | ± | 6.8536 |
|                |         | none   |      0 | acc        |   0.2127 | ± | 0.0057 |
| boolq          |       2 | none   |      0 | acc        |   0.6235 | ± | 0.0085 |
| arc_easy       |       1 | none   |      0 | acc        |   0.3944 | ± | 0.0100 |
|                |         | none   |      0 | acc_norm   |   0.3531 | ± | 0.0098 |
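To reproduce the table, here is a hedged sketch using the lm-evaluation-harness Python API (v0.4+, `pip install lm-eval`); the `model_args` string mirrors the harness header line above, and the task list matches the table.

```python
# hedged sketch: re-run the evals above via lm-evaluation-harness (v0.4+ API)
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=pszemraj/jamba-H1024_L12-v0.07-fineweb-1M-med,"
        "trust_remote_code=True,dtype=float"
    ),
    tasks=["winogrande", "piqa", "openbookqa", "lambada_openai", "boolq", "arc_easy"],
    batch_size=8,  # as in the header line; num_fewshot left at its default (0-shot here)
)
print(results["results"])
```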
