GeoV-9B is a 9-billion-parameter causal language model.

The GeoV model was designed by Georges Harik and uses Rotary Positional Embeddings with Relative distances (RoPER) by Georges Harik and Varuna Jayasiri.

In addition to using relative positions in the attention score calculation, as RoPE does, RoPER explicitly adds relative positional information to the value embeddings. Specifically, it incorporates the relative positions of the tokens being attended to. RoPER has given better performance on some algorithmic tasks and appears comparable to RoPE in language modeling.
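The idea can be illustrated with a minimal single-head NumPy sketch. This is not the GeoV implementation: the function names, the frequency base of 10000, and the split-halves rotation layout are assumptions made for illustration. The sketch rotates queries and keys as in RoPE, rotates the values by their positions before the weighted sum, and then rotates the result back by the query position, so each value contributes through a rotation by its relative distance to the query.

```python
import numpy as np

def rotate(x, positions, sign=1.0):
    """Rotate feature pairs of x (seq, d) by position-dependent angles (d must be even)."""
    half = x.shape[-1] // 2
    # Assumed frequency schedule, following the common RoPE convention.
    freqs = 1.0 / (10000 ** (np.arange(half) / half))
    angles = sign * positions[:, None] * freqs[None, :]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def roper_attention(q, k, v, positions):
    """Causal single-head attention with RoPE on q/k and a RoPER-style rotation on v."""
    seq, d = q.shape
    # RoPE: scores depend only on relative positions of query and key.
    scores = rotate(q, positions) @ rotate(k, positions).T / np.sqrt(d)
    future = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # RoPER-style step: rotate values forward, mix, then rotate back by the query position.
    mixed = weights @ rotate(v, positions)
    return rotate(mixed, positions, sign=-1.0)

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(5, 8)) for _ in range(3))
positions = np.arange(5, dtype=float)
out = roper_attention(q, k, v, positions)
```

Because both the scores and the value contributions depend only on position differences, shifting every position by the same offset leaves the output unchanged in this sketch.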

Model details

  • Developed by: Georges Harik
  • Model type: Transformer-based Language Model
  • Language: English

| Hyperparameter  | Value |
|-----------------|-------|
| n_parameters    | 9B    |
| n_layers        | 32    |
| d_model         | 5120  |
| n_heads         | 40    |
| d_head          | 128   |
| n_vocab         | 65500 |
| Sequence Length | 2048  |
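
As a quick sanity check on how these values relate, the sketch below collects them in a hypothetical `GeoVHyperparameters` container (not part of the `geov` package) and verifies that the attention heads exactly tile the model width. The embedding size assumes a standard token embedding matrix of shape n_vocab × d_model, which the table does not state explicitly.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GeoVHyperparameters:
    # Hypothetical container; the values are copied from the table above.
    n_layers: int = 32
    d_model: int = 5120
    n_heads: int = 40
    d_head: int = 128
    n_vocab: int = 65500
    seq_len: int = 2048

hp = GeoVHyperparameters()

# The per-head width times the number of heads should equal the model width.
heads_tile_model = hp.n_heads * hp.d_head == hp.d_model

# Assumed: one token embedding matrix of shape (n_vocab, d_model).
embedding_params = hp.n_vocab * hp.d_model

print(f"heads tile d_model: {heads_tile_model}")
print(f"token embedding parameters: {embedding_params:,}")
```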

The released weights were trained on ~70 billion tokens. We plan to continue training up to 300 billion tokens and to update the weights every 20 billion tokens. This training run is monolingual and uses the C4 English (c4en) and English Wikipedia datasets.

Test results

These results were obtained with EleutherAI/lm-evaluation-harness at the checkpoint trained on 80 billion tokens.

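| Task           | Version | Metric   | Value  | Stderr   |
|----------------|---------|----------|--------|----------|
| anli_r1        | 0       | acc      | 0.3150 | ± 0.0147 |
| anli_r2        | 0       | acc      | 0.3380 | ± 0.0150 |
| anli_r3        | 0       | acc      | 0.3367 | ± 0.0136 |
| hellaswag      | 0       | acc      | 0.4761 | ± 0.0050 |
|                |         | acc_norm | 0.6308 | ± 0.0048 |
| lambada_openai | 0       | ppl      | 8.9700 | ± 0.2606 |
|                |         | acc      | 0.5628 | ± 0.0069 |
| mathqa         | 0       | acc      | 0.2318 | ± 0.0077 |
|                |         | acc_norm | 0.2372 | ± 0.0078 |
| piqa           | 0       | acc      | 0.7448 | ± 0.0102 |
|                |         | acc_norm | 0.7639 | ± 0.0099 |
| winogrande     | 0       | acc      | 0.5935 | ± 0.0138 |
| wsc            | 0       | acc      | 0.4038 | ± 0.0483 |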
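
To read the Stderr column, it can help to turn each value into an approximate interval. The sketch below assumes a normal approximation, so a two-sided 95% interval is the value plus or minus 1.96 standard errors. The `confidence_interval` helper is illustrative and not part of lm-evaluation-harness, and only a few rows from the table are included.

```python
# (metric value, standard error) pairs copied from the results table above.
results = {
    "hellaswag/acc_norm": (0.6308, 0.0048),
    "piqa/acc": (0.7448, 0.0102),
    "winogrande/acc": (0.5935, 0.0138),
    "wsc/acc": (0.4038, 0.0483),
}

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution

def confidence_interval(value, stderr, z=Z_95):
    """Return the (low, high) bounds of a normal-approximation interval."""
    return value - z * stderr, value + z * stderr

for name, (value, stderr) in results.items():
    low, high = confidence_interval(value, stderr)
    print(f"{name:20s} {value:.4f}  95% CI [{low:.4f}, {high:.4f}]")
```

Tasks with a large standard error, such as wsc, yield wide intervals, so small differences on them should be interpreted with caution.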

Installation

pip install geov

Generation

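from geov import GeoVForCausalLM, GeoVTokenizer

# Load the pretrained model and its tokenizer.
model = GeoVForCausalLM.from_pretrained("GeoV/GeoV-9b")
tokenizer = GeoVTokenizer.from_pretrained("GeoV/GeoV-9b")

prompt = "In mathematics, topology is the study of"

# Tokenize the prompt into a PyTorch tensor of input IDs.
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Sample a continuation; max_length counts the prompt tokens as well.
gen_tokens = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.9,
    max_length=100,
)

# Decode the first (and only) generated sequence back to text.
gen_text = tokenizer.batch_decode(gen_tokens)[0]
print(gen_text)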