GeoV/GeoV-9b-r2

GeoV-9B-r2 is a 9 billion parameter causal language model.

It is still being trained and has the same architecture as the GeoV-9b model, but its training data is sampled without replacement (the GeoV-9b model's training data was sampled with replacement).

The GeoV model was designed by Georges Harik and uses Rotary Positional Embeddings with Relative distances (RoPER) by Georges Harik and Varuna Jayasiri.

In addition to using relative positions in the attention score calculation through RoPE embeddings, RoPER adds relative positional information explicitly to the value embeddings. Specifically, it incorporates the relative positions of the tokens being attended to. RoPER has given better performance on some algorithmic tasks and seems comparable to RoPE in language modeling.
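One way to read the description above is that values are rotated by their own position before the attention-weighted sum, and the result is then rotated back by the query position, so each value contributes with a rotation that depends only on the relative offset. The following toy sketch illustrates that reading on 2-D vectors. It is not the GeoV implementation: the single frequency `theta`, the 2-D feature size, the sign convention, and all function names are assumptions made for illustration.

```python
import math

def rotate(vec, angle):
    """Rotate a 2-D vector (one rotary feature pair) by `angle` radians."""
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c)

def attention_weights(queries, keys, theta):
    """Causal softmax weights computed from position-rotated queries and keys."""
    all_weights = []
    for i, q in enumerate(queries):
        q_rot = rotate(q, i * theta)
        scores = []
        for j in range(i + 1):  # causal: position i only sees positions 0..i
            k_rot = rotate(keys[j], j * theta)
            dot = q_rot[0] * k_rot[0] + q_rot[1] * k_rot[1]
            scores.append(dot / math.sqrt(2))
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        all_weights.append([e / total for e in exps])
    return all_weights

def roper_values(weights, values, theta):
    """Rotate each value by its position, mix, then undo the query's rotation."""
    outputs = []
    for i, row in enumerate(weights):
        mixed_x = mixed_y = 0.0
        for j, w in enumerate(row):
            vx, vy = rotate(values[j], j * theta)
            mixed_x += w * vx
            mixed_y += w * vy
        outputs.append(rotate((mixed_x, mixed_y), -i * theta))
    return outputs

def relative_values(weights, values, theta):
    """Equivalent form: each value is rotated by the relative offset (j - i)."""
    outputs = []
    for i, row in enumerate(weights):
        out_x = out_y = 0.0
        for j, w in enumerate(row):
            vx, vy = rotate(values[j], (j - i) * theta)
            out_x += w * vx
            out_y += w * vy
        outputs.append((out_x, out_y))
    return outputs

# Toy inputs, chosen arbitrarily.
queries = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
keys = [(0.2, 0.9), (1.0, -0.3), (0.4, 0.4)]
values = [(1.0, 2.0), (-1.0, 0.5), (0.3, 0.3)]

weights = attention_weights(queries, keys, theta=0.5)
via_rotation = roper_values(weights, values, theta=0.5)
via_relative = relative_values(weights, values, theta=0.5)
```

Because rotations compose, `via_rotation` and `via_relative` agree up to floating-point error, which is what makes the output carry relative rather than absolute position.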

Model details

Developed by: Georges Harik
Model type: Transformer-based Language Model
Language: English

Hyperparameter	Value
n_parameters	9B
n_layers	32
d_model	5120
n_heads	40
d_head	128
n_vocab	65500
Sequence Length	2048
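
As a quick sanity check on the table above, the head count and head size should multiply out to the model width, and the nominal parameter count gives a rough lower bound on the memory needed just to hold the weights. This is a back-of-the-envelope sketch: the 9e9 figure is the rounded "9B" from the table, and the bytes-per-parameter values assume standard 32-bit and 16-bit floating-point storage.

```python
# Values copied from the hyperparameter table.
config = {
    "n_layers": 32,
    "d_model": 5120,
    "n_heads": 40,
    "d_head": 128,
    "n_vocab": 65500,
    "seq_len": 2048,
}
n_parameters = 9e9  # nominal "9B"; the exact count is not given in this card

# The attention heads should tile the model width exactly.
heads_match_width = config["n_heads"] * config["d_head"] == config["d_model"]

def weight_memory_gib(n_params, bytes_per_param):
    """Memory needed to store the weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30

fp32_gib = weight_memory_gib(n_parameters, 4)  # 32-bit floats
fp16_gib = weight_memory_gib(n_parameters, 2)  # 16-bit floats

print(f"heads match width: {heads_match_width}")
print(f"weights in 32-bit: ~{fp32_gib:.1f} GiB, in 16-bit: ~{fp16_gib:.1f} GiB")
```

These figures cover the weights only; activations and any generation cache need additional memory.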

The currently released weights were trained on ~39 billion tokens. We plan to continue training up to 300 billion tokens. This training run is monolingual and uses the c4en and English Wikipedia datasets.

Test results

These results are from EleutherAI/lm-evaluation-harness, evaluated at the checkpoint trained on 81B tokens.

Task	Version	Metric	Value		Stderr
anli_r1	0	acc	0.3260	±	0.0148
anli_r2	0	acc	0.3380	±	0.0150
anli_r3	0	acc	0.3583	±	0.0138
hellaswag	0	acc	0.4666	±	0.0050
		acc_norm	0.6157	±	0.0049
lambada_openai	0	ppl	10.0153	±	0.3145
		acc	0.5403	±	0.0069
mathqa	0	acc	0.2332	±	0.0077
		acc_norm	0.2348	±	0.0078
piqa	0	acc	0.7503	±	0.0101
		acc_norm	0.7503	±	0.0101
winogrande	0	acc	0.5872	±	0.0138
wsc	0	acc	0.5673	±	0.0488
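
To help read the Stderr column, the sketch below turns a few rows of the table into approximate 95% intervals. It assumes a normal approximation (value ± 1.96 × stderr), which is a common convention rather than something stated in this card; the helper name `confidence_interval` is made up for the example.

```python
# (metric value, standard error) pairs copied from the table above.
results = {
    "hellaswag acc": (0.4666, 0.0050),
    "piqa acc": (0.7503, 0.0101),
    "winogrande acc": (0.5872, 0.0138),
    "wsc acc": (0.5673, 0.0488),
}

def confidence_interval(value, stderr, z=1.96):
    """Approximate 95% interval, assuming the estimate is roughly normal."""
    return (value - z * stderr, value + z * stderr)

intervals = {
    task: confidence_interval(value, stderr)
    for task, (value, stderr) in results.items()
}

for task, (low, high) in intervals.items():
    print(f"{task}: {low:.4f} to {high:.4f}")
```

The wsc interval is roughly ten times wider than the hellaswag one, so small differences on wsc should be read with more caution.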

Installation

pip install geov

Generation

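from geov import GeoVForCausalLM, GeoVTokenizer

# Load the pretrained weights and the matching tokenizer.
model = GeoVForCausalLM.from_pretrained("GeoV/GeoV-9b-r2")
tokenizer = GeoVTokenizer.from_pretrained("GeoV/GeoV-9b-r2")

prompt = "In mathematics, topology is the study of"

# Tokenize the prompt into a PyTorch tensor of token ids.
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Sample a continuation of the prompt.
gen_tokens = model.generate(
    input_ids,
    do_sample=True,   # sample instead of always taking the most likely token
    temperature=0.9,  # softmax temperature used when sampling
    max_length=100,   # length limit, in tokens
)

# Decode the first generated sequence back to text.
gen_text = tokenizer.batch_decode(gen_tokens)[0]
print(gen_text)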
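
The generation call above samples with `temperature=0.9`. As a rough illustration of what a temperature does to the next-token distribution, here is a standalone sketch in plain Python. It is not the code path used by `model.generate`; the toy logits and the helper names are assumptions for the example.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by `temperature`."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Toy scores for a three-token vocabulary.
toy_logits = [2.0, 1.0, 0.1]

probs_default = softmax_with_temperature(toy_logits, temperature=1.0)
probs_sharper = softmax_with_temperature(toy_logits, temperature=0.9)

# A seeded generator makes the draw repeatable.
token_id = sample_token(toy_logits, temperature=0.9, rng=random.Random(0))
```

A temperature below 1 widens the gaps between scores, so the most likely token gets a slightly larger share of the probability mass than it would at temperature 1.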