---
language:
- en
tags:
- pytorch
- causal-lm
license: bigscience-openrail-m
---


[GeoV](https://github.com/geov-ai/geov)-9B is a 9 billion parameter causal language model.

The GeoV model was designed by Georges Harik and uses 
[Rotary Positional Embeddings with Relative distances (RoPER)](https://research.labml.ai/RoPER.html)
by [Georges Harik](https://twitter.com/gharik) and [Varuna Jayasiri](https://twitter.com/vpj).

[RoPER](https://research.labml.ai/RoPER.html), 
in addition to using relative positions in the attention-score calculation via RoPE embeddings,
adds relative positional information explicitly to the value embeddings.
Specifically, it incorporates the relative positions of the tokens being attended to.
RoPER has performed better on some algorithmic tasks and appears comparable to RoPE in language modeling.
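A minimal, illustrative sketch of the idea (not the actual GeoV implementation, which lives in the [geov](https://github.com/geov-ai/geov) repository): RoPE rotates queries and keys so scores depend on relative distance, and RoPER additionally rotates the values by their positions and counter-rotates the attention output by the query position, so each output carries the relative positions of the attended tokens.

```python
import torch

def rope_rotate(x, positions, base=10000):
    """Rotate feature pairs of x by angles proportional to positions (standard RoPE)."""
    d = x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2, dtype=torch.float32) / d))
    angles = positions[:, None].float() * inv_freq[None, :]   # (seq, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # 2-D rotation of each (x1, x2) pair, interleaved back into the original layout
    return torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1).flatten(-2)

def roper_attention(q, k, v, positions):
    # RoPE part: rotated queries and keys make the attention scores depend
    # only on the relative distance between positions.
    scores = rope_rotate(q, positions) @ rope_rotate(k, positions).T
    attn = torch.softmax(scores / q.shape[-1] ** 0.5, dim=-1)
    # RoPER part: rotate values by their positions before the weighted sum,
    # then counter-rotate the output by the query position. Because rotations
    # compose, each attended token contributes information rotated by its
    # distance from the query token.
    out = attn @ rope_rotate(v, positions)
    return rope_rotate(out, -positions)

# Toy usage with a single head
seq, dim = 8, 16
q, k, v = (torch.randn(seq, dim) for _ in range(3))
print(roper_attention(q, k, v, torch.arange(seq)).shape)  # torch.Size([8, 16])
```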

## Model details

- Developed by: [Georges Harik](http://twitter.com/gharik)
- Model type: Transformer-based Language Model
- Language: English

<figure style="width:30em">

| Hyperparameter         | Value       |
| ---------------------- | ----------- |
| n<sub>parameters</sub> | 9B          |
| n<sub>layers</sub>     | 32          |
| d<sub>model</sub>      | 5120        |
| n<sub>heads</sub>      | 40          |
| d<sub>head</sub>       | 128         |
| n<sub>vocab</sub>      | 65500       |
| Sequence Length        | 2048        |
</figure>
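
The head and model widths above follow the usual multi-head layout, i.e. d<sub>model</sub> = n<sub>heads</sub> × d<sub>head</sub>. A quick illustrative check (the variable names are not the actual GeoV configuration fields):

```python
# Consistency check of the attention dimensions in the table above.
n_heads, d_head, d_model = 40, 128, 5120
assert n_heads * d_head == d_model   # 40 heads x 128 dims per head = 5120
```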

The released weights were trained on ~70 billion tokens.
We plan to continue training up to 300 billion tokens and to update the released weights every 20 billion tokens.
This training run is monolingual and uses the English portion of C4 and English Wikipedia.

## Test results

These are the results from [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) at the 80-billion-token checkpoint.

|     Task     |Version| Metric |Value |   |Stderr|
|--------------|------:|--------|-----:|---|-----:|
|anli_r1       |      0|acc     |0.3150|±  |0.0147|
|anli_r2       |      0|acc     |0.3380|±  |0.0150|
|anli_r3       |      0|acc     |0.3367|±  |0.0136|
|hellaswag     |      0|acc     |0.4761|±  |0.0050|
|              |       |acc_norm|0.6308|±  |0.0048|
|lambada_openai|      0|ppl     |8.9700|±  |0.2606|
|              |       |acc     |0.5628|±  |0.0069|
|mathqa        |      0|acc     |0.2318|±  |0.0077|
|              |       |acc_norm|0.2372|±  |0.0078|
|piqa          |      0|acc     |0.7448|±  |0.0102|
|              |       |acc_norm|0.7639|±  |0.0099|
|winogrande    |      0|acc     |0.5935|±  |0.0138|
|wsc           |      0|acc     |0.4038|±  |0.0483|
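
These scores are the kind produced by the harness CLI. A hedged sketch of such a run is below; the `hf-causal` adapter and the exact flags follow the harness README of the time, and whether that stock adapter loads this checkpoint directly is an assumption, not a verified command for GeoV.

```shell
# Hedged sketch: adjust the model adapter and tasks to your harness version.
python main.py \
    --model hf-causal \
    --model_args pretrained=GeoV/GeoV-9b \
    --tasks anli_r1,anli_r2,anli_r3,hellaswag,lambada_openai,mathqa,piqa,winogrande,wsc \
    --device cuda:0
```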


## Installation

```shell
pip install geov
```

## Generation

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/geov-ai/geov/blob/master/notebooks/generate.ipynb)

```python
from geov import GeoVForCausalLM, GeoVTokenizer

# Load the pre-trained GeoV-9B model and its tokenizer from the Hugging Face Hub
model = GeoVForCausalLM.from_pretrained("GeoV/GeoV-9b")
tokenizer = GeoVTokenizer.from_pretrained("GeoV/GeoV-9b")

prompt = "In mathematics, topology is the study of"

# Tokenize the prompt into input IDs
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Sample a continuation of up to 100 tokens
gen_tokens = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.9,
    max_length=100,
)
gen_text = tokenizer.batch_decode(gen_tokens)[0]
print(gen_text)
```
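
On a single GPU, loading the 9B weights in half precision roughly halves the memory footprint. A hedged sketch, assuming `GeoVForCausalLM.from_pretrained` forwards the standard Hugging Face `torch_dtype` keyword (unverified for this class; check the geov package documentation):

```python
import torch
from geov import GeoVForCausalLM, GeoVTokenizer

# Assumption: from_pretrained forwards standard Hugging Face kwargs such as torch_dtype.
model = GeoVForCausalLM.from_pretrained("GeoV/GeoV-9b", torch_dtype=torch.float16).to("cuda")
tokenizer = GeoVTokenizer.from_pretrained("GeoV/GeoV-9b")

input_ids = tokenizer("In mathematics, topology is the study of", return_tensors="pt").input_ids.to("cuda")
gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=100)
print(tokenizer.batch_decode(gen_tokens)[0])
```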