res2vec OpenWebText-1B Word Embeddings
| Property | Value |
|---|---|
| Dimensions | 188 |
| Vocabulary | 1,008,133 words |
| Embedding Size | 723 MB |
| Total Size | ~744 MB |
| Format | NumPy (.npy) |
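As a quick sanity check, the embedding size follows from the vocabulary size, the dimension, and the float32 dtype listed under Files. The arithmetic below is only a sketch; it suggests the "MB" figure is in binary megabytes (MiB), which is my reading and is not stated explicitly in this card.

```python
# Back-of-the-envelope check of the embedding matrix size.
vocab_size = 1_008_133   # rows (vocabulary size from this card)
dims = 188               # columns (embedding dimension)
bytes_per_value = 4      # float32

size_bytes = vocab_size * dims * bytes_per_value
size_mib = size_bytes / 2**20   # binary megabytes (MiB)
size_mb = size_bytes / 10**6    # decimal megabytes (MB)

print(f"{size_bytes:,} bytes = {size_mib:.1f} MiB = {size_mb:.1f} MB")
# prints: 758,116,016 bytes = 723.0 MiB = 758.1 MB
```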
Model Properties
| Metric | Value |
|---|---|
| Information Capacity (I) | 0.106 |
| Average Resonance (R) | 0.071 |
| Optimal Dimension (d*) | 187.6 |
| Fixed Point | โ |
Benchmarks
| Model | Dim | Corpus | SimLex-999 | WordSim-353 |
|---|---|---|---|---|
| word2vec Skip-gram | 300 | Wikipedia 1B | 0.37 | 0.63 |
| GloVe | 300 | Wikipedia+Gigaword 6B | 0.42 | 0.66 |
| res2vec | 188 | OpenWebText 1B | 0.296 | 0.597 |
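For anyone who wants to rerun these benchmarks, here is a minimal sketch. It assumes the scores are Spearman rank correlations between model cosine similarity and human ratings, which is the usual convention for SimLex-999 and WordSim-353 but is not stated in this card. The function names, the `(word_a, word_b, human_score)` pair format, and the choice to skip out-of-vocabulary pairs are all my own assumptions, and the toy vectors and ratings are made up purely so the snippet runs on its own.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=np.float64)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=np.float64)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def evaluate(embeddings, vocab, pairs):
    """Score (word_a, word_b, human_score) triples; OOV pairs are skipped (assumption)."""
    model_scores, human_scores = [], []
    for word_a, word_b, human in pairs:
        if word_a not in vocab or word_b not in vocab:
            continue
        a = embeddings[vocab[word_a]].astype(np.float64)
        b = embeddings[vocab[word_b]].astype(np.float64)
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        model_scores.append(float(cos))
        human_scores.append(float(human))
    return spearman(model_scores, human_scores), len(model_scores)

# Toy stand-ins (made-up vectors and ratings, not real benchmark data).
toy_vocab = {"king": 0, "queen": 1, "cat": 2, "dog": 3}
toy_embeddings = np.array(
    [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.1, 1.0]], dtype=np.float32
)
toy_pairs = [
    ("king", "queen", 8.0),
    ("cat", "dog", 9.0),
    ("king", "cat", 1.0),
    ("queen", "unicorn", 5.0),  # "unicorn" is out of vocabulary, so this pair is skipped
]

rho, n_used = evaluate(toy_embeddings, toy_vocab, toy_pairs)
print(f"Spearman rho = {rho:.3f} over {n_used} pairs")
```

With the real files, pass the loaded `embeddings` and `vocab` plus the benchmark pairs parsed from the dataset's own file. How out-of-vocabulary pairs are handled can shift the score, so it is worth reporting how many pairs were actually used.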
Example Similarities (cosine)
| Pair | Similarity |
|---|---|
| king / queen | 0.693 |
| man / woman | 0.798 |
| cat / dog | 0.854 |
| good / bad | 0.855 |
Usage
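import json

import numpy as np

# Embedding matrix, shape (1008133, 188), float32
embeddings = np.load("res2vec_owt1b_188d_embeddings.npy")

# Word-to-index mapping: {"word": index, ...}
with open("res2vec_owt1b_188d_vocab.json", encoding="utf-8") as f:
    vocab = json.load(f)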
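Once the matrix and vocabulary are loaded, word similarity is a row lookup followed by a cosine. The sketch below assumes plain cosine similarity over the raw vectors; the card does not say whether the learned metric tensor is applied for the example similarities. The helper names `cosine_similarity` and `nearest_neighbors` are hypothetical, and the toy arrays only stand in for the real files so the snippet runs on its own.

```python
import numpy as np

def cosine_similarity(embeddings, vocab, word_a, word_b):
    """Cosine similarity between two in-vocabulary words."""
    a = embeddings[vocab[word_a]].astype(np.float64)
    b = embeddings[vocab[word_b]].astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_neighbors(embeddings, vocab, word, k=5):
    """Return the k words most similar to `word` by cosine similarity."""
    index_to_word = {i: w for w, i in vocab.items()}
    # Normalise every row once so one matrix-vector product yields all cosines.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, 1e-12)
    scores = unit @ unit[vocab[word]]
    scores[vocab[word]] = -np.inf  # exclude the query word itself
    top = np.argsort(-scores)[:k]
    return [(index_to_word[int(i)], float(scores[i])) for i in top]

# Toy stand-ins for the real embedding matrix and vocabulary.
toy_vocab = {"king": 0, "queen": 1, "cat": 2}
toy_embeddings = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]], dtype=np.float32)

print(cosine_similarity(toy_embeddings, toy_vocab, "king", "queen"))
print(nearest_neighbors(toy_embeddings, toy_vocab, "king", k=2))
```

With the real data, call `cosine_similarity(embeddings, vocab, "king", "queen")`. Note that `nearest_neighbors` makes a normalised copy of the whole matrix, so it temporarily needs roughly twice the memory of the embeddings.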
Files
| File | Description |
|---|---|
| res2vec_owt1b_188d_embeddings.npy | Embedding matrix, shape (1008133, 188), float32 |
| res2vec_owt1b_188d_vocab.json | Word-to-index mapping, {"word": index, ...} |
| res2vec_owt1b_188d_metric.npy | Learned metric tensor, shape (188, 188), float64 |
| res2vec_owt1b_188d_results.json | Training metadata and diagnostics |
| res2vec_owt1b_188d_evaluation.json | Benchmark scores and word pair similarities |
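The card does not describe how the learned metric tensor is meant to be used. One possible reading, and it is only an assumption on my part, is to treat the (188, 188) matrix G as a bilinear form and compute a cosine-like similarity with a^T G b in place of the plain dot product. The sketch below shows that idea on toy data. The function name `metric_similarity` is hypothetical, and the square roots are only well defined if G is symmetric positive definite, which has not been verified for the shipped file.

```python
import numpy as np

def metric_similarity(a, b, metric):
    """Cosine-like similarity under the bilinear form <a, b>_G = a^T G b.

    Assumes `metric` is symmetric positive definite (not verified here).
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    inner = a @ metric @ b
    norm_a = np.sqrt(a @ metric @ a)
    norm_b = np.sqrt(b @ metric @ b)
    return float(inner / (norm_a * norm_b))

# Toy 2-d example; the real metric would come from
# np.load("res2vec_owt1b_188d_metric.npy").
a = np.array([1.0, 0.0])
b = np.array([0.8, 0.6])

identity_metric = np.eye(2)              # reduces to ordinary cosine similarity
stretched_metric = np.diag([4.0, 1.0])   # weights the first axis more heavily

sim_identity = metric_similarity(a, b, identity_metric)
sim_stretched = metric_similarity(a, b, stretched_metric)
print(sim_identity, sim_stretched)
```

With the identity matrix this reduces to ordinary cosine similarity, so comparing the two results on a few word pairs is a cheap way to see how much the learned metric changes the geometry.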
This is an initial proof-of-concept demonstration of the Resonance Theory technique. I will be adding more versions and refining as I go. Please let me know what you think! :-)
Links