qwen3_glove_512

GloVe embeddings trained on English Wikipedia, where each "word" is a token id produced by the Qwen/Qwen3-Embedding-8B tokenizer rather than a whitespace-delimited word.
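To make the "token id as word" idea concrete, here is a minimal sketch of how a piece of text could be turned into a line that the GloVe tools can read. The exact preprocessing used for this model is not documented in this card, so the helper name `to_glove_line` is hypothetical; the only assumption is that ids are written as space-separated decimal strings.

```python
from typing import Callable, List

def to_glove_line(text: str, encode: Callable[[str], List[int]]) -> str:
    """Render one document as space-separated token ids (hypothetical helper)."""
    return " ".join(str(i) for i in encode(text))

# With the real tokenizer (requires `transformers`), `encode` could be:
#   tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-Embedding-8B")
#   encode = lambda t: tok.encode(t, add_special_tokens=False)
```

Passing the encoder in as a callable keeps the sketch independent of any particular tokenizer download.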

Training

Corpus: jsanzolac/ga_wikipedia (English Wikipedia dump 2023-11-01)
Tokenizer: Qwen/Qwen3-Embedding-8B (no special tokens)
Implementation: stanfordnlp/GloVe
Vector size: 512
Min vocab count: 1
Window size: 15
Iterations: 15
x_max: 10
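
The `x_max` setting controls how GloVe weights each co-occurrence count in its loss: counts below `x_max` are down-weighted, and counts at or above it get full weight. The sketch below illustrates that weighting function. The exponent `alpha = 0.75` is an assumption (it is GloVe's default and is not stated in the settings above), and `glove_weight` is a hypothetical name.

```python
X_MAX = 10.0   # from the training settings above
ALPHA = 0.75   # assumption: GloVe's default exponent, not stated in this card

def glove_weight(count: float, x_max: float = X_MAX, alpha: float = ALPHA) -> float:
    """Weight applied to a co-occurrence count in the GloVe loss."""
    if count < x_max:
        return (count / x_max) ** alpha
    return 1.0
```

With `x_max = 10`, any pair of token ids that co-occurs ten or more times contributes at full weight, so only rare pairs are down-weighted.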

Files

File	Purpose
`vectors.txt`	GloVe text format: `<qwen3_id> v1 v2 ... v512`
`vectors.bin`	Binary format (`-binary 2`)
`vocab.txt`	Qwen3 id and its corpus count
`token_id_to_string.json`	Mapping from Qwen3 id → decoded string
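
The two auxiliary files can be read with a few lines of standard-library Python. This is a sketch under two assumptions: `vocab.txt` holds one `<id> <count>` pair per line, and `token_id_to_string.json` is a flat JSON object keyed by id. The function names are hypothetical.

```python
import json
from typing import Dict, Iterable

def parse_vocab(lines: Iterable[str]) -> Dict[int, int]:
    """Map token id -> corpus count from `<id> <count>` lines."""
    counts = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        try:
            counts[int(parts[0])] = int(parts[1])
        except ValueError:
            continue  # skip entries whose id or count is not an integer
    return counts

def parse_id_to_string(raw_json: str) -> Dict[int, str]:
    """JSON object keys are always strings, so convert them back to ints."""
    return {int(k): v for k, v in json.loads(raw_json).items()}
```

For example, `parse_vocab(open(path, encoding="utf-8"))` works on a downloaded `vocab.txt`, since a file object iterates over its lines.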

Quick start

import numpy as np
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

vec_path = hf_hub_download("jsanzolac/qwen3_glove_512", "vectors.txt")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-Embedding-8B")

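# Map each Qwen3 token id to its 512-dimensional GloVe vector.
embeddings = {}
with open(vec_path, encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        if not parts[0].isdigit():
            continue  # skip non-id rows, e.g. an `<unk>` row that GloVe may append
        embeddings[int(parts[0])] = np.asarray(parts[1:], dtype=np.float32)

def embed(text):
    """Mean of the GloVe vectors of the text's tokens (zeros if none are known)."""
    ids = tok.encode(text, add_special_tokens=False)
    vecs = [embeddings[i] for i in ids if i in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(512, dtype=np.float32)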
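
Once vectors are loaded, a common next step is comparing them. The sketch below shows cosine similarity and a brute-force nearest-token lookup over a dictionary shaped like `embeddings` above (token id to NumPy vector). The names `cosine` and `nearest_tokens` are hypothetical, and the brute-force search is only an illustration, not a tuned index.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = float(np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.dot(a, b)) / denom if denom else 0.0

def nearest_tokens(query: np.ndarray, embeddings: dict, k: int = 5):
    """Return the k (token id, score) pairs most similar to `query`."""
    ids = list(embeddings)
    matrix = np.stack([embeddings[i] for i in ids])
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero vectors
    scores = matrix @ query / norms
    top = np.argsort(-scores)[:k]
    return [(ids[j], float(scores[j])) for j in top]
```

The returned ids can be turned back into readable strings with `tok.decode([token_id])` or with the mapping in `token_id_to_string.json`.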
