clip-text-msmarco-inversion
A vec2text inversion (hypothesizer) model that reconstructs text from the
embeddings produced by the CLIP ViT-L/14 text encoder (the text encoder used by
Stable Diffusion v1.5, identical to openai/clip-vit-large-patch14).
It is the first-stage model: given a CLIP text embedding, it produces an initial
text hypothesis. Pair it with the corrector model
Afrostnova/clip-text-msmarco-corrector
to iteratively refine that hypothesis.
- Base architecture: vec2text InversionModel (T5-based encoder-decoder)
- Embedder: CLIPTextModel (openai/clip-vit-large-patch14)
- Training data: MS MARCO
- Embedding transform: repeat (num_repeat_tokens=16)
⚠️ Requirements
This model cannot be loaded with the upstream vec2text package (pip install vec2text),
because upstream does not support a CLIPTextModel embedder. You need the fork that adds
CLIP support (the one this model was trained with). With the upstream package,
from_pretrained fails because the CLIPTextModel embedder is unknown.
Usage
import vec2text

inv = vec2text.models.InversionModel.from_pretrained("Afrostnova/clip-text-msmarco-inversion")
cor = vec2text.models.CorrectorEncoderModel.from_pretrained("Afrostnova/clip-text-msmarco-corrector")
corrector = vec2text.load_corrector(inv, cor)

# `embeddings` = CLIP text-encoder last_hidden_state, pooled at the EOS position.
text = vec2text.invert_embeddings(
    embeddings=embeddings,  # (batch, hidden_dim) on the same device as the model
    corrector=corrector,
    num_steps=20,
    sequence_beam_width=1,
)
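The snippet above expects an `embeddings` tensor that it does not define. A minimal sketch of one way to build it with Hugging Face transformers follows, assuming the fork expects the raw `CLIPTextModel` hidden state at the EOS token (no text projection), as the comment above suggests. The helper names `first_eos_index` and `embed_texts` are illustrative and not part of vec2text.

```python
CLIP_MODEL_ID = "openai/clip-vit-large-patch14"


def first_eos_index(token_ids, eos_token_id):
    """Return the position of the first EOS token in one tokenized sequence."""
    # The CLIP tokenizer also pads with the EOS token, so the first
    # occurrence is the real end of the text.
    return list(token_ids).index(eos_token_id)


def embed_texts(texts, device="cpu"):
    """Return a (batch, hidden_dim) tensor of EOS-pooled CLIP text embeddings."""
    # Imported lazily so first_eos_index can be used without torch installed.
    import torch
    from transformers import CLIPTextModel, CLIPTokenizer

    tokenizer = CLIPTokenizer.from_pretrained(CLIP_MODEL_ID)
    encoder = CLIPTextModel.from_pretrained(CLIP_MODEL_ID).to(device).eval()

    batch = tokenizer(
        texts, padding=True, truncation=True, max_length=77, return_tensors="pt"
    ).to(device)
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state  # (batch, seq_len, hidden_dim)

    # Pick the hidden state at each sequence's EOS position.
    eos_id = tokenizer.eos_token_id
    positions = [first_eos_index(ids.tolist(), eos_id) for ids in batch["input_ids"]]
    rows = torch.arange(hidden.shape[0], device=hidden.device)
    return hidden[rows, torch.tensor(positions, device=hidden.device)]
```

Calling `embeddings = embed_texts(["a photo of a cat"], device="cuda")` would then give a tensor to pass to `invert_embeddings`, provided the inversion model and corrector are on the same device. For this checkpoint the result should match `CLIPTextModel`'s `pooler_output`; the explicit indexing is shown only to make the EOS pooling visible.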
The CLIP text encoder is loaded automatically when the inversion model is instantiated.
On a clean machine it is fetched from openai/clip-vit-large-patch14; override with the
CLIP_TEXT_ENCODER environment variable if you want a local copy.
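As a rough sketch of the override described above, the variable can be set from Python before the inversion model is instantiated. The directory path is a placeholder, and `resolve_clip_encoder` is a hypothetical helper that only illustrates the presumed precedence (environment variable first, Hub ID as the fallback); it is not a function from the fork.

```python
import os

# Placeholder path: point this at wherever the encoder was saved locally.
LOCAL_CLIP_DIR = "/models/clip-vit-large-patch14"

# Set the override before the inversion model is instantiated.
os.environ["CLIP_TEXT_ENCODER"] = LOCAL_CLIP_DIR


def resolve_clip_encoder(default="openai/clip-vit-large-patch14"):
    """Illustrative only: return the override if set, else the Hub model ID."""
    return os.environ.get("CLIP_TEXT_ENCODER", default)
```

Exporting the variable in the shell before launching Python would have the same effect.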