
How can I extract embeddings from this model?

#16
by MehtabPathan - opened

I would like to extract embeddings from this model; what would be the process for doing that? I know that in encoder-decoder style models, I can simply look at the encoder output.

I also would like to know.

Mosaic ML, Inc. org • edited May 11, 2023

I would recommend building an MPTModel, which is the base model that produces embeddings, and you can see the class here in LLM Foundry.

You can either fork the repo or pip install the package and import it:

```shell
pip install llm-foundry
```

```python
from llmfoundry.models import MPTModel
```

If you run forward on MPTModel you'll get an output of size [batch_size, seq_len, d_model]. You'll probably want to average the embeddings along the seq_len dimension (or use some other pooling scheme) to get a single embedding for the sentence.
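To make the pooling step concrete, here is a minimal sketch of averaging along seq_len while ignoring padding tokens. The `mean_pool` helper is hypothetical (not part of llm-foundry), and the hidden states are simulated with a random tensor of the shape described above; in practice they would come from MPTModel's forward pass:

```python
import torch

def mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings along seq_len, skipping padding positions.

    hidden_states: [batch_size, seq_len, d_model]
    attention_mask: [batch_size, seq_len], 1 for real tokens, 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)  # [B, S, 1]
    summed = (hidden_states * mask).sum(dim=1)                   # [B, D]
    counts = mask.sum(dim=1).clamp(min=1)                        # [B, 1], avoid div by zero
    return summed / counts

# Stand-in for MPTModel's output of shape [batch_size, seq_len, d_model].
batch_size, seq_len, d_model = 2, 8, 16
hidden_states = torch.randn(batch_size, seq_len, d_model)
attention_mask = torch.tensor([[1] * 8, [1] * 5 + [0] * 3])

sentence_embeddings = mean_pool(hidden_states, attention_mask)
print(sentence_embeddings.shape)  # torch.Size([2, 16])
```

Masking before averaging matters when sequences in a batch are padded to the same length; otherwise the padding positions would dilute the sentence embedding.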

Hope this helps!

sam-mosaic changed discussion status to closed

@abhi-mosaic Thanks for that insight. Is basic embedding functionality something you expect will be added to the LLM Foundry package down the road?
