
How can I extract embeddings from this model?

#16
by MehtabPathan - opened

I would like to extract embeddings from this model; what would be the process for doing that? I know that in encoder-decoder style models, I can simply look at the encoder output.

I also would like to know.

Mosaic ML, Inc. org • edited May 11, 2023

I would recommend building an MPTModel, which is the base model that produces embeddings, and you can see the class here in LLM Foundry.

You can either fork the repo or pip install the package and import it:

```shell
pip install llm-foundry
```

```python
from llmfoundry.models import MPTModel
```

If you run forward on MPTModel you'll get an output of size [batch_size, seq_len, d_model]. You'll probably want to average the embeddings along the seq_len dimension (or use some other pooling scheme) to get a single embedding for the sentence.
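To make the pooling step concrete, here is a minimal sketch of averaging along seq_len while ignoring padding tokens. The `mean_pool` helper is hypothetical (not part of llm-foundry), and the hidden states are simulated with a random tensor of the shape described above; in practice they would come from MPTModel's forward pass:

```python
import torch

def mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings along seq_len, skipping padding positions.

    hidden_states: [batch_size, seq_len, d_model]
    attention_mask: [batch_size, seq_len], 1 for real tokens, 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)  # [B, S, 1]
    summed = (hidden_states * mask).sum(dim=1)                   # [B, D]
    counts = mask.sum(dim=1).clamp(min=1)                        # [B, 1], avoid div by zero
    return summed / counts

# Stand-in for MPTModel's output of shape [batch_size, seq_len, d_model].
batch_size, seq_len, d_model = 2, 8, 16
hidden_states = torch.randn(batch_size, seq_len, d_model)
attention_mask = torch.tensor([[1] * 8, [1] * 5 + [0] * 3])

sentence_embeddings = mean_pool(hidden_states, attention_mask)
print(sentence_embeddings.shape)  # torch.Size([2, 16])
```

Masking before averaging matters when sequences in a batch are padded to the same length; otherwise the padding positions would dilute the sentence embedding.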

Hope this helps!

sam-mosaic changed discussion status to closed

@abhi-mosaic Thanks for that insight. Is basic embedding functionality something you expect will be added to the LLM Foundry package down the road?
