How to deal with multi-token units in embedding?

by YameiW - opened

Hello there,

I am working on word embeddings and was wondering whether there is a way to obtain a single vector for a multi-token unit in Chinese. For instance, how can I get one vector for the Chinese word "公斤" (kilogram), rather than two separate vectors, one for each character?
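
To make the question concrete, here is a minimal sketch of one common workaround, mean pooling: average the per-character vectors that fall inside the word's span. It assumes the model returns one vector per character. The toy 4-dimensional vectors and the helper name `pool_span` are made up for illustration and do not come from any library or from this model.

```python
from typing import List, Sequence


def pool_span(token_vectors: Sequence[Sequence[float]], start: int, end: int) -> List[float]:
    """Average the vectors of the tokens in [start, end) into a single vector."""
    if not 0 <= start < end <= len(token_vectors):
        raise ValueError("span must be non-empty and lie inside the sequence")
    span = token_vectors[start:end]
    dim = len(span[0])
    # Element-wise mean over all token vectors in the span.
    return [sum(vec[i] for vec in span) / len(span) for i in range(dim)]


# Hypothetical per-character vectors standing in for real model outputs.
char_vectors = {
    "公": [1.0, 0.0, 2.0, 4.0],
    "斤": [3.0, 2.0, 0.0, 0.0],
}

word = "公斤"
vectors = [char_vectors[ch] for ch in word]
word_vector = pool_span(vectors, 0, len(vectors))
print(word_vector)  # [2.0, 1.0, 1.0, 2.0]
```

With a real model, the same pooling would be applied to the hidden states of the tokens covering the word. Whether the mean, the sum, or the first token's vector works best likely depends on the model and the task.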

