Chinese Word Embedding


Tencent AI Lab Embedding Corpora

Embedding Datasets Download

This page provides downloads of the Tencent AI Lab Chinese and English Term Embedding Corpora.

Latest Version for Chinese

The latest version is v0.2.0, released on Dec 24, 2021.

Version | Dimension | Vocab. Size | Download URL | Description
v0.2.0 | 200 | Small (2,000,000) | tencent-ailab-embedding-zh-d200-v0.2.0-s.tar.gz | Original size: 3.6 GB; tar.gz size: 1.5 GB
v0.2.0 | 200 | Large (12,287,936) | tencent-ailab-embedding-zh-d200-v0.2.0.tar.gz | Original size: 22 GB; tar.gz size: 9.0 GB
v0.2.0 | 100 | Small (2,000,000) | tencent-ailab-embedding-zh-d100-v0.2.0-s.tar.gz | Original size: 1.8 GB; tar.gz size: 763 MB
v0.2.0 | 100 | Large (12,287,936) | tencent-ailab-embedding-zh-d100-v0.2.0.tar.gz | Original size: 12 GB; tar.gz size: 4.7 GB

Information about version v0.2.0:

  • Release time: Dec 24, 2021
  • Data (sentences and vocabulary) acquisition time: Mar 2021

Main updates of this version:

  • New vocabulary
  • New sentences for training the embedding
  • Slight improvement of the training algorithm
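
This page does not describe the layout of the extracted files. The sketch below assumes the common word2vec text layout: a header line with the vocabulary size and dimension, followed by one term and its vector per line, separated by spaces. The function name `iter_embeddings` and the file path are hypothetical; check the extracted file before relying on this.

```python
from typing import Iterator, List, Tuple


def iter_embeddings(path: str) -> Iterator[Tuple[str, List[float]]]:
    """Yield (term, vector) pairs from a word2vec-style text file.

    Assumption: the first line is "<vocab_size> <dimension>" and every
    following line is "<term> <v1> ... <vN>" separated by single spaces.
    """
    with open(path, encoding="utf-8") as f:
        # Assumed header line: vocabulary size and vector dimension.
        _vocab_size, dim = (int(x) for x in f.readline().split())
        for line in f:
            # Split from the right so a term containing spaces stays intact.
            parts = line.rstrip().rsplit(" ", dim)
            if len(parts) != dim + 1:
                continue  # skip blank or malformed lines
            yield parts[0], [float(x) for x in parts[1:]]
```

Streaming the file line by line avoids holding the whole vocabulary in memory, which matters for the large files (up to 22 GB uncompressed according to the table above).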

Latest Version for English

The latest version is v0.1.0, released on Sep 15, 2022. Instructions for decoding URL-encoded phrases into their original forms are given in Q4 of the FAQ.

Version | Dimension | Vocab. Size | Download URL | Description
v0.1.0 | 200 | Small (2,000,000) | tencent-ailab-embedding-en-d200-v0.1.0-s.tar.gz | Original size: 3.6 GB; tar.gz size: 1.5 GB
v0.1.0 | 200 | Large (6,596,681) | tencent-ailab-embedding-en-d200-v0.1.0.tar.gz | Original size: 12 GB; tar.gz size: 4.8 GB
v0.1.0 | 100 | Small (2,000,000) | tencent-ailab-embedding-en-d100-v0.1.0-s.tar.gz | Original size: 1.8 GB; tar.gz size: 763 MB
v0.1.0 | 100 | Large (6,596,681) | tencent-ailab-embedding-en-d100-v0.1.0.tar.gz | Original size: 6 GB; tar.gz size: 2.5 GB

Information about version v0.1.0:

  • Release time: Sep 15, 2022
  • Data (sentences and vocabulary) acquisition time: Mar 2021
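
The English corpus stores some phrases with URL encoding. As a rough illustration, and assuming this means standard percent-encoding (Q4 of the FAQ is the authoritative description), Python's standard library can restore the original form. The helper name `decode_term` is hypothetical.

```python
from urllib.parse import unquote


def decode_term(term: str) -> str:
    """Decode a percent-encoded term back to its original text.

    Assumption: terms use standard percent-encoding with UTF-8 bytes.
    """
    return unquote(term)


print(decode_term("New%20York"))  # New York
```

Terms without any percent escapes pass through unchanged, so the helper can be applied to every vocabulary entry.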

Historical Versions Download
