Chinese Word Embedding
- tencent-ailab-embedding-zh-d200-v0.2.0.tar.gz, from https://ai.tencent.com/ailab/nlp/en/download.html
- tencent-ailab-embedding-zh-d100-v0.2.0-s.zip
- light_Tencent_AILab_ChineseEmbedding.bin, from GitHub (shibing624/text2vec)
- More: https://github.com/Embedding/Chinese-Word-Vectors
Tencent AI Lab Embedding Corpora
Embedding Datasets Download
This page provides downloads of the Tencent AI Lab Chinese and English Term Embedding Corpora.
Latest Version for Chinese
The latest version is v0.2.0, released on Dec 24, 2021.
| Version | Dimension | Vocab. Size | Original size | tar.gz size |
|---|---|---|---|---|
| v0.2.0 | 200 | Small (2,000,000) | 3.6 GB | 1.5 GB |
| v0.2.0 | 200 | Large (12,287,936) | 22 GB | 9.0 GB |
| v0.2.0 | 100 | Small (2,000,000) | 1.8 GB | 763 MB |
| v0.2.0 | 100 | Large (12,287,936) | 12 GB | 4.7 GB |
Information of version v0.2.0:
- Release time: Dec 24, 2021
- Data (sentences and vocabulary) acquisition time: March 2021
Main updates of this version:
- New vocabulary
- New sentences for training the embedding
- Slight improvement of the training algorithm
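Once an archive is unpacked, the vectors can be read with a few lines of Python. The sketch below assumes the unpacked file uses the common word2vec text layout (a header line with the entry count and dimension, then one term per line followed by its space-separated values); this page does not state the layout, so check the file before relying on it. The function name `read_text_embeddings` and the sample data are hypothetical.

```python
import io
import math
from typing import Dict, Iterable, List, Optional, Tuple


def read_text_embeddings(
    lines: Iterable[str], limit: Optional[int] = None
) -> Tuple[int, Dict[str, List[float]]]:
    """Parse word2vec-style text: 'count dim' header, then 'term v1 ... vdim'."""
    it = iter(lines)
    header = next(it).split()
    dim = int(header[1])  # header[0] is the declared entry count
    vectors: Dict[str, List[float]] = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if len(parts) < dim + 1:
            continue  # skip blank or malformed lines
        # The last `dim` fields are the vector; anything before is the term,
        # which keeps terms containing spaces intact.
        term = " ".join(parts[:-dim])
        vectors[term] = [float(x) for x in parts[-dim:]]
        if limit is not None and len(vectors) >= limit:
            break  # stop early to avoid loading millions of rows
    return dim, vectors


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Tiny in-memory sample standing in for the real file (assumed layout).
sample = io.StringIO("2 3\n你好 0.1 0.2 0.3\n世界 0.4 0.5 0.6\n")
dim, vecs = read_text_embeddings(sample)
similarity = cosine(vecs["你好"], vecs["世界"])
```

For a real file, pass an open handle such as `open(path, encoding="utf-8")` (with `path` pointing at the unpacked text file) and set `limit` to keep memory use manageable on the large vocabularies.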
Latest Version for English
The latest version is v0.1.0, released on Sep 15, 2022. Instructions for decoding URL-encoded phrases back into their original forms are in Q4 of the FAQ.
| Version | Dimension | Vocab. Size | Original size | tar.gz size |
|---|---|---|---|---|
| v0.1.0 | 200 | Small (2,000,000) | 3.6 GB | 1.5 GB |
| v0.1.0 | 200 | Large (6,596,681) | 12 GB | 4.8 GB |
| v0.1.0 | 100 | Small (2,000,000) | 1.8 GB | 763 MB |
| v0.1.0 | 100 | Large (6,596,681) | 6 GB | 2.5 GB |
Information of version v0.1.0:
- Release time: Sep 15, 2022
- Data (sentences and vocabulary) acquisition time: March 2021
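The English corpus stores some phrases in URL-encoded form. The FAQ entry is not reproduced here, so the following is only a minimal sketch that assumes standard percent-encoding with UTF-8, decoded with Python's `urllib.parse.unquote`. The helper name `decode_term` and the sample terms are hypothetical.

```python
from urllib.parse import unquote


def decode_term(term: str) -> str:
    """Turn a percent-encoded term back into plain text (UTF-8 assumed)."""
    return unquote(term)


# Hypothetical sample terms; plain terms pass through unchanged.
samples = ["new%20york", "caf%C3%A9", "plain"]
decoded = [decode_term(t) for t in samples]
```

Decoding at lookup time rather than rewriting the file keeps the on-disk vocabulary identical to the released version.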
Previous versions download
v0.1.0 (Chinese)