How to load the model

by baby666 - opened Jul 17, 2023

Jul 17, 2023

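Downloading the model while the code is running often fails, so I downloaded the model to my local machine first. However, I cannot load it from there.
Here is the code I tried: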
model = KeyedVectors.load_word2vec_format("zhwiki_20180420_100d.txt")
model = KeyedVectors.load_word2vec_format("zhwiki_20180420_100d.txt", binary=True)
EOFError: unexpected end of input; is count incorrect or file otherwise damaged?

model = Word2Vec.load("zhwiki_20180420_100d.txt")

I also read through the source code of KeyedVectors.load_word2vec_format carefully and found nothing that explains how to load a local model.

baby666

Jul 17, 2023

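Addendum: hf_hub_download also just returns the path of the cached file as a string. So why does loading fail when I pass in the path of my local copy of the model?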

lbourdois

Word2vec org Jul 18, 2023

Hi,
It's expected that the line model = KeyedVectors.load_word2vec_format("zhwiki_20180420_100d.txt", binary=True) doesn't work. The binary=True argument can only be used for .bin files, whereas this is a .txt file.
Gensim is not currently supported by HF, which is why I'm downloading the file from the Hub rather than loading it from a local copy.
I should have a discussion with the HF teams about this later in the week (cc @osanseviero for visibility).
So I hope to have some news to share with you on this point in the coming days.
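To illustrate the point about binary=True, here is a minimal sketch that guesses whether a word2vec-format file is text or binary before handing it to Gensim. The helper names looks_like_text_word2vec and load_vectors are hypothetical (they are not part of Gensim), and the check is only a heuristic that assumes the standard "<vocab_size> <dim>" header line.

```python
def looks_like_text_word2vec(path, probe_bytes=4096):
    """Heuristically guess whether a word2vec-format file is text (True) or binary (False).

    Hypothetical helper, not part of Gensim.
    """
    with open(path, "rb") as f:
        header = f.readline()                # "<vocab_size> <dim>\n" in both formats
        first_row = f.readline(probe_bytes)  # first vector row, or raw float bytes
    dim = int(header.split()[1])
    try:
        fields = first_row.decode("utf-8").split()
        # In the text format, the last `dim` fields are decimal numbers.
        [float(x) for x in fields[-dim:]]
        return len(fields) >= dim + 1
    except (UnicodeDecodeError, ValueError):
        # Raw float32 bytes rarely decode as UTF-8 and never parse as decimals.
        return False


def load_vectors(path):
    """Load vectors with the `binary` flag matched to the file's actual format."""
    from gensim.models import KeyedVectors  # imported lazily; requires gensim

    return KeyedVectors.load_word2vec_format(
        path, binary=not looks_like_text_word2vec(path)
    )
```

With a helper like this, load_vectors("zhwiki_20180420_100d.txt") would end up calling load_word2vec_format with binary=False for the .txt file.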

baby666

Jul 19, 2023

Thank you very much for the answer. I hope you and your team can make local model loading work soon.

lbourdois

Word2vec org Jul 21, 2023

edited Dec 7, 2023

Some news.

The code provided in the model card works fine.
hf_hub_download(repo_id="Word2vec/wikipedia2vec_zhwiki_20180420_100d", filename="zhwiki_20180420_100d.txt") downloads the model from the Hub and then caches it (see .cache\huggingface\hub).
When the code is relaunched, the file is no longer downloaded from the Hub, as hf_hub_download fetches the previously downloaded file locally.
It should be noted, however, that this download takes a long time to complete (3 min on my side). I observe this long loading time for .txt files (.bin files load almost instantaneously).
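Since hf_hub_download only returns a file path, a manually downloaded copy can in principle be passed to Gensim the same way. The sketch below prefers a local file and falls back to the Hub cache only if that file is missing. The names resolve_model_path and load_model are hypothetical, and the default local path is an assumption about where the file was saved.

```python
import os


def resolve_model_path(local_path, repo_id, filename):
    """Return local_path if that file exists, otherwise fetch it via the Hub cache.

    Hypothetical helper for illustration.
    """
    local_path = os.path.abspath(os.path.expanduser(local_path))
    if os.path.isfile(local_path):
        return local_path
    from huggingface_hub import hf_hub_download  # imported lazily; needs network on first call

    return hf_hub_download(repo_id=repo_id, filename=filename)


def load_model(local_path="zhwiki_20180420_100d.txt"):
    """Load the text-format vectors from a local copy or from the Hub cache."""
    from gensim.models import KeyedVectors  # imported lazily; requires gensim

    path = resolve_model_path(
        local_path,
        repo_id="Word2vec/wikipedia2vec_zhwiki_20180420_100d",
        filename="zhwiki_20180420_100d.txt",
    )
    # The file is in text format, so `binary` keeps its default of False.
    return KeyedVectors.load_word2vec_format(path)
```

Resolving to an absolute path first helps rule out the case where a relative path is interpreted against a different working directory than expected.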

In your case, it may be better to work with .bin files (available on https://wikipedia2vec.github.io/wikipedia2vec/pretrained/) and load them with Gensim's load_word2vec_format() function (not the Word2Vec.load that you used in your first message), pointing it to the right path. In the worst case, you can use the https://github.com/wikipedia2vec/wikipedia2vec lib.
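As one possible way to get the faster .bin loading with the file already downloaded, the sketch below loads the .txt file once and re-saves it in word2vec binary format for later runs. This conversion step is an assumption on top of the advice above, not something the model card describes, and binary_cache_path and load_with_binary_cache are hypothetical names.

```python
from pathlib import Path


def binary_cache_path(txt_path):
    """Path of the binary copy stored next to the text file (hypothetical convention)."""
    return Path(txt_path).with_suffix(".bin")


def load_with_binary_cache(txt_path):
    """Load vectors, converting the text file to binary on the first call."""
    from gensim.models import KeyedVectors  # imported lazily; requires gensim

    bin_path = binary_cache_path(txt_path)
    if bin_path.exists():
        # Later runs: read the binary copy.
        return KeyedVectors.load_word2vec_format(str(bin_path), binary=True)
    # First run: slow text parse, then write a binary copy for next time.
    vectors = KeyedVectors.load_word2vec_format(str(txt_path), binary=False)
    vectors.save_word2vec_format(str(bin_path), binary=True)
    return vectors
```

The first call still pays the full text-parsing cost; only subsequent calls read the binary copy.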
