How to load the model
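Downloading the model while the code is running tends to cause problems, so I downloaded the model to a local path first. However, I cannot load it from there.
Here is the code I tried: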
model = KeyedVectors.load_word2vec_format("zhwiki_20180420_100d.txt")
model = KeyedVectors.load_word2vec_format("zhwiki_20180420_100d.txt", binary=True)
EOFError: unexpected end of input; is count incorrect or file otherwise damaged?
model = Word2Vec.load("zhwiki_20180420_100d.txt")
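I also read through the source code of KeyedVectors.load_word2vec_format carefully and found nothing explaining how to load a local model.
Addendum: hf_hub_download also just returns the path of the cached file as a string. So why does loading fail when I pass in the path of my local model?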
Hi,
It's normal that the line model = KeyedVectors.load_word2vec_format("zhwiki_20180420_100d.txt", binary=True) doesn't work. The binary=True argument can only be used for .bin files, whereas here it is a .txt file.
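To make the point about binary=True concrete, here is a minimal sketch that picks the flag from the file extension before calling Gensim. The helper names binary_flag_for and load_vectors are made up for illustration, and the extension rule is only an assumption that matches the .txt/.bin naming discussed here; it does not inspect the file contents.

```python
from pathlib import Path


def binary_flag_for(path):
    """Guess the `binary` argument from the file extension (assumption:
    .bin means word2vec binary format, .txt means word2vec text format)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".bin":
        return True
    if suffix == ".txt":
        return False
    raise ValueError(f"cannot guess word2vec format from extension: {suffix!r}")


def load_vectors(path):
    """Load word vectors from a local path with the matching `binary` flag."""
    # Imported lazily so binary_flag_for can be used without Gensim installed.
    from gensim.models import KeyedVectors

    # Word2Vec.load is not used here: it expects a model written by Gensim's
    # own save(), not a file in word2vec text or binary format.
    return KeyedVectors.load_word2vec_format(path, binary=binary_flag_for(path))
```

With this, load_vectors("zhwiki_20180420_100d.txt") would call load_word2vec_format with binary=False, which is also Gensim's default.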
Gensim is not currently supported by HF, which is why I'm downloading the file from the Hub rather than loading it from a local path.
I should have a discussion with the HF teams about this later in the week (cc @osanseviero for visibility).
So I hope to have some news to share with you on this point in the coming days.
Thank you very much for the answer. I hope you and your team can make local model loading possible soon.
Some news.
The code provided in the model card works fine. hf_hub_download(repo_id="Word2vec/wikipedia2vec_zhwiki_20180420_100d", filename="zhwiki_20180420_100d.txt") downloads the model from the Hub and then caches it (see .cache\huggingface\hub).
When the code is relaunched, the file is no longer downloaded from the Hub, as hf_hub_download fetches the previously downloaded file locally.
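Since hf_hub_download only returns a path string, a file that is already on disk can stand in for it. A rough sketch of that idea follows; resolve_vectors_path is a hypothetical helper, not something from the model card, and the repo_id and filename are the ones quoted above.

```python
import os

REPO_ID = "Word2vec/wikipedia2vec_zhwiki_20180420_100d"
FILENAME = "zhwiki_20180420_100d.txt"


def resolve_vectors_path(local_path=None):
    """Return a path to the vectors file: the local copy if it exists,
    otherwise the cached path returned by hf_hub_download."""
    if local_path and os.path.isfile(local_path):
        # A manually downloaded file is used as-is; the Hub is not contacted.
        return os.path.abspath(local_path)

    # Imported lazily so the local branch works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    # Downloads on the first call, then reuses the cached file on later calls.
    return hf_hub_download(repo_id=REPO_ID, filename=FILENAME)
```

Either way the result is an ordinary file path, so it can be handed to KeyedVectors.load_word2vec_format in the same way as the path from the model card code.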
It should be noted, however, that this download takes a long time to complete (3 min on my side). I observe this long loading time for .txt files (.bin files load almost instantaneously).
In your case, it may be better to work with the .bin files (available at https://wikipedia2vec.github.io/wikipedia2vec/pretrained/) and load them with Gensim's load_word2vec_format() function (not the Word2Vec.load from your first message), pointing it at the right path. In the worst case, you can use the https://github.com/wikipedia2vec/wikipedia2vec library.