Contributed by voidful

How to use this model directly from the 🤗/transformers library:

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("voidful/albert_chinese_small")
model = AutoModel.from_pretrained("voidful/albert_chinese_small")


This is the albert_chinese_small model: the albert_small_google_zh checkpoint from the brightmart/albert_zh project, converted with Hugging Face's conversion script.

Attention

The albert_chinese_small model does not use sentencepiece, so AlbertTokenizer cannot load its vocabulary. Load the tokenizer with BertTokenizer instead. A MaskedLM prediction is a quick way to check that this approach works.

Verification

colab trial

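import torch
from torch.nn.functional import softmax
from transformers import AlbertForMaskedLM, BertTokenizer

pretrained = 'voidful/albert_chinese_small'
# sentencepiece is not used, so load the vocabulary with BertTokenizer, not AlbertTokenizer.
tokenizer = BertTokenizer.from_pretrained(pretrained)
model = AlbertForMaskedLM.from_pretrained(pretrained)
model.eval()

inputtext = "今天[MASK]情很好"

# Encode once and locate [MASK] by its id instead of the hard-coded value 103.
token_ids = tokenizer.encode(inputtext, add_special_tokens=True)
maskpos = token_ids.index(tokenizer.mask_token_id)

input_ids = torch.tensor(token_ids).unsqueeze(0)  # Batch size 1
with torch.no_grad():
    outputs = model(input_ids)
prediction_scores = outputs[0]  # shape: (batch, sequence length, vocabulary size)

# Turn the scores at the masked position into probabilities and pick the best token.
logit_prob = softmax(prediction_scores[0, maskpos], dim=-1).tolist()
predicted_index = torch.argmax(prediction_scores[0, maskpos]).item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
print(predicted_token, logit_prob[predicted_index])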

Result: 感 0.6390823125839233
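
The final step of the check (scores to probabilities to a predicted token) can be illustrated without downloading the model. The following is a minimal sketch in plain Python, assuming a toy four-token vocabulary and made-up scores. The names `top_k`, `toy_vocab`, and `toy_scores` are hypothetical and are not part of the model or of the transformers library.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities (numerically stable form)."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores, id_to_token, k=3):
    """Return the k most probable (token, probability) pairs, best first."""
    probs = softmax(scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id_to_token[i], probs[i]) for i in ranked[:k]]


# Toy data, NOT real model output: four candidate tokens with invented scores.
toy_vocab = ["心", "感", "天", "好"]
toy_scores = [1.0, 2.0, 0.5, -1.0]

for token, prob in top_k(toy_scores, toy_vocab, k=2):
    print(token, round(prob, 3))
```

Listing a few top candidates rather than only the argmax makes it easier to judge whether the tokenizer and model vocabulary line up. With real tensors, `torch.topk` applied to the softmax output would play the same role.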