fill-mask mask_token: [MASK]

Contributed by

neuralspace-reverie NeuralSpace-Reverie

Indic-Transformers Hindi BERT

Model description

This is a BERT language model pre-trained on a monolingual Hindi corpus of about 3 GB. Most of the pre-training data was taken from OSCAR. The model can be fine-tuned on downstream tasks such as text classification, POS tagging, and question answering. Embeddings from the model can also be used for feature-based training.
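As a rough illustration of the feature-based route mentioned above, the sketch below averages the token vectors of each sentence into one fixed-size vector that a separate classifier could consume. The helper names `mean_pool` and `embed` are hypothetical and not part of the model card, and mean pooling is only one common choice; the card does not say which pooling the authors prefer.

```python
import torch


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors per sentence, ignoring padded positions."""
    # (batch, seq_len) -> (batch, seq_len, 1) so the mask broadcasts over the hidden dimension
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    # Clamp guards against division by zero for a row that is all padding
    counts = mask.sum(dim=1).clamp(min=1.0)
    return summed / counts


def embed(sentences, model_name="neuralspace-reverie/indic-transformers-hi-bert"):
    """Return one vector per sentence (hypothetical helper; downloads the checkpoint)."""
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch)[0]  # last hidden state: (batch, seq_len, hidden)
    return mean_pool(hidden, batch["attention_mask"])

# Example call (needs network access to fetch the model):
# vectors = embed(["आपका स्वागत हैं"])
```

The resulting vectors can be passed as fixed features to any downstream model, for example a logistic-regression classifier, without updating the BERT weights.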

Intended uses & limitations

How to use

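from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and the pre-trained encoder from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-hi-bert')
model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-hi-bert')

text = "आपका स्वागत हैं"
input_ids = tokenizer(text, return_tensors='pt')['input_ids']
# The first element of the model output is the last hidden state
out = model(input_ids)[0]
print(out.shape)
# out.shape = [1, 5, 768] (batch size, sequence length, hidden size)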
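The model is tagged for fill-mask with the mask token [MASK], so masked-word prediction is another way to try it. The following is a minimal sketch, assuming the published checkpoint includes the masked-language-modelling head; `mask_word` and `predict_masked` are hypothetical helper names introduced only for this illustration.

```python
def mask_word(sentence: str, index: int, mask_token: str = "[MASK]") -> str:
    """Replace the whitespace-separated word at `index` with the mask token."""
    words = sentence.split()
    words[index] = mask_token
    return " ".join(words)


def predict_masked(sentence: str, index: int, top_k: int = 5):
    """Return (token, score) candidates for one masked word (downloads the checkpoint)."""
    from transformers import pipeline

    fill = pipeline("fill-mask", model="neuralspace-reverie/indic-transformers-hi-bert")
    # Use the tokenizer's own mask token rather than hard-coding it
    masked = mask_word(sentence, index, fill.tokenizer.mask_token)
    return [(c["token_str"], c["score"]) for c in fill(masked, top_k=top_k)]

# Example call (needs network access to fetch the model):
# predict_masked("आपका स्वागत हैं", 1)
```

Splitting on whitespace is a simplification that masks whole words; the tokenizer may split the same word into several sub-word pieces, so the predictions are single-token fills.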

Limitations and bias

The original language model was trained with PyTorch, so using the pytorch_model.bin weights file is recommended. The h5 file for TensorFlow was generated manually using the commands suggested here.