julien-c (HF staff) committed
Commit 2b9e6ed
1 Parent(s): d925125

Migrate model card from transformers-repo

Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/neuralspace-reverie/indic-transformers-bn-roberta/README.md

Files changed (1)
  1. README.md +29 -0
README.md ADDED
@@ -0,0 +1,29 @@
+ ---
+ language:
+ - bn
+ tags:
+ - MaskedLM
+ - Bengali
+ - RoBERTa
+ - Question-Answering
+ - Token Classification
+ - Text Classification
+ ---
+ # Indic-Transformers Bengali RoBERTa
+ ## Model description
+ This is a RoBERTa language model pre-trained on a ~6 GB monolingual Bengali training corpus. Most of the pre-training data was taken from [OSCAR](https://oscar-corpus.com/).
+ This model can be fine-tuned on various downstream tasks such as text classification, POS tagging, and question answering. Embeddings from this model can also be used for feature-based training.
+ ## Intended uses & limitations
+ #### How to use
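+ ```
+ from transformers import AutoTokenizer, AutoModel
+ # Load the tokenizer and the pre-trained encoder from the Hugging Face Hub
+ tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-bn-roberta')
+ model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-bn-roberta')
+ text = "আপনি কেমন আছেন?"
+ # Tokenize the text into a batch containing one sequence of token ids
+ input_ids = tokenizer(text, return_tensors='pt')['input_ids']
+ # The first model output is the last hidden state: (batch, tokens, hidden size)
+ out = model(input_ids)[0]
+ print(out.shape)
+ # Shape reported in the original card: [1, 10, 768]
+ ```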
+ #### Limitations and bias
+ The original language model was trained using `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The h5 file for `TensorFlow` was generated manually using the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
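
The model description in the card notes that embeddings from this model can be used for feature-based training, without saying how to turn token embeddings into features. The following is a minimal sketch of one common option, assuming mask-aware mean pooling over the last hidden state to get one fixed-size vector per sentence. The card does not prescribe a pooling strategy, and the helper names `mean_pool` and `embed_sentences` are hypothetical, introduced only for illustration.

```python
import torch

MODEL_NAME = "neuralspace-reverie/indic-transformers-bn-roberta"


def mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings per sequence, ignoring padded positions.

    hidden_states: (batch, tokens, hidden) float tensor.
    attention_mask: (batch, tokens) tensor, 1 for real tokens and 0 for padding.
    """
    mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)  # (batch, tokens, 1)
    summed = (hidden_states * mask).sum(dim=1)                   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1.0)                      # avoid division by zero
    return summed / counts


def embed_sentences(sentences):
    """Hypothetical helper: one pooled vector per sentence (downloads the model)."""
    # Imported lazily so that mean_pool can be used without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME)
    model.eval()
    # Pad to the longest sentence so the batch forms a rectangular tensor.
    batch = tokenizer(sentences, padding=True, return_tensors="pt")
    with torch.no_grad():
        last_hidden = model(**batch)[0]  # (batch, tokens, hidden)
    return mean_pool(last_hidden, batch["attention_mask"])
```

Calling `embed_sentences(["আপনি কেমন আছেন?"])` would return a tensor with one row per sentence and, going by the output shape reported in the card, 768 columns. Those vectors could then be passed as fixed features to a lightweight classifier of your choice.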