mlkorra committed
Commit 46fc03d
1 Parent(s): 7bfa78c

Update apps/about.py

Files changed (1):
  1. apps/about.py +33 -0
apps/about.py CHANGED
@@ -11,7 +11,40 @@ def app():
     st.markdown(
         """It is a monolingual transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data), with an automatic process to generate inputs and labels from those texts."""
     )
+
+    st.markdown(
+        """### How to use
+
+        You can use this model directly with a pipeline for masked language modeling:
+        ```python
+        >>> from transformers import pipeline
+        >>> unmasker = pipeline('fill-mask', model='flax-community/roberta-hindi')
+        >>> unmasker("मुझे उनसे बात करना <mask> अच्छा लगा")
+
+        [{'score': 0.2096337080001831,
+          'sequence': 'मुझे उनसे बात करना एकदम अच्छा लगा',
+          'token': 1462,
+          'token_str': ' एकदम'},
+         {'score': 0.17915162444114685,
+          'sequence': 'मुझे उनसे बात करना तब अच्छा लगा',
+          'token': 594,
+          'token_str': ' तब'},
+         {'score': 0.15887945890426636,
+          'sequence': 'मुझे उनसे बात करना और अच्छा लगा',
+          'token': 324,
+          'token_str': ' और'},
+         {'score': 0.12024253606796265,
+          'sequence': 'मुझे उनसे बात करना लगभग अच्छा लगा',
+          'token': 743,
+          'token_str': ' लगभग'},
+         {'score': 0.07114479690790176,
+          'sequence': 'मुझे उनसे बात करना कब अच्छा लगा',
+          'token': 672,
+          'token_str': ' कब'}]
+        ```
+        """
+    )
     st.markdown("""## Datasets used""")
     st.markdown(
         """RoBERTa-Hindi has been pretrained on a huge corpus consisting of multiple datasets. The entire list of datasets used is mentioned below: """
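The fill-mask pipeline shown in the diff returns a list of candidate dicts, each with `score`, `sequence`, `token`, and `token_str` keys. As a minimal sketch of consuming that output (the scores below are copied from the example output in the diff, so no model download is needed), picking the top-scoring completion looks like:

```python
# Output structure of a fill-mask pipeline call, as shown in the diff
# above (abbreviated to the top two candidates).
predictions = [
    {'score': 0.2096337080001831,
     'sequence': 'मुझे उनसे बात करना एकदम अच्छा लगा',
     'token': 1462,
     'token_str': ' एकदम'},
    {'score': 0.17915162444114685,
     'sequence': 'मुझे उनसे बात करना तब अच्छा लगा',
     'token': 594,
     'token_str': ' तब'},
]

# Select the highest-scoring candidate; token_str carries a leading
# space from the byte-level BPE tokenizer, so strip it for display.
best = max(predictions, key=lambda p: p['score'])
print(best['token_str'].strip())  # एकदम
```

The pipeline already returns candidates sorted by score, so `predictions[0]` would work too; `max` makes the intent explicit.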