mlkorra committed
Commit 46fc03d
1 Parent(s): 7bfa78c

Update apps/about.py

Files changed (1):
  1. apps/about.py +33 -0
apps/about.py CHANGED
@@ -11,7 +11,40 @@ def app():
     st.markdown(
         """It is a monolingual transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data), with an automatic process to generate inputs and labels from those texts."""
     )
+
+    st.markdown(
+        """### How to use
+
+        You can use this model directly with a pipeline for masked language modeling:
+        ```python
+        >>> from transformers import pipeline
+        >>> unmasker = pipeline('fill-mask', model='flax-community/roberta-hindi')
+        >>> unmasker("मुझे उनसे बात करना <mask> अच्छा लगा")
+
+        [{'score': 0.2096337080001831,
+          'sequence': 'मुझे उनसे बात करना एकदम अच्छा लगा',
+          'token': 1462,
+          'token_str': ' एकदम'},
+         {'score': 0.17915162444114685,
+          'sequence': 'मुझे उनसे बात करना तब अच्छा लगा',
+          'token': 594,
+          'token_str': ' तब'},
+         {'score': 0.15887945890426636,
+          'sequence': 'मुझे उनसे बात करना और अच्छा लगा',
+          'token': 324,
+          'token_str': ' और'},
+         {'score': 0.12024253606796265,
+          'sequence': 'मुझे उनसे बात करना लगभग अच्छा लगा',
+          'token': 743,
+          'token_str': ' लगभग'},
+         {'score': 0.07114479690790176,
+          'sequence': 'मुझे उनसे बात करना कब अच्छा लगा',
+          'token': 672,
+          'token_str': ' कब'}]
+        ```
+        """
+    )
     st.markdown("""## Datasets used""")
     st.markdown(
         """RoBERTa-Hindi has been pretrained on a huge corpus consisting of multiple datasets. The entire list of datasets used is mentioned below: """
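The fill-mask pipeline shown in the diff returns a list of candidate dicts, each with `score`, `sequence`, `token`, and `token_str` keys. As a minimal sketch of consuming that output (the scores below are copied from the example output in the diff, so no model download is needed), picking the top-scoring completion looks like:

```python
# Output structure of a fill-mask pipeline call, as shown in the diff
# above (abbreviated to the top two candidates).
predictions = [
    {'score': 0.2096337080001831,
     'sequence': 'मुझे उनसे बात करना एकदम अच्छा लगा',
     'token': 1462,
     'token_str': ' एकदम'},
    {'score': 0.17915162444114685,
     'sequence': 'मुझे उनसे बात करना तब अच्छा लगा',
     'token': 594,
     'token_str': ' तब'},
]

# Select the highest-scoring candidate; token_str carries a leading
# space from the byte-level BPE tokenizer, so strip it for display.
best = max(predictions, key=lambda p: p['score'])
print(best['token_str'].strip())  # एकदम
```

The pipeline already returns candidates sorted by score, so `predictions[0]` would work too; `max` makes the intent explicit.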