Benjamin Consolvo committed
Commit a61565b
1 Parent(s): 0e961d2

description expanded to include efficient deployment

Files changed (1): app.py (+5 -4)
app.py CHANGED
@@ -21,16 +21,17 @@ def predict(context,question):
     answer = predictions['answer']
     start = predictions['start']
     end = predictions['end']
-    return score,answer,start
+    return answer,score,start
 
 md = """
-Introduction: If you came looking for chatGPT, sorry to disappoint, but this is different. This prediction model is designed to answer a question about a text. It is designed to do reading comprehension. The model does not just answer questions in general -- it only works from the text that you provide. However, accomplishing accurate reading comprehension can be a very valuable task, especially if you are attempting to get quick answers from a large (and maybe boring!) document.
+Introduction: If you came looking for ChatGPT, sorry to disappoint, but this is different. This prediction model answers a question about a text: it does reading comprehension. The model does not answer questions in general; it only works from the text that you provide. Accurate reading comprehension is still a valuable task, especially when you need quick answers from a large document.
 
-The model is based on the Zafrir et al. (2021) paper: [Prune Once for All: Sparse Pre-Trained Language Models](https://arxiv.org/abs/2111.05754).
+The model is based on the Zafrir et al. (2021) paper: [Prune Once for All: Sparse Pre-Trained Language Models](https://arxiv.org/abs/2111.05754), and is available at https://huggingface.co/Intel/bert-base-uncased-squadv1.1-sparse-80-1x4-block-pruneofa. The main idea of this BERT-Base model is that it is much faster and more efficient to deploy than its dense counterpart (https://huggingface.co/csarron/bert-base-uncased-squad-v1). Weight pruning and model distillation were applied to create a sparse weight pattern that is maintained even after fine-tuning. According to Zafrir et al. (2021), their "results show the best compression-to-accuracy ratio for BERT-Base". The model is still in FP32, but it can be quantized to INT8 with the [Intel® Neural Compressor](https://github.com/intel/neural-compressor) for further compression.
 
 The model was trained on the English Wikipedia dataset (2500M words) and then fine-tuned on the SQuAD v1.1 dataset of 89K training examples by Rajpurkar et al. (2016): [100,000+ Questions for Machine Comprehension of Text](https://arxiv.org/abs/1606.05250).
 
-Author of Hugging Face Space: Benjamin Consolvo, AI Solutions Engineer Manager at Intel\nDate last updated: 01/05/2023
+Author of Hugging Face Space: Benjamin Consolvo, AI Solutions Engineer Manager at Intel
+Date last updated: 01/05/2023
 """
 
 # predict()
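
For context, the `predict()` fragment in the diff matches a standard extractive question-answering flow. Below is a minimal sketch of what the function plausibly looks like after this commit, assuming the Space builds a transformers question-answering pipeline over the checkpoint named in the description; the rest of app.py is not shown in the diff, so treat this as an illustration rather than the actual code.

```python
from transformers import pipeline

# Assumed setup: a QA pipeline over the sparse checkpoint from the
# description (the real app.py may construct this differently).
qa_pipeline = pipeline(
    "question-answering",
    model="Intel/bert-base-uncased-squadv1.1-sparse-80-1x4-block-pruneofa",
)

def predict(context, question):
    predictions = qa_pipeline(question=question, context=context)
    score = predictions['score']    # model confidence, between 0 and 1
    answer = predictions['answer']  # extracted answer span as a string
    start = predictions['start']    # character offset where the answer begins
    end = predictions['end']        # character offset where the answer ends
    return answer, score, start     # order changed by this commit: answer first
```

Note that besides expanding the description, the commit swaps the return order from `score,answer,start` to `answer,score,start`, so the answer string is the first value callers receive.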
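
The updated description also says the FP32 model "can be quantized to INT8 with the Intel® Neural Compressor". As a hedged sketch of how that might look with neural-compressor 2.x post-training dynamic quantization (none of this appears in app.py itself; the checkpoint name comes from the description, and the output directory is made up):

```python
from transformers import AutoModelForQuestionAnswering
from neural_compressor import PostTrainingQuantConfig, quantization

# Load the FP32 sparse checkpoint referenced in the Space description.
fp32_model = AutoModelForQuestionAnswering.from_pretrained(
    "Intel/bert-base-uncased-squadv1.1-sparse-80-1x4-block-pruneofa"
)

# "dynamic" post-training quantization stores weights in INT8 and quantizes
# activations on the fly, so no calibration dataset is required.
config = PostTrainingQuantConfig(approach="dynamic")
int8_model = quantization.fit(model=fp32_model, conf=config)

# Persist the quantized model for deployment (hypothetical output path).
int8_model.save("./bert-qa-int8")
```

INT8 storage is roughly 4x smaller than FP32 for the quantized layers and typically speeds up CPU inference, which is the "efficient deployment" angle the commit message refers to.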