---

language:
- en
tags:
- t5
- qa
- askscience
- lfqa
- information retrieval
datasets:
- eli5
metrics:
- rouge
widget:
- text: "why aren't there more planets in our solar system?"
  example_title: "solar system"
- text: "question: what is a probability distribution? context: I am just learning about statistics."
  example_title: "probability distribution"
- text: "question: What are the underlying physical processes by which exercise helps us lose weight? context: I started working out two weeks ago and already feel a lot better, and started to think about it and became deeply confused."
  example_title: "working out"
- text: "what is a neural network?"
  example_title: "deep learning"
- text: "What are the primary mechanisms that computers use to understand human language?"
  example_title: "NLP"
  
inference:
  parameters:
    max_length: 96
    no_repeat_ngram_size: 2
    encoder_no_repeat_ngram_size: 4
    repetition_penalty: 3.51
    length_penalty: 0.8
    num_beams: 4
    early_stopping: True
    
---

# t5-base-askscience

- [t5-v1_1](https://huggingface.co/google/t5-v1_1-base) trained on the entirety of the _askscience_ subset of the eli5 dataset for one epoch.
- compare to bart on eli5 [here](https://huggingface.co/yjernite/bart_eli5)
- note that the inference API is limited to 96 output tokens; running the model locally in Python with the `transformers` library lets you generate longer outputs (see the sketch below).
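
A minimal sketch of running longer generation locally. The repository id below is a placeholder (substitute the actual model id); the decoding parameters mirror the widget settings above, with a larger `max_length`:

```python
# Minimal sketch: load the model locally to generate answers longer than the
# 96-token limit of the hosted inference API.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "<namespace>/t5-base-askscience"  # placeholder - use the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

prompt = "what is a neural network?"
inputs = tokenizer(prompt, return_tensors="pt")

# decoding settings taken from the inference config above, with a larger max_length
outputs = model.generate(
    **inputs,
    max_length=256,
    num_beams=4,
    no_repeat_ngram_size=2,
    encoder_no_repeat_ngram_size=4,
    repetition_penalty=3.51,
    length_penalty=0.8,
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```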

## training 

- for inputs, the model was presented with the post title and the post selftext, encoded as: `question: <post title> context: <post selftext>`. You may see better results if queries are posed in this fashion (see the sketch after this list).
- The top two replies were aggregated and presented to the model as the output text.
- Training for longer will be explored, but given that the dataset has 127k examples and the loss flattens out at around 0.5 epochs, this model should already be fairly viable.
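
A small sketch of the input encoding described above; the helper function name is illustrative and not part of the repository:

```python
# Illustrative helper showing the `question: ... context: ...` encoding
# used for training inputs (and recommended for queries).
def build_prompt(post_title: str, post_selftext: str = "") -> str:
    if post_selftext:
        return f"question: {post_title} context: {post_selftext}"
    return f"question: {post_title}"

prompt = build_prompt(
    "what is a probability distribution?",
    "I am just learning about statistics.",
)
# -> "question: what is a probability distribution? context: I am just learning about statistics."
```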