Note that when used through the hosted inference API, the model is restricted to outputting 96 tokens; by loading the model in Python with the transformers library, you can generate longer outputs.
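A minimal sketch of local generation, assuming this is a seq2seq checkpoint loadable via `AutoModelForSeq2SeqLM` (the `question:`/`context:` input format suggests a T5-style model); the model id and the `max_new_tokens` value below are placeholders to adjust:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder id -- substitute this repo's actual model id.
model_id = "your-username/this-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Encode the query in the same format used during training (see below).
prompt = "question: <post title> context: <post selftext>"
inputs = tokenizer(prompt, return_tensors="pt")

# max_new_tokens lifts the 96-token cap imposed by the hosted inference API.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```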
training
For inputs, the model was presented with the post title and the post selftext, encoded as `question: <post title> context: <post selftext>`. You may see better results if queries are posed in this fashion.
For outputs, the top two replies were aggregated and presented to the model as the target text.
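As a rough illustration of how a training pair could be assembled under the scheme above (the `build_example` helper, the reply ordering, and the space separator are assumptions for illustration, not the exact preprocessing code):

```python
def build_example(post_title, post_selftext, replies):
    """Assemble one (source, target) training pair.

    `replies` is assumed to already be sorted by score; the top two
    are joined to form the target text.
    """
    source = f"question: {post_title} context: {post_selftext}"
    target = " ".join(replies[:2])
    return source, target

src, tgt = build_example(
    "How do I flatten a list of lists?",
    "I have [[1, 2], [3]] and want [1, 2, 3].",
    ["Use itertools.chain.from_iterable.", "A nested comprehension also works."],
)
```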
Training for longer will be explored, but given that the dataset has 127k examples and the loss flatlines at 0.5 epochs, this model should already be fairly viable.