
Training Data

Autochart: Zhu, J., Ran, J., Lee, R. K. W., Choo, K., & Li, Z. (2021). AutoChart: A Dataset for Chart-to-Text Generation Task. arXiv preprint arXiv:2108.06897.

Gitlab Link for the data: https://gitlab.com/bottle_shop/snlg/chart/autochart

Dataset split used for this model: 23,336 train, 1,297 validation, 1,296 test.
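As a quick check on these figures, the three splits sum to 25,929 examples, which is roughly a 90/5/5 division. The short Python snippet below only reproduces that arithmetic from the numbers stated above; the variable names are illustrative.

```python
# Split sizes as stated in this README.
splits = {"train": 23336, "validation": 1297, "test": 1296}

total = sum(splits.values())

# Share of each split as a percentage, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}

print(total)
print(shares)
```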

Example use:

Prepend the prefix 'C2T: ' to every input to the model.

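from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# The model ID must be passed as a string.
tokenizer = AutoTokenizer.from_pretrained('saadob12/t5_C2T_autochart')
model = AutoModelForSeq2SeqLM.from_pretrained('saadob12/t5_C2T_autochart')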

data = 'Trade statistics of Qatar with developing economies in North Africa  bar_chart Year-Trade with economies of Middle East & North Africa(%)(Merchandise             exports,Merchandise imports) x-y1-y2 values 2000 0.591869968616745 3.59339030672154 , 2001 0.53415012207203 3.25371165779341 , 2002 3.07769793440318 1.672796364224 , 2003 0.6932513078579471 1.62522475477827 , 2004 1.17635914189321 1.80540331396412'

prefix = 'C2T: '
tokens = tokenizer.encode(prefix + data,  truncation=True, padding='max_length', return_tensors='pt')
generated = model.generate(tokens, num_beams=4, max_length=256)
tgt_text = tokenizer.decode(generated[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
summary = str(tgt_text).strip('[]""')
#Summary: This barchart shows the number of trade statistics of qatar with developing economies in north africa from 2000 through 2004. The unit of measurement in this graph is Trade with economies of Middle East & North Africa(%) as shown on the y-axis. The first group data denotes the change of Merchandise exports. There is a go up and down trend of the number. The peak of the number is found in 2002 and the lowest number is found in 2001. The changes in the number may be related to the conuntry's national policies. The second group data denotes the change of Merchandise imports. There is a go up and down trend of the number. The number in 2000 being the peak, and the lowest number is found in 2003. The changes in the number may be related to the conuntry's national policies. 
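The input string in the example above appears to follow a fixed layout: title, chart type, axis labels with series names in parentheses, a column signature such as x-y1-y2, the word "values", and then the rows separated by " , ". The following is a minimal sketch of a helper that builds such a string from structured data. The function name `linearize_chart` and its parameters are hypothetical, and the layout is inferred from the single example in this README rather than from a documented specification, so it may not match the training data exactly (for instance for single-series charts).

```python
def linearize_chart(title, chart_type, x_label, y_label, series_names, rows):
    """Flatten chart data into one prefixed input string (hypothetical helper).

    rows is a list of tuples (x_value, y1, y2, ...), one y value per series.
    The layout is an assumption inferred from the README example.
    """
    # Header: title, chart type, then axis labels and the series names.
    header = f"{title} {chart_type} {x_label}-{y_label}({','.join(series_names)})"
    # Column signature, e.g. "x-y1-y2" for two series.
    signature = "-".join(["x"] + [f"y{i}" for i in range(1, len(series_names) + 1)])
    # Values within a row are space-separated; rows are joined with " , ".
    body = " , ".join(" ".join(str(v) for v in row) for row in rows)
    # The model expects the 'C2T: ' prefix on every input.
    return f"C2T: {header} {signature} values {body}"


example = linearize_chart(
    title="Trade statistics of Qatar",
    chart_type="bar_chart",
    x_label="Year",
    y_label="Trade(%)",
    series_names=["Merchandise exports", "Merchandise imports"],
    rows=[(2000, 0.59, 3.59), (2001, 0.53, 3.25)],
)
print(example)
```

The returned string already includes the prefix, so it can be passed to the tokenizer directly without adding 'C2T: ' again.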

Limitations

You can use the model to generate summaries of tabular data. It works well for general statistics such as the following:

| Year | Children born per woman |
|------|-------------------------|
| 2018 | 1.14 |
| 2017 | 1.45 |
| 2016 | 1.49 |
| 2015 | 1.54 |
| 2014 | 1.6 |
| 2013 | 1.65 |

For data like the following, it may produce a passable summary at best:

| Model | BLEU score | BLEURT |
|-------|------------|--------|
| t5-small | 25.4 | -0.11 |
| t5-base | 28.2 | 0.12 |
| t5-large | 35.4 | 0.34 |

Citation

Kindly cite my work. Thank you.

  @misc{obaid_ul_islam_2022,
      title={saadob12/t5_C2T_autochart Hugging Face}, 
      url={https://huggingface.co/saadob12/t5_C2T_autochart}, 
      journal={Huggingface.co}, 
      author={Obaid ul Islam, Saad}, 
      year={2022} 
  }