saadob12 committed on
Commit
86990d6
1 Parent(s): a08f450

Create README.md

Files changed (1)
  1. README.md +59 -0
README.md ADDED
@@ -0,0 +1,59 @@
1
+ # Training Data
2
+ **AutoChart:** Zhu, J., Ran, J., Lee, R. K. W., Choo, K., & Li, Z. (2021). AutoChart: A Dataset for Chart-to-Text Generation Task. arXiv preprint arXiv:2108.06897.
3
+
4
+ **GitLab link for the data**: https://gitlab.com/bottle_shop/snlg/chart/autochart
5
+
6
+ Data split used for this model: train 23,336, validation 1,297, test 1,296
7
+
8
+ # Example use:
9
+ Prepend ```C2T: ``` to every input to the model.
10
+
11
+
12
+ ```
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+ # Load the tokenizer and model from the Hugging Face Hub.
+ tokenizer = AutoTokenizer.from_pretrained('saadob12/t5_C2T_autochart')
+ model = AutoModelForSeq2SeqLM.from_pretrained('saadob12/t5_C2T_autochart')
+
+ data = 'Trade statistics of Qatar with developing economies in North Africa bar_chart Year-Trade with economies of Middle East & North Africa(%)(Merchandise exports,Merchandise imports) x-y1-y2 values 2000 0.591869968616745 3.59339030672154 , 2001 0.53415012207203 3.25371165779341 , 2002 3.07769793440318 1.672796364224 , 2003 0.6932513078579471 1.62522475477827 , 2004 1.17635914189321 1.80540331396412'
+
+ # Every input must start with the task prefix.
+ prefix = 'C2T: '
+ tokens = tokenizer.encode(prefix + data, truncation=True, padding='max_length', return_tensors='pt')
+ generated = model.generate(tokens, num_beams=4, max_length=256)
+ tgt_text = tokenizer.decode(generated[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
+ summary = str(tgt_text).strip('[]""')
+ # Summary: This barchart shows the number of trade statistics of qatar with developing economies in north africa from 2000 through 2004. The unit of measurement in this graph is Trade with economies of Middle East & North Africa(%) as shown on the y-axis. The first group data denotes the change of Merchandise exports. There is a go up and down trend of the number. The peak of the number is found in 2002 and the lowest number is found in 2001. The changes in the number may be related to the conuntry's national policies. The second group data denotes the change of Merchandise imports. There is a go up and down trend of the number. The number in 2000 being the peak, and the lowest number is found in 2003. The changes in the number may be related to the conuntry's national policies.
+ ```
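The `data` string in the example packs the chart title, chart type, axis labels, series names, and values into a single line. Below is a minimal sketch of a helper that builds such a string from structured data. The layout is inferred only from that one example, not from a documented specification, and `linearize_chart` with its parameter names is hypothetical. The README does not show what a single-series input looks like, so the sketch covers multi-series charts only.

```python
def linearize_chart(title, chart_type, x_label, y_label, series_names, rows):
    """Flatten chart data into one string shaped like the README example.

    Hypothetical helper: the layout is an assumption inferred from a single
    example input. rows holds one (x, y1, y2, ...) tuple per data point.
    """
    n = len(series_names)
    if n < 2:
        # The README only shows a two-series input; other layouts are unknown.
        raise ValueError("only multi-series charts are covered by this sketch")
    for row in rows:
        if len(row) != n + 1:
            raise ValueError(f"expected {n + 1} values per row, got {len(row)}")

    # "<title> <chart type> <x label>-<y label>(<series 1>,<series 2>,...)"
    header = f"{title} {chart_type} {x_label}-{y_label}({','.join(series_names)})"
    # Column signature, e.g. "x-y1-y2" for two series.
    axes = "-".join(["x"] + [f"y{i}" for i in range(1, n + 1)])
    # Data points are separated by " , ", values within a point by spaces.
    values = " , ".join(" ".join(str(v) for v in row) for row in rows)
    return f"{header} {axes} values {values}"


# First two data points of the README example.
data = linearize_chart(
    title="Trade statistics of Qatar with developing economies in North Africa",
    chart_type="bar_chart",
    x_label="Year",
    y_label="Trade with economies of Middle East & North Africa(%)",
    series_names=["Merchandise exports", "Merchandise imports"],
    rows=[
        (2000, 0.591869968616745, 3.59339030672154),
        (2001, 0.53415012207203, 3.25371165779341),
    ],
)
model_input = "C2T: " + data  # task prefix required by the model
```

The resulting `model_input` can be passed to `tokenizer.encode` in place of `prefix + data`.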
25
+ # Limitations
26
+ You can use the model to generate summaries of data files.
27
+ It works well for general statistics like the following:
28
+
29
+ | Year | Children born per woman |
30
+ |:---:|:---:|
31
+ | 2018 | 1.14 |
32
+ | 2017 | 1.45 |
33
+ | 2016 | 1.49 |
34
+ | 2015 | 1.54 |
35
+ | 2014 | 1.6 |
36
+ | 2013 | 1.65 |
37
+
38
+ For the following kind of data, it may produce an **okay** summary at best:
39
+
40
+ | Model | BLEU score | BLEURT |
41
+ |:---:|:---:|:---:|
42
+ | t5-small | 25.4 | -0.11 |
43
+ | t5-base | 28.2 | 0.12 |
44
+ | t5-large | 35.4 | 0.34 |
45
+
46
+
47
+
48
+ # Citation
49
+
50
+ Kindly cite my work. Thank you.
51
+ ```
52
+ @misc{obaid_ul_islam_2022,
53
+ title={saadob12/t5_C2T_autochart Hugging Face},
54
+ url={https://huggingface.co/saadob12/t5_C2T_autochart},
55
+ journal={Huggingface.co},
56
+ author={Obaid ul Islam, Saad},
57
+ year={2022}
58
+ }
59
+ ```