# Training Data
**AutoChart:** Zhu, J., Ran, J., Lee, R. K. W., Choo, K., & Li, Z. (2021). AutoChart: A Dataset for Chart-to-Text Generation Task. arXiv preprint arXiv:2108.06897.

**GitLab link for the data**: https://gitlab.com/bottle_shop/snlg/chart/autochart

Data split used for this model: 8000 training, 1297 validation, and 1296 test examples.

# Example use
Prepend ```C2T: ``` to every input passed to the model.


```
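from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('saadob12/t5_C2T_autochart')
model = AutoModelForSeq2SeqLM.from_pretrained('saadob12/t5_C2T_autochart')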

data = 'Trade statistics of Qatar with developing economies in North Africa  bar_chart Year-Trade with economies of Middle East & North Africa(%)(Merchandise             exports,Merchandise imports) x-y1-y2 values 2000 0.591869968616745 3.59339030672154 , 2001 0.53415012207203 3.25371165779341 , 2002 3.07769793440318 1.672796364224 , 2003 0.6932513078579471 1.62522475477827 , 2004 1.17635914189321 1.80540331396412'

# Every input must start with the task prefix
prefix = 'C2T: '
tokens = tokenizer.encode(prefix + data, truncation=True, padding='max_length', return_tensors='pt')
# Generate the summary with beam search
generated = model.generate(tokens, num_beams=4, max_length=256)
tgt_text = tokenizer.decode(generated[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
summary = str(tgt_text).strip('[]""')
#Summary: This barchart shows the number of trade statistics of qatar with developing economies in north africa from 2000 through 2004. The unit of measurement in this graph is Trade with economies of Middle East & North Africa(%) as shown on the y-axis. The first group data denotes the change of Merchandise exports. There is a go up and down trend of the number. The peak of the number is found in 2002 and the lowest number is found in 2001. The changes in the number may be related to the conuntry's national policies. The second group data denotes the change of Merchandise imports. There is a go up and down trend of the number. The number in 2000 being the peak, and the lowest number is found in 2003. The changes in the number may be related to the conuntry's national policies. 
```
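The input string in the example above appears to follow a fixed layout: title, chart type, axis labels with the series names in parentheses, a column header such as `x-y1-y2`, the word `values`, and then the rows separated by ` , `. The sketch below builds such a string from tabular data. The helper name `linearize_chart` and its parameters are hypothetical, and the layout is inferred from that single example rather than from a documented specification, so treat it as a starting point and check the output against your own data.

```python
def linearize_chart(title, chart_type, x_label, y_label, series_names, rows,
                    prefix="C2T: "):
    """Flatten chart data into one input string (hypothetical helper).

    Assumption: the layout is inferred from the single example in this
    model card; spacing is simplified to single spaces.
    """
    # Column header, e.g. 'x-y1-y2' for two data series
    columns = "-".join(["x"] + [f"y{i}" for i in range(1, len(series_names) + 1)])
    # Axis description, e.g. 'Year-Share(%)(Exports,Imports)'
    axes = f"{x_label}-{y_label}({','.join(series_names)})"
    # Each row becomes 'x y1 y2'; rows are joined with ' , '
    values = " , ".join(" ".join(str(v) for v in row) for row in rows)
    return f"{prefix}{title} {chart_type} {axes} {columns} values {values}"


# Illustrative data only, not taken from the AutoChart dataset
text = linearize_chart(
    title="Trade statistics of Qatar",
    chart_type="bar_chart",
    x_label="Year",
    y_label="Share(%)",
    series_names=["Merchandise exports", "Merchandise imports"],
    rows=[(2000, 0.59, 3.59), (2001, 0.53, 3.25)],
)
print(text)
```

The returned string already includes the `C2T: ` prefix, so it can be passed directly to `tokenizer.encode`.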
# Limitations
The model generates text summaries of chart data.
It works well for general statistics such as the following:

| Year  | Children born per woman  |
|:---:|:---:|
| 2018  | 1.14  | 
| 2017  | 1.45 |  
| 2016  | 1.49  |  
| 2015  | 1.54 |  
| 2014  | 1.6  |  
| 2013  | 1.65  |  

For data like the following, it may or may not produce a usable summary, and the result is **okay** at best:

| Model  | BLEU score  | BLEURT|
|:---:|:---:|:---:|
| t5-small  | 25.4  | -0.11 | 
| t5-base  | 28.2 | 0.12 |
| t5-large  |  35.4 | 0.34 |



# Citation

If you use this model, kindly cite my work. Thank you.
``` 
  @misc{obaid_ul_islam_2022, 
      title={saadob12/t5_C2T_autochart Hugging Face}, 
      url={https://huggingface.co/saadob12/t5_C2T_autochart}, 
      journal={Huggingface.co}, 
      author={Obaid ul Islam, Saad}, 
      year={2022} 
  }
```