---
license: apache-2.0
datasets:
- yuan-tian/chartgpt-dataset-llama3
language:
- en
metrics:
- rouge
pipeline_tag: text2text-generation
base_model:
- meta-llama/Meta-Llama-3-8B-Instruct
library_name: transformers
tags:
- text-generation-inference
---
# Model Card for ChartGPT-Llama3

## Model Details

### Model Description

This model generates charts from abstract natural language descriptions of data. For more information, please refer to the research paper linked below.

* **Model type:** Language model
* **Language(s) (NLP):** English
* **License:** Apache 2.0
* **Finetuned from model:** [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
* **Research paper:** [ChartGPT: Leveraging LLMs to Generate Charts from Abstract Natural Language](https://ieeexplore.ieee.org/document/10443572)

### Model Input Format

<details>
<summary> Click to expand </summary>

The model input at step `x` has the following format:

```
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
Your response should follow the following format:
{Step 1 prompt}

{Step x-1 prompt}
{Step x prompt}

### Instruction:
{instruction}

### Input:
Table Name: {table name}
Table Header: {column names}
Table Header Type: {column types}
Table Data Example:
{data row 1}
{data row 2}
Previous Answer:
{previous answer}

### Response:
```

The model then outputs the answer corresponding to step `x`.

The prompts for steps 1-6 are as follows:

```
Step 1. Select the columns:
Step 2. Filter the data:
Step 3. Add aggregate functions:
Step 4. Choose chart type:
Step 5. Select encodings:
Step 6. Sort the data:
```
</details>
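
To make the format concrete, the sketch below assembles the step-`x` input from the template above. It is only an illustration, not part of the released code: the helper name `build_prompt` and its signature are assumptions based on the format documented here.

```python
# A minimal sketch of assembling the step-x input (hypothetical helper,
# based on the input format documented above).
STEP_PROMPTS = [
    "Step 1. Select the columns:",
    "Step 2. Filter the data:",
    "Step 3. Add aggregate functions:",
    "Step 4. Choose chart type:",
    "Step 5. Select encodings:",
    "Step 6. Sort the data:",
]

def build_prompt(step, instruction, table_name, headers, header_types,
                 example_rows, previous_answer=""):
    """Assemble the model input covering steps 1..step (1-based)."""
    return (
        "Below is an instruction that describes a task, paired with an input "
        "that provides further context. Write a response that appropriately "
        "completes the request.\n"
        "Your response should follow the following format:\n"
        + "\n".join(STEP_PROMPTS[:step]) + "\n\n"
        "### Instruction:\n"
        f"{instruction}\n\n"
        "### Input:\n"
        f"Table Name: {table_name}\n"
        f"Table Header: {','.join(headers)}\n"
        f"Table Header Type: {','.join(header_types)}\n"
        "Table Data Example:\n"
        + "\n".join(example_rows) + "\n"
        "Previous Answer:\n"
        f"{previous_answer}\n\n"
        "### Response:"
    )
```

Calling `build_prompt` with `step=6` reproduces the full six-step input used in the example below.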

## How to Get Started with the Model

### Running the Model on a GPU

The example below uses a faculty dataset with the instruction "Give me a visual representation of the faculty members by their professional status."
Because all six step prompts are included, the model should answer every step.
You can use the code below to check that the model runs successfully.

<details>
<summary> Click to expand </summary>

```python
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
)

# Load the fine-tuned model and tokenizer; device_map="auto" places the
# weights on the available GPU(s).
tokenizer = AutoTokenizer.from_pretrained("yuan-tian/chartgpt-llama3")
model = AutoModelForCausalLM.from_pretrained("yuan-tian/chartgpt-llama3", device_map="auto")
input_text = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
Your response should follow the following format:
Step 1. Select the columns:
Step 2. Filter the data:
Step 3. Add aggregate functions:
Step 4. Choose chart type:
Step 5. Select encodings:
Step 6. Sort the data:

### Instruction:
Give me a visual representation of the faculty members by their professional status.

### Input:
Table Name: Faculty
Table Header: FacID,Lname,Fname,Rank,Sex,Phone,Room,Building
Table Header Type: quantitative,nominal,nominal,nominal,nominal,quantitative,nominal,nominal
Table Data Example:
1082,Giuliano,Mark,Instructor,M,2424,224,NEB
1121,Goodrich,Michael,Professor,M,3593,219,NEB
Previous Answer:


### Response:"""
# Tokenize and move the inputs to the same device as the model. Padding is
# unnecessary for a single sequence, and the Llama 3 tokenizer has no pad
# token by default, so requesting padding would raise an error.
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

</details>
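
The decoded output repeats the prompt followed by the answer for each step. Below is a minimal sketch for extracting the answers, assuming they appear after the `### Response:` marker under the `Step N.` labels shown above; the parsing code is an illustration, not part of the released model.

```python
import re

def parse_steps(generated_text):
    """Return {step_number: answer} parsed from the generated text.

    Assumes each answer follows a "Step N. ...:" label after the
    "### Response:" marker, as in the prompt format above.
    """
    response = generated_text.split("### Response:")[-1]
    pattern = r"Step (\d)\.[^:]*:(.*?)(?=Step \d\.|$)"
    return {int(n): a.strip() for n, a in re.findall(pattern, response, re.S)}

answers = parse_steps(tokenizer.decode(outputs[0], skip_special_tokens=True))
print(answers)
```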

## Training Details

### Training Data

This model is fine-tuned from [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the [chartgpt-dataset-llama3](https://huggingface.co/datasets/yuan-tian/chartgpt-dataset-llama3) dataset. 
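
To inspect the training data, you can load it from the Hub. A minimal sketch follows; the `train` split name is an assumption, so check the dataset card if it differs.

```python
from datasets import load_dataset

# Load the fine-tuning data; the "train" split name is an assumption.
ds = load_dataset("yuan-tian/chartgpt-dataset-llama3", split="train")
print(ds[0])
```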

### Training Procedure 

Details of the preprocessing and training procedure will be added in a future update. 

## Citation


**BibTeX:**

```bibtex
@article{tian2024chartgpt,
  title={ChartGPT: Leveraging LLMs to Generate Charts from Abstract Natural Language},
  author={Tian, Yuan and Cui, Weiwei and Deng, Dazhen and Yi, Xinjing and Yang, Yurun and Zhang, Haidong and Wu, Yingcai},
  journal={IEEE Transactions on Visualization and Computer Graphics},
  year={2024},
  pages={1-15},
  doi={10.1109/TVCG.2024.3368621}
}
```