Update README.md
README.md
CHANGED
@@ -1,3 +1,51 @@
---
license: cc-by-4.0
---
# SchemaPile Foreign Key Detection Model (T5-base)

## Model Description

This repository introduces **schemapile-fk-t5**, a language model based on [google-t5/t5-base](https://huggingface.co/google-t5/t5-base) and fine-tuned to predict foreign key relationships in relational database schemas.

## Training Data

The model was fine-tuned on foreign key pairs extracted from [SchemaPile](https://zenodo.org/records/10931803), a large collection of relational database schemas.

## Evaluation Data

We evaluate the foreign key detection accuracy of [schemapile-fk-starcoder](https://huggingface.co/tdoehmen/schemapile-fk-starcoder) and [schemapile-fk-t5](https://huggingface.co/tdoehmen/schemapile-fk-t5) on schemas from [Spider](https://yale-lily.github.io/spider), [BIRD-SQL](https://bird-bench.github.io/), and [CTU PRLR](https://arxiv.org/abs/1511.03086).

<img src="https://cdn-uploads.huggingface.co/production/uploads/616ea71919594606318887e9/6ouh4u6PFQlY8prLrAm4l.png" alt="eval" width="400"/>

## Training Procedure

The model was trained using the following hyperparameters:

- batch_size=16
- learning_rate=4e-5
- weight_decay=0.01
- num_train_epochs=1

See [Training Code](https://github.com/amsterdata/schemapile/blob/main/experiments/foreign_key_detection/finetune-t5-schemapile.ipynb).

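For readers who want to reproduce a similar setup, the hyperparameters listed above could be passed to Hugging Face's `Seq2SeqTrainingArguments` roughly as sketched here. This is an illustration, not the authors' training script: the mapping of `batch_size` to `per_device_train_batch_size`, the helper `make_training_args`, and the `output_dir` value are all assumptions, and the linked notebook remains the authoritative reference.

```python
# Hyperparameters from the list above, keyed by the argument names used by
# transformers.Seq2SeqTrainingArguments. The key mapping is an assumption.
TRAINING_KWARGS = {
    "per_device_train_batch_size": 16,  # assumes batch_size is per device
    "learning_rate": 4e-5,
    "weight_decay": 0.01,
    "num_train_epochs": 1,
}


def make_training_args(output_dir: str):
    """Hypothetical helper; requires the `transformers` package."""
    # Imported lazily so the constants above are usable without transformers.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Calling `make_training_args("schemapile-fk-t5-out")` would then yield an arguments object for a `Seq2SeqTrainer`; the directory name is a placeholder.
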
## How to Use

We recommend using the prompt template shown in the example and constraining the output with jsonformer.

Example Prompt:
```
You are given the following SQL database tables:
staff(staff_id, staff_address_id, nickname, first_name, middle_name, last_name, date_of_birth, date_joined_staff, date_left_staff)
addresses(address_id, line_1_number_building, city, zip_postcode, state_province_county, country)
Output a json string with the following schema {table, column, referencedTable, referencedColumn} that contains the foreign key relationship between the two tables.
```

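The example prompt above follows a fixed template, so it can be generated from table definitions. The sketch below is one way to do that; the function name `build_prompt` and the dictionary input format are illustrative assumptions, not part of the released code.

```python
from typing import Dict, List

# Instruction line copied verbatim from the example prompt.
INSTRUCTION = (
    "Output a json string with the following schema "
    "{table, column, referencedTable, referencedColumn} that contains "
    "the foreign key relationship between the two tables."
)


def build_prompt(tables: Dict[str, List[str]]) -> str:
    """Render each table as `name(col1, col2, ...)` inside the template."""
    lines = ["You are given the following SQL database tables:"]
    for name, columns in tables.items():
        lines.append(f"{name}({', '.join(columns)})")
    lines.append(INSTRUCTION)
    return "\n".join(lines)


# Shortened column lists, for illustration only.
prompt = build_prompt({
    "staff": ["staff_id", "staff_address_id"],
    "addresses": ["address_id", "city"],
})
print(prompt)
```

Since the instruction refers to "the two tables", passing exactly two tables per prompt matches the example most closely.
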
Example Output:
```
{'table': 'staff',
'column': 'staff_address_id',
'referencedTable': 'addresses',
'referencedColumn': 'address_id'}
```

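The example output uses single quotes, so it reads as a Python dictionary literal rather than strict JSON. A minimal sketch for parsing such a string and checking it against the input schema could look like this; `parse_foreign_key` and the validation rules are assumptions added for illustration.

```python
import ast
from typing import Dict, List

REQUIRED_KEYS = {"table", "column", "referencedTable", "referencedColumn"}


def parse_foreign_key(raw: str, tables: Dict[str, List[str]]) -> Dict[str, str]:
    """Parse a predicted foreign key and verify it refers to known columns."""
    # literal_eval safely evaluates Python literals, including the
    # single-quoted dictionary shown in the example output.
    fk = ast.literal_eval(raw)
    if not isinstance(fk, dict) or set(fk) != REQUIRED_KEYS:
        raise ValueError(f"unexpected output shape: {raw!r}")
    # Both the referencing and the referenced column must exist in the schema.
    for table_key, column_key in (
        ("table", "column"),
        ("referencedTable", "referencedColumn"),
    ):
        table, column = fk[table_key], fk[column_key]
        if column not in tables.get(table, []):
            raise ValueError(f"{table}.{column} is not in the schema")
    return fk


schema = {
    "staff": ["staff_id", "staff_address_id"],
    "addresses": ["address_id", "city"],
}
raw_output = (
    "{'table': 'staff', 'column': 'staff_address_id', "
    "'referencedTable': 'addresses', 'referencedColumn': 'address_id'}"
)
fk = parse_foreign_key(raw_output, schema)
print(f"{fk['table']}.{fk['column']} -> {fk['referencedTable']}.{fk['referencedColumn']}")
```

Rejecting predictions that name unknown tables or columns is a cheap guard against hallucinated identifiers.
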
To run the model locally, we recommend using our end-to-end [Example Notebook](https://github.com/amsterdata/schemapile/blob/main/experiments/foreign_key_detection/schemapile-fk-t5-example.ipynb).
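For the constrained output mentioned in this section, jsonformer takes a JSON Schema describing the fields to generate. A schema matching the four output fields might look like the sketch below; the variable name `FK_JSON_SCHEMA` is an assumption, and the notebook linked above remains the reference for how the model is actually invoked.

```python
# JSON Schema (as a Python dict) for one foreign key prediction.
# All four fields are plain strings, matching the example output.
FK_JSON_SCHEMA = {
    "type": "object",
    "properties": {
        "table": {"type": "string"},
        "column": {"type": "string"},
        "referencedTable": {"type": "string"},
        "referencedColumn": {"type": "string"},
    },
}
```
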