tdoehmen commited on
Commit
672e437
1 Parent(s): bb4f9f0

Update README.md

  1. README.md +48 -0
README.md CHANGED
@@ -1,3 +1,51 @@
1
  ---
2
  license: cc-by-4.0
3
  ---
4
+ # SchemaPile Foreign Key Detection Model (T5-base)
5
+
6
+ ## Model Description
7
+
8
+ This repository introduces **schemapile-fk-t5**, a language model based on [google-t5/t5-base](https://huggingface.co/google-t5/t5-base) and fine-tuned to predict foreign key relationships in relational database schemas.
9
+
10
+ ## Training Data
11
+
12
+ Foreign key pairs extracted from [SchemaPile](https://zenodo.org/records/10931803), a large collection of relational database schemas.
13
+
14
+ ## Evaluation Data
15
+
16
+ We evaluate the foreign key detection accuracy of [schemapile-fk-starcoder](https://huggingface.co/tdoehmen/schemapile-fk-starcoder) and [schemapile-fk-t5](https://huggingface.co/tdoehmen/schemapile-fk-t5) on schemas from [Spider](https://yale-lily.github.io/spider), [BIRD-SQL](https://bird-bench.github.io/), and [CTU PRLR](https://arxiv.org/abs/1511.03086).
17
+
18
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/616ea71919594606318887e9/6ouh4u6PFQlY8prLrAm4l.png" alt="eval" width="400"/>
19
+
20
+ ## Training Procedure
21
+
22
+ The model was trained using the following hyperparameters:
23
+
24
+ - batch_size=16
25
+ - learning_rate=4e-5
26
+ - weight_decay=0.01
27
+ - num_train_epochs=1
28
+
29
+ See [Training Code](https://github.com/amsterdata/schemapile/blob/main/experiments/foreign_key_detection/finetune-t5-schemapile.ipynb).
30
+
31
+ ## How to Use
32
+
33
+ We recommend using the provided prompt template and constraining the model output with jsonformer:
34
+
35
+ Example Prompt:
36
+ ```
37
+ You are given the following SQL database tables:
38
+ staff(staff_id, staff_address_id, nickname, first_name, middle_name, last_name, date_of_birth, date_joined_staff, date_left_staff)
39
+ addresses(address_id, line_1_number_building, city, zip_postcode, state_province_county, country)
40
+ Output a json string with the following schema {table, column, referencedTable, referencedColumn} that contains the foreign key relationship between the two tables.
41
+ ```
42
+
43
+ Example Output:
44
+ ```
45
+ {'table': 'staff',
46
+ 'column': 'staff_address_id',
47
+ 'referencedTable': 'addresses',
48
+ 'referencedColumn': 'address_id'}
49
+ ```
50
+
51
+ To run the model locally, we recommend using our end-to-end [Example Notebook](https://github.com/amsterdata/schemapile/blob/main/experiments/foreign_key_detection/schemapile-fk-t5-example.ipynb).
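
The prompt template and output format shown in the README can be sketched in plain Python, which may help when preparing inputs or checking predictions outside the example notebook. This is a minimal illustration, not the authors' code: the helper names `build_prompt` and `parse_prediction` and the constant `FK_JSON_SCHEMA` are hypothetical, and only the prompt wording and the four output keys are taken from the README. The schema dictionary is written in the JSON-schema style that constrained-decoding tools usually accept; whether it matches the schema used in the notebook is an assumption.

```python
import ast

# The four keys named in the README's prompt template.
FK_KEYS = ("table", "column", "referencedTable", "referencedColumn")

# Assumed JSON-schema-style description of the expected output
# (hypothetical; check the example notebook for the schema actually used).
FK_JSON_SCHEMA = {
    "type": "object",
    "properties": {key: {"type": "string"} for key in FK_KEYS},
}


def build_prompt(tables):
    """Render the README prompt for a dict mapping table name -> column names."""
    lines = ["You are given the following SQL database tables:"]
    for name, columns in tables.items():
        lines.append(f"{name}({', '.join(columns)})")
    lines.append(
        "Output a json string with the following schema "
        "{table, column, referencedTable, referencedColumn} that contains "
        "the foreign key relationship between the two tables."
    )
    return "\n".join(lines)


def parse_prediction(text, tables):
    """Parse a dict-like model output and check it against the input tables."""
    # literal_eval accepts the single-quoted form shown in the README
    # as well as double-quoted JSON made only of strings.
    prediction = ast.literal_eval(text)
    if not isinstance(prediction, dict) or set(prediction) != set(FK_KEYS):
        raise ValueError(f"expected exactly the keys {FK_KEYS}")
    # Both the referencing and the referenced column must exist in the schema.
    for table_key, column_key in (
        ("table", "column"),
        ("referencedTable", "referencedColumn"),
    ):
        table = prediction[table_key]
        if prediction[column_key] not in tables.get(table, ()):
            raise ValueError(f"unknown column for table {table!r}")
    return prediction


if __name__ == "__main__":
    schema = {
        "staff": ["staff_id", "staff_address_id", "nickname"],
        "addresses": ["address_id", "city", "country"],
    }
    print(build_prompt(schema))
    output = ("{'table': 'staff', 'column': 'staff_address_id', "
              "'referencedTable': 'addresses', 'referencedColumn': 'address_id'}")
    print(parse_prediction(output, schema))
```

Validating the predicted names against the input tables is a design choice of this sketch: it catches outputs that are well-formed but refer to a table or column that was never in the prompt.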