
parkervg/destt5-text2sql

Fine-tuned weights for the text2sql model described in Correcting Semantic Parses with Natural Language through Dynamic Schema Encoding, based on t5-base.

Training Data

The model has been fine-tuned on the 7,481 training examples in the SPLASH interactive semantic parsing dataset.

Rather than seeing the full database schema, the model received only the filtered schema predicted by the destt5-schema-prediction model.

Training Objective

This model was initialized with t5-base and fine-tuned with the text-to-text generation objective.

Since this model operates in the interactive setting, it uses the standard text2sql features (question and db_schema) in addition to feedback and incorrect_parse.

Importantly, the [table], [column], [content] features are expected to be the 'gold' schema items, as predicted by an auxiliary schema prediction model. The serialized input takes the form:

[question] || [incorrect_parse] || [db_id] | [table] : [column] ( [content] , [content] ) , [column] ( ... ) , [...] | [table] : ... | ... || [feedback]
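
To make the serialization concrete, here is a minimal sketch of how such an input string could be assembled. The helper function, field names, and example values are illustrative assumptions, not the actual SPLASH preprocessing from the DestT5 codebase.

```python
# Illustrative sketch (not the official DestT5 preprocessing): assembling the
# serialized input from hypothetical feedback-turn fields.
def serialize_input(question, incorrect_parse, db_id, schema, feedback):
    # schema: {table: {column: [content, ...]}}, already filtered down to the
    # items predicted by the schema-prediction model.
    tables = []
    for table, columns in schema.items():
        cols = " , ".join(
            f"{col} ( {' , '.join(contents)} )" if contents else col
            for col, contents in columns.items()
        )
        tables.append(f"{table} : {cols}")
    return f"{question} || {incorrect_parse} || {db_id} | " + " | ".join(tables) + f" || {feedback}"


print(serialize_input(
    question="How many singers are there?",
    incorrect_parse="SELECT count(*) FROM concert",
    db_id="concert_singer",
    schema={"singer": {"singer_id": [], "name": ["Joe Sharp"]}},
    feedback="Count the singers, not the concerts.",
))
```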

The model then generates the corrected SQL query over the filtered database schema items, with the output prefaced by the db_id:

[db_id] | [sql]
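
For reference, below is a minimal inference sketch using Hugging Face transformers. The generation settings and the example input string are assumptions; the exact preprocessing and decoding setup lives in the DestT5 codebase.

```python
# Minimal inference sketch (assumed usage via Hugging Face transformers; the
# exact preprocessing/decoding is in the DestT5 codebase).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("parkervg/destt5-text2sql")
model = AutoModelForSeq2SeqLM.from_pretrained("parkervg/destt5-text2sql")

# Hypothetical serialized input following the format described above.
serialized = (
    "How many singers are there? || SELECT count(*) FROM concert || "
    "concert_singer | singer : singer_id , name ( Joe Sharp ) || "
    "Count the singers, not the concerts."
)

inputs = tokenizer(serialized, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=256)
decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# The output follows "[db_id] | [sql]"; strip the db_id prefix to recover the SQL.
db_id, _, sql = decoded.partition(" | ")
print(db_id, sql, sep="\n")
```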

Performance

When this model receives the serialized database schema as predicted by destt5-schema-prediction, it achieves 53.43% correction accuracy (exact-match) on the SPLASH test set.

References

  1. Correcting Semantic Parses with Natural Language through Dynamic Schema Encoding

  2. DestT5 codebase

  3. Speak to your Parser: Interactive Text-to-SQL with Natural Language Feedback

Citation

@inproceedings{glenn2023correcting,
  author    = "Parker Glenn and Parag Pravin Dakle and Preethi Raghavan",
  title     = "Correcting Semantic Parses with Natural Language through Dynamic Schema Encoding",
  booktitle = "Proceedings of the 5th Workshop on NLP for Conversational AI",
  publisher = "Association for Computational Linguistics",
  year      = "2023"
}