metadata
license: mit
language:
  - en
tags:
  - schema
  - word-embeddings
  - embeddings
  - unsupervised-learning
  - tables
  - web-table
  - schema-data

Pre-trained Web Table Embeddings

The models here represent schema terms and instance data terms in a semantic vector space, which makes them especially useful for representing schema and class information and for ML tasks on tabular text data.

The code for executing and evaluating the models is located in the table-embeddings GitHub repository (https://github.com/guenthermi/table-embeddings).

Quick Start

You can install the table_embeddings package to encode text from tables by running the following commands:

pip install cython
pip install git+https://github.com/guenthermi/table-embeddings.git

After that you can encode text with the following Python snippet:

from table_embeddings import TableEmbeddingModel

# Load the pre-trained model by its identifier
model = TableEmbeddingModel.load_model('ddrg/web_table_embeddings_plain150')
# Encode a table header (schema) term as a vector
embedding = model.get_header_vector('headline')
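
A common next step after encoding two header terms is to compare their vectors. The following is a minimal sketch of cosine similarity in plain Python. It assumes that `get_header_vector` returns a non-empty sequence of floats (for example a NumPy array), which this README does not state explicitly; the vectors `title_vec` and `headline_vec` are made-up stand-ins, not real model output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Stand-in vectors for illustration only. In practice they would come from
# calls such as model.get_header_vector('title') (assumed to return floats).
title_vec = [0.2, 0.1, 0.7]
headline_vec = [0.25, 0.05, 0.65]
print(cosine_similarity(title_vec, headline_vec))
```

Values close to 1 indicate that two terms point in nearly the same direction in the embedding space; values near 0 indicate little relation.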

Model Types

| Model Type | Description | Download Links |
|---|---|---|
| W-tax | Model of relations between table header and table body | (64dim, 150dim) |
| W-row | Model of row-wise relations in tables | (64dim, 150dim) |
| W-combo | Model of row-wise relations and relations between table header and table body | (64dim, 150dim) |
| W-plain | Model of row-wise relations in tables without pre-processing | (64dim, 150dim) |
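
To switch between the model types and dimensions listed above, a small helper can build the identifier passed to `load_model`. The sketch below is hypothetical: the only identifier this README confirms is `ddrg/web_table_embeddings_plain150`, and the helper assumes the other models follow the same naming pattern. The names `MODEL_TYPES`, `DIMENSIONS`, and `model_id` are introduced here for illustration.

```python
# ASSUMPTION: identifiers follow the pattern of the one confirmed example,
# 'ddrg/web_table_embeddings_plain150'. Other combinations are unverified.
MODEL_TYPES = ("tax", "row", "combo", "plain")
DIMENSIONS = (64, 150)


def model_id(model_type: str, dim: int) -> str:
    """Build a model identifier from a model type and a vector dimension."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model type: {model_type!r}")
    if dim not in DIMENSIONS:
        raise ValueError(f"unsupported dimension: {dim}")
    return f"ddrg/web_table_embeddings_{model_type}{dim}"


print(model_id("plain", 150))
```

Check the resulting identifier against the download links before relying on it.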

More Information

For examples of how to use the models, see the GitHub repository.

More information can be found in the paper "Pre-Trained Web Table Embeddings for Table Discovery":

@inproceedings{gunther2021pre,
  title={Pre-Trained Web Table Embeddings for Table Discovery},
  author={G{\"u}nther, Michael and Thiele, Maik and Gonsior, Julius and Lehner, Wolfgang},
  booktitle={Fourth Workshop in Exploiting AI Techniques for Data Management},
  pages={24--31},
  year={2021}
}