How can this model be used on tables that I have stored in Postgres?

#2
by Chelcie - opened

I have a table called employee from the AdventureWorks sample database that I pulled from Postgres.

Creating the employee DataFrame from the Postgres employee table:

import pandas as pd
from sqlalchemy import text

# `engine` is a SQLAlchemy engine connected to the AdventureWorks database
with engine.begin() as conn:
    query = text("SELECT businessentityid, jobtitle FROM humanresources.employee")
    employee = pd.read_sql_query(query, conn)
employee
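For anyone reproducing this without a Postgres instance, the same pattern can be sketched against an in-memory SQLite database standing in for AdventureWorks (the table contents below are made up; with Postgres you would build the engine from a `postgresql://` connection URL instead):

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Stand-in for the Postgres connection: an in-memory SQLite database
# seeded with a tiny employee-like table (hypothetical rows).
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE employee (businessentityid INTEGER, jobtitle TEXT)"
    ))
    conn.execute(text(
        "INSERT INTO employee VALUES "
        "(1, 'Chief Executive Officer'), "
        "(2, 'Vice President of Engineering')"
    ))

# Same read pattern as with Postgres
with engine.begin() as conn:
    employee = pd.read_sql_query(
        text("SELECT businessentityid, jobtitle FROM employee"), conn
    )

print(employee.shape)  # (2, 2)
```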

I then run the following:

# `tokenizer` and `model` are the TAPEX tokenizer and model loaded earlier
query = "how many rows does the employee table have?"
encoding = tokenizer(table=employee, query=query, return_tensors="pt")

outputs = model.generate(**encoding)

print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

and get the following warning/errors:

Token indices sequence length is longer than the specified maximum sequence length for this model (2813 > 1024). Running this sequence through the model will result in indexing errors
IndexError: index out of range in self
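The length error makes sense once you consider that TAPEX flattens the whole table into one token sequence, so the sequence grows with the row count. The sketch below is a rough approximation of that flattening (an assumption for illustration, not TAPEX's exact code), just to show why a 290-row table blows past 1024 tokens:

```python
import pandas as pd

def linearize(table: pd.DataFrame) -> str:
    # Rough approximation of how TAPEX-style models flatten a table
    # before tokenization: a header segment, then one segment per row,
    # so the text length grows linearly with the number of rows.
    header = "col : " + " | ".join(table.columns)
    rows = [
        f"row {i + 1} : " + " | ".join(str(v) for v in row)
        for i, row in enumerate(table.itertuples(index=False))
    ]
    return " ".join([header] + rows)

small = pd.DataFrame({"businessentityid": [1, 2], "jobtitle": ["CEO", "VP"]})
print(linearize(small))
# col : businessentityid | jobtitle row 1 : 1 | CEO row 2 : 2 | VP
```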
Microsoft org

@Chelcie Hello, thanks for your interest in our work! The problem is likely that the flattened table is longer than TAPEX's maximum sequence length (1024 tokens). You can use the default truncation strategy defined in TAPEX:

encoding = tokenizer(
    table=employee,
    query=query,
    max_length=1024,
    truncation=True,
    return_tensors="pt",
)

And try again!
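One caveat worth noting with truncation: it silently drops the rows that do not fit into the 1024-token window, so for this particular question ("how many rows...") the model would only see a prefix of the table. A counting question like that is better answered directly in pandas, keeping TAPEX for questions about the rows that do fit (the 290-row frame below is hypothetical, mimicking the size of the AdventureWorks employee table):

```python
import pandas as pd

# Hypothetical pull of the full employee table: 290 rows
employee = pd.DataFrame({
    "businessentityid": range(1, 291),
    "jobtitle": ["Production Technician"] * 290,
})

# Row counts are exact and cheap in pandas; no tokenization involved
row_count = len(employee)
print(row_count)  # 290

# If you still want to feed the table to TAPEX, shrink it first so the
# model sees a table that actually fits in its context window:
subset = employee.head(50)
```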

SivilTaram changed discussion status to closed
