Papers
arxiv:2309.17196

ResBit: Residual Bit Vector for Categorical Values

Published on Sep 29, 2023
Authors:
,
,

Abstract

One-hot vectors, a common method for representing discrete/categorical data, in machine learning are widely used because of their simplicity and intuitiveness. However, one-hot vectors suffer from a linear increase in dimensionality, posing computational and memory challenges, especially when dealing with datasets containing numerous categories. In this paper, we focus on tabular data generation, and reveal the multinomial diffusion faces the mode collapse phenomenon when the cardinality is high. Moreover, due to the limitations of one-hot vectors, the training phase takes time longer in such a situation. To address these issues, we propose Residual Bit Vectors (ResBit), a technique for densely representing categorical data. ResBit is an extension of analog bits and overcomes limitations of analog bits when applied to tabular data generation. Our experiments demonstrate that ResBit not only accelerates training but also maintains performance when compared with the situations before applying ResBit. Furthermore, our results indicate that many existing methods struggle with high-cardinality data, underscoring the need for lower-dimensional representations, such as ResBit and latent vectors.

Community

Your need to confirm your account before you can post a new comment.

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2309.17196 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2309.17196 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2309.17196 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.