Representativity-based active learning for regression using Wasserstein distance and GroupSort Neural Networks

This repository contains the code used to test the performance of the WAR model on a fully labeled dataset.

WAR-notebook: run the algorithm from here and change the desired parameters.

WAR directory

Experiment_functions.py: functions used to visualize information about the WAR process (loss, metrics, points queried in each round...).

Models.py: definitions of the two neural networks, h and phi.

dataset_handler.py: Import and preprocess datasets.

full_training_process.py: main function.

training_and_query.py: function to run one round (one training and querying process).

Abstract

This paper proposes a new active learning strategy called Wasserstein active regression (WAR) based on the principle of distribution-matching to measure the representativeness of our labeled dataset compared to the global data distribution. We use GroupSort Neural Networks to compute the Wasserstein distance and provide theoretical foundations to justify the use of such networks with explicit bounds for their size and depth. We combine this solution with another diversity and uncertainty-based approach to sharpen our query strategy. Finally, we compare our method with other solutions and show empirically that we consistently achieve better estimations with less labeled data.
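
As a rough illustration of the GroupSort activation mentioned in the abstract, the sketch below splits the feature vector into consecutive groups and sorts the values inside each group. It is written in plain NumPy for readability and is not taken from Models.py; the function name group_sort and the parameter group_size are assumptions made for this example. Because the activation only permutes entries within each group, it preserves the norm of its input, which is one reason it is commonly used in Lipschitz-constrained networks.

```python
import numpy as np


def group_sort(x, group_size=2):
    """Sort the entries of each consecutive group along the last axis.

    Hypothetical helper for illustration only (not the repository's code).
    """
    x = np.asarray(x, dtype=float)
    n_features = x.shape[-1]
    if n_features % group_size != 0:
        raise ValueError("feature dimension must be divisible by group_size")
    # Reshape (..., n_features) into (..., n_groups, group_size),
    # sort inside each group, then restore the original shape.
    grouped = x.reshape(*x.shape[:-1], n_features // group_size, group_size)
    return np.sort(grouped, axis=-1).reshape(x.shape)


x = np.array([[3.0, -1.0, 0.5, 2.0]])
y = group_sort(x, group_size=2)  # groups: (3, -1) and (0.5, 2)
print(y)
```

With group_size=2 each pair of pre-activations is simply reordered, so no information is discarded, unlike with ReLU. Setting group_size to the full layer width sorts the whole vector.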