|
--- |
|
title: "XLabel: eXplainable Labeling Assistant" |
|
emoji: 💻 |
|
colorFrom: pink |
|
colorTo: gray |
|
sdk: streamlit |
|
sdk_version: 1.15.2 |
|
app_file: app.py |
|
pinned: true |
|
license: apache-2.0 |
|
--- |
|
|
|
# XLabel: e**X**plainable **Label**ing Assistant |
|
|
|
XLabel is an open-source [Streamlit](https://streamlit.io/) app that takes an explainable machine learning approach to visual-interactive data labeling. |
|
|
|
This is the official code for the following paper: |
|
[An Explainable Machine Learning Approach to Visual-Interactive Labeling: A Case Study on Non-communicable Disease Data](https://arxiv.org/abs/2209.12778) |
|
Donlapark Ponnoprat, Parichart Pattarapanitchai, Phimphaka Taninpong, Suthep Suantai |
|
|
|
## News (01/01/2023) |
|
* Use tabs instead of radio buttons for multiple labels. |
|
* The app now requires `streamlit>=1.16.0` for the tabs and `interpret>=0.3.0` for handling missing data. |
|
|
|
## Features |
|
XLabel can: |
|
* Predict the most probable labels using Explainable Boosting Machine (EBM). |
|
* Show the contributions of each feature towards the predicted labels. |
|
* Provide an option to write the labels directly into the data file (use `XLabel.py`) or save them in a separate file (use `XLabelDL.py`). |
|
* Support data with multiple labels and multiple classes. |
|
* Support data with missing values ([thanks to EBM](https://github.com/interpretml/interpret/issues/18)) and/or non-numeric categorical features. |
|
|
|
## Usage |
|
Before using XLabel, make sure that the data file follows these tabular conventions: |
|
* The file must be in either CSV or Excel format. |
|
* The first row of the file must be the names of the columns. |
|
* The first column must contain a unique identifier (id) for each row. |
|
* The label columns must appear last. |
|
In addition, a few instances must already be labeled, with each class appearing at least once. For example, if a label has five possible classes, then at least five instances must be labeled. |
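The conventions above can be checked before launching the app. The following is a minimal sketch using only the Python standard library; it is not part of XLabel. The function name `check_labeled_table`, the column names, and the sample rows are hypothetical, and the sketch assumes a CSV file in which unlabeled instances have an empty label cell.

```python
import csv
import io


def check_labeled_table(rows, classes):
    """Return a list of problems found in a table given as header + rows.

    `classes` maps each label column name to its possible classes,
    in the order the label columns appear in the file.
    """
    header, *records = rows
    label_columns = list(classes)
    problems = []

    # The label columns must be the last columns of the table.
    if header[-len(label_columns):] != label_columns:
        problems.append("label columns must appear last")
        return problems

    # The first column must hold a unique identifier for each row.
    ids = [record[0] for record in records]
    if len(set(ids)) != len(ids):
        problems.append("first column contains duplicate ids")

    # Every class of every label must appear in at least one labeled row.
    for name, expected in classes.items():
        index = header.index(name)
        seen = {record[index] for record in records}
        missing = [c for c in expected if c not in seen]
        if missing:
            problems.append(f"{name}: no labeled instance for {missing}")

    return problems


# Hypothetical data: two labeled rows (one per class) and two unlabeled rows.
SAMPLE = """id,age,bmi,diabetes
p1,54,31.2,yes
p2,47,24.8,no
p3,61,28.4,
p4,39,22.1,
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))
problems = check_labeled_table(rows, {"diabetes": ["yes", "no"]})
print(problems or "data file follows the conventions")
```

An empty list means that the file passes these checks. For an Excel file, the rows would first have to be read with a spreadsheet library.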
|
|
|
With your data file satisfying these conditions, you can now start data labeling with XLabel! |
|
1. Copy `XLabel.py` to the directory that contains the data file and run the `streamlit` command: |
|
``` |
|
streamlit run XLabel.py |
|
``` |
|
* By design, `XLabel.py` writes the labeled data to the original data file. If you would rather download the labeled data as a separate file, use `XLabelDL.py` instead. |
|
* You can assign a specific list of input features to each label by editing `configs.json` and copying it to the same directory as `XLabel.py`. Here is an example of [`configs.json`](configs.json). The sidebar also offers other options that you can experiment with. |
|
2. Upload a data file (only on the first run), select the options in the sidebar, and then click "**Sample**". The samples with the lowest predictive confidence are shown first on the main screen. |
|
3. Check the suggested labels; you can keep the correct ones and change the wrong ones. |
|
4. Click the "**Submit Labels**" button at the bottom of the page to save the labels. |
|
* If you are using `XLabel.py`, the labels will be saved directly to the original data file. |
|
* If you are using `XLabelDL.py`, click the `Download labeled data` button in the sidebar to download the labeled data as a new file. |
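Step 2 shows the samples with the lowest predictive confidence first. The sketch below illustrates one common way to produce such an ordering. It assumes that the confidence of a prediction is the probability of its most likely class; XLabel's own criterion may differ. The function name `rank_by_confidence`, the row ids, and the probabilities are hypothetical.

```python
def rank_by_confidence(probabilities):
    """Return row ids ordered from least to most confident prediction.

    `probabilities` maps a row id to its predicted class probabilities.
    """
    # Assumption: confidence = probability of the most likely class.
    confidence = {row_id: max(p) for row_id, p in probabilities.items()}
    return sorted(confidence, key=confidence.get)


# Hypothetical predicted probabilities for three unlabeled rows.
probabilities = {
    "p3": [0.55, 0.45],
    "p4": [0.90, 0.10],
    "p5": [0.20, 0.80],
}

order = rank_by_confidence(probabilities)
print(order)  # ['p3', 'p5', 'p4']
```

Row `p3` comes first because its top class has a probability of only 0.55, so a human label for it is likely to be the most informative.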