---
title: Commit Rewriting Visualization
sdk: gradio
sdk_version: 4.25.0
app_file: change_visualizer.py
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Description

This project is one of the main artifacts of the research project "Research on evaluation for AI Commit Message Generation".

# Structure (important components)

- ### Configuration: [config.py](config.py)
    - The Grazie API JWT token and the Hugging Face token must be stored as environment variables.
- ### Visualization app
    - A Gradio application, currently deployed
      at https://huggingface.co/spaces/JetBrains-Research/commit-rewriting-visualization.
    - Shows
        - The "golden" dataset of manually collected samples; the dataset is downloaded on startup
          from https://huggingface.co/datasets/JetBrains-Research/commit-msg-rewriting
        - The entire dataset, including the synthetic samples; the dataset is downloaded on startup
          from https://huggingface.co/datasets/JetBrains-Research/synthetic-commit-msg-rewriting
        - Some statistics collected for the dataset (and its parts); computed on startup

      _Note: the datasets are loaded only on startup, so after they are updated the app must be restarted to show the changes._
    - Files
        - [change_visualizer.py](change_visualizer.py)
- ### Data processing pipeline (_note: dataset and file names can be changed in the configuration file_)
    - Run the whole pipeline by running [run_pipeline.py](run_pipeline.py)
        - All intermediate results are stored as files defined in the config
    - Intermediate steps (each can be run separately by running the corresponding file
      from [generation_steps](generation_steps); the input is then taken from the previous step's artifact)
        - Generate the synthetic samples
            - Files [generation_steps/synthetic_end_to_start.py](generation_steps/synthetic_end_to_start.py)
              and [generation_steps/synthetic_start_to_end.py](generation_steps/synthetic_start_to_end.py)
            - The first generation step (end to start) downloads the `JetBrains-Research/commit-msg-rewriting`
              and `JetBrains-Research/lca-commit-message-generation` datasets from
              Hugging Face.
        - Compute metrics
            - File [generation_steps/metrics_analysis.py](generation_steps/metrics_analysis.py)
            - Includes the functions for all metrics
            - Downloads the `JetBrains-Research/lca-commit-message-generation` dataset from Hugging Face.
    - The resulting artifact (a dataset with golden and synthetic samples, attached reference messages, and computed
      metrics) is saved to the file [output/synthetic.csv](output/synthetic.csv). It should be uploaded
      to https://huggingface.co/datasets/JetBrains-Research/synthetic-commit-msg-rewriting **manually**.
- ### Data analysis
    - [analysis_util.py](analysis_util.py) -- helper functions used for data analysis, e.g., computing correlations.
    - [analysis.ipynb](analysis.ipynb) -- computes the correlations and builds the resulting tables.
    - [chart_processing.ipynb](chart_processing.ipynb) -- Jupyter Notebook that draws the charts used in the
      presentation/thesis.
    - [generated_message_length_comparison.ipynb](generated_message_length_comparison.ipynb) -- compares the average
      length of commit messages generated using the current prompt (the one used in the research) and the production prompt
      (the one used to generate the messages that are measured in FUS logs). _Not finished, because a Grazie
      token could not be obtained; once a token is available, the notebook can be run by following the instructions in the notebook._
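
The Configuration item above says that the Grazie API JWT token and the Hugging Face token must be stored as environment variables. Below is a minimal sketch of how such a lookup could work. The variable names `GRAZIE_JWT_TOKEN` and `HF_TOKEN` and the helper `read_token` are assumptions made for illustration; the names the project actually reads are defined in [config.py](config.py).

```python
import os


def read_token(var_name: str) -> str:
    """Return a token from the environment, failing early if it is missing."""
    value = os.environ.get(var_name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return value


if __name__ == "__main__":
    # Hypothetical variable names; check config.py for the ones actually used.
    for name in ("GRAZIE_JWT_TOKEN", "HF_TOKEN"):
        try:
            read_token(name)
            print(f"{name}: found")
        except RuntimeError as err:
            print(err)
```

Failing early with a clear message is usually more convenient than discovering a missing token in the middle of a pipeline run.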
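
The data processing pipeline described above stores every intermediate result as a file, so a single step can be rerun from the previous step's artifact. The sketch below illustrates that pattern only and is not the code in [run_pipeline.py](run_pipeline.py). The step functions, the `message` column, and the file naming scheme are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]


def run_step(step: Step, input_path: Path, output_path: Path) -> None:
    """Read the previous artifact, apply one step, and write the next artifact."""
    with input_path.open(newline="", encoding="utf-8") as src:
        rows = list(csv.DictReader(src))
    result = step(rows)
    fieldnames = list(result[0].keys()) if result else []
    with output_path.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(result)


def run_pipeline(steps: List[Step], first_input: Path, work_dir: Path) -> Path:
    """Chain the steps so that each one consumes the file written by the previous one."""
    current = first_input
    for index, step in enumerate(steps):
        output = work_dir / f"step_{index}_{step.__name__}.csv"
        run_step(step, current, output)
        current = output
    return current


# Toy stand-ins for real steps (hypothetical, for illustration only).
def strip_messages(rows: List[Row]) -> List[Row]:
    return [{**row, "message": row["message"].strip()} for row in rows]


def add_message_length(rows: List[Row]) -> List[Row]:
    return [{**row, "message_length": str(len(row["message"]))} for row in rows]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work_dir = Path(tmp)
        source = work_dir / "input.csv"
        source.write_text("message\n  Fix bug  \nAdd tests\n", encoding="utf-8")
        final = run_pipeline([strip_messages, add_message_length], source, work_dir)
        print(final.read_text(encoding="utf-8"))
```

Because each step only reads and writes files, rerunning one step amounts to calling `run_step` with the earlier artifact as input.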
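
The data analysis files above compute correlations. This README does not say which correlation coefficient is used or which columns are compared, so the sketch below assumes a plain Pearson coefficient over hypothetical columns `score_a` and `score_b` with made-up toy values. It illustrates the kind of computation involved and is not the code in [analysis_util.py](analysis_util.py).

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def correlation_table(columns: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Pairwise correlations for every unordered pair of columns."""
    names = list(columns)
    return {
        (a, b): pearson(columns[a], columns[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


if __name__ == "__main__":
    # Toy values and hypothetical column names, for illustration only.
    toy = {"score_a": [1.0, 2.0, 3.0, 4.0], "score_b": [2.0, 4.1, 5.9, 8.2]}
    for pair, value in correlation_table(toy).items():
        print(pair, round(value, 3))
```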