---
title: Commit Rewriting Visualization
sdk: gradio
sdk_version: 4.25.0
app_file: change_visualizer.py
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Description

This project is the main artifact of the "Research on evaluation for AI Commit Message Generation" research.

# Structure (important components)

- ### Configuration: [config.py](config.py)
  - The Grazie API JWT token and the Hugging Face token must be stored as environment variables.
- ### Visualization app -- a Gradio application that is currently deployed at https://huggingface.co/spaces/JetBrains-Research/commit-rewriting-visualization.
  - Shows
    - The "golden" dataset of manually collected samples; the dataset is downloaded on startup from https://huggingface.co/datasets/JetBrains-Research/commit-msg-rewriting
    - The entire dataset, including the synthetic samples; the dataset is downloaded on startup from https://huggingface.co/datasets/JetBrains-Research/synthetic-commit-msg-rewriting
    - Some statistics collected for the dataset (and its parts); computed on startup

    _Note: after the datasets are updated, the app must be restarted to show the changes._
  - Files
    - [change_visualizer.py](change_visualizer.py)
- ### Data processing pipeline (_note: dataset and file names can be changed in the configuration file_)
  - Run the whole pipeline by running [run_pipeline.py](run_pipeline.py)
    - All intermediate results are stored as files defined in the config
  - Intermediate steps (each can be run separately by running the corresponding file from [generation_steps](generation_steps)). The input is then taken from the previous step's artifact.
    - Generate the synthetic samples
      - Files [generation_steps/synthetic_end_to_start.py](generation_steps/synthetic_end_to_start.py) and [generation_steps/synthetic_start_to_end.py](generation_steps/synthetic_start_to_end.py)
      - The first generation step (end to start) downloads the `JetBrains-Research/commit-msg-rewriting` and `JetBrains-Research/lca-commit-message-generation` datasets from Hugging Face.
    - Compute metrics
      - File [generation_steps/metrics_analysis.py](generation_steps/metrics_analysis.py)
      - Includes the functions for all metrics
      - Downloads the `JetBrains-Research/lca-commit-message-generation` dataset from Hugging Face.
  - The resulting artifact (a dataset with golden and synthetic samples, attached reference messages, and computed metrics) is saved to the file [output/synthetic.csv](output/synthetic.csv). It should be uploaded to https://huggingface.co/datasets/JetBrains-Research/synthetic-commit-msg-rewriting **manually**.
- ### Data analysis
  - [analysis_util.py](analysis_util.py) -- some functions used for data analysis, e.g., correlation computation.
  - [analysis.ipynb](analysis.ipynb) -- computes the correlations and the resulting tables.
  - [chart_processing.ipynb](chart_processing.ipynb) -- Jupyter Notebook that draws the charts that were used in the presentation/thesis.
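
As noted under Configuration, both tokens are expected in environment variables. Below is a minimal sketch of how such a check could look before starting the app or the pipeline. The variable names `GRAZIE_JWT_TOKEN` and `HF_TOKEN` and the helper `load_tokens` are assumptions made for illustration; the names the project actually reads are defined in [config.py](config.py).

```python
import os

# Hypothetical variable names, used for illustration only.
# Check config.py for the names the project actually reads.
REQUIRED_ENV_VARS = ("GRAZIE_JWT_TOKEN", "HF_TOKEN")


def load_tokens(env=None):
    """Return the required tokens as a dict, or raise if any is missing or empty.

    `env` defaults to the process environment; passing a plain dict
    makes the function easy to try out without touching real secrets.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_ENV_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_ENV_VARS}
```

Calling `load_tokens()` before launching fails fast with a message naming the missing variables, instead of failing later during a dataset download or an API call.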