Petr Tsvetkov
committed on
Commit
·
3907263
1
Parent(s):
a7bba68
Added some description to the README.md
README.md
CHANGED
|
@@ -5,4 +5,48 @@ sdk_version: 4.25.0
|
|
| 5 |
app_file: change_visualizer.py
|
| 6 |
---
|
| 7 |
|
| 8 |
-
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
| 5 |
app_file: change_visualizer.py
|
| 6 |
---
|
| 7 |
|
| 8 |
+
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
| 9 |
+
|
| 10 |
+
# Description
|
| 11 |
+
|
| 12 |
+
This project is a main artifact of the research project "Research on evaluation for AI Commit Message Generation".
|
| 13 |
+
|
| 14 |
+
# Structure (important components)
|
| 15 |
+
|
| 16 |
+
- ### Configuration: [config.py](config.py)
|
| 17 |
+
- Grazie API JWT token and Hugging Face token must be stored as environment variables.
|
| 18 |
+
- ### Visualization app -- a Gradio application that is currently deployed
|
| 19 |
+
at https://huggingface.co/spaces/JetBrains-Research/commit-rewriting-visualization.
|
| 20 |
+
- Shows
|
| 21 |
+
- The "golden" dataset of manually collected samples; the dataset is downloaded on startup
|
| 22 |
+
from https://huggingface.co/datasets/JetBrains-Research/commit-msg-rewriting
|
| 23 |
+
- The entire dataset that includes the synthetic samples; the dataset is downloaded on startup
|
| 24 |
+
from https://huggingface.co/datasets/JetBrains-Research/synthetic-commit-msg-rewriting
|
| 25 |
+
- Some statistics collected for the dataset (and its parts); computed on startup
|
| 26 |
+
|
| 27 |
+
_Note: if the datasets are updated, the app must be restarted to show the changes._
|
| 28 |
+
- Files
|
| 29 |
+
- [change_visualizer.py](change_visualizer.py)
|
| 30 |
+
- ### Data processing pipeline (_note: dataset and file names can be changed in the configuration file_)
|
| 31 |
+
- Run the whole pipeline by running [run_pipeline.py](run_pipeline.py)
|
| 32 |
+
- All intermediate results are stored as files defined in config
|
| 33 |
+
- Intermediate steps (each can be run separately by running the corresponding file
|
| 34 |
+
from [generation_steps](generation_steps)). Each step takes its input from the previous step's artifact.
|
| 35 |
+
- Generate the synthetic samples
|
| 36 |
+
- Files [generation_steps/synthetic_end_to_start.py](generation_steps/synthetic_end_to_start.py)
|
| 37 |
+
and [generation_steps/synthetic_start_to_end.py](generation_steps/synthetic_start_to_end.py)
|
| 38 |
+
- The first generation step (end to start) downloads the `JetBrains-Research/commit-msg-rewriting`
|
| 39 |
+
and `JetBrains-Research/lca-commit-message-generation` datasets from
|
| 40 |
+
Hugging Face.
|
| 41 |
+
- Compute metrics
|
| 42 |
+
- File [generation_steps/metrics_analysis.py](generation_steps/metrics_analysis.py)
|
| 43 |
+
- Includes the functions for all metrics
|
| 44 |
+
- Downloads the `JetBrains-Research/lca-commit-message-generation` dataset from Hugging Face.
|
| 45 |
+
- The resulting artifact (dataset with golden and synthetic samples, attached reference messages and computed
|
| 46 |
+
metrics) is saved to the file [output/synthetic.csv](output/synthetic.csv). It should be uploaded
|
| 47 |
+
to https://huggingface.co/datasets/JetBrains-Research/synthetic-commit-msg-rewriting **manually**.
|
| 48 |
+
- ### Data analysis
|
| 49 |
+
- [analysis_util.py](analysis_util.py) -- some functions used for data analysis, e.g., computing correlations.
|
| 50 |
+
- [analysis.ipynb](analysis.ipynb) -- computes the correlations and the resulting tables.
|
| 51 |
+
- [chart_processing.ipynb](chart_processing.ipynb) -- Jupyter Notebook that draws the charts that were used in the
|
| 52 |
+
presentation/thesis.
|
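
The Configuration item above requires the Grazie API JWT token and the Hugging Face token to be stored as environment variables. A minimal sketch of reading them and failing early is shown below. The variable names `GRAZIE_JWT_TOKEN` and `HF_TOKEN` and the helpers `read_token` and `load_tokens` are assumptions made for illustration; the names the project actually reads are defined in [config.py](config.py).

```python
import os


class MissingTokenError(RuntimeError):
    """Raised when a required token is not set in the environment."""


def read_token(var_name: str) -> str:
    """Return the stripped value of an environment variable, or raise a clear error."""
    value = os.environ.get(var_name, "").strip()
    if not value:
        raise MissingTokenError(
            f"Environment variable {var_name!r} is not set or is empty"
        )
    return value


# Hypothetical variable names; check config.py for the ones the project uses.
GRAZIE_TOKEN_VAR = "GRAZIE_JWT_TOKEN"
HF_TOKEN_VAR = "HF_TOKEN"


def load_tokens() -> dict:
    """Collect both tokens, failing before any download or API call is attempted."""
    return {
        "grazie": read_token(GRAZIE_TOKEN_VAR),
        "hugging_face": read_token(HF_TOKEN_VAR),
    }
```

Checking both tokens up front means a missing variable is reported when the app or pipeline starts, before any dataset download or API call is attempted.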