---
datasets:
- ai4anshu/sentence-compression
language:
- en
metrics:
- rouge
library_name: transformers
pipeline_tag: summarization
---

## Getting Started

### Installation

1. Create the conda environment (replace `NAME` with an environment name of your choice):

```bash
conda env create --name NAME --file=environment.yaml
```

The project is organized around several scripts that mirror a typical machine learning workflow: data preparation, model training, evaluation, and inference. The `google/t5-small` model was trained on the dataset above for `10` epochs. Inference was then run on the evaluation data, and the performance metrics and evaluation results were stored in the `results` subdirectory of the `project` directory.

I added a Makefile, which can be used to run the Python scripts separately with the following commands.

```bash
make data
make train
make eval
make inference
```

The `run` target runs the entire project end to end.

```bash
make run
```

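As a rough illustration of what an aggregate target like `run` amounts to, the Python sketch below calls the four targets one after another and stops at the first failure. The stage order is an assumption taken from the order in which the targets are listed in this README, and `run_pipeline` is a hypothetical helper, not part of the project; the actual `run` rule in the Makefile may be defined differently. Calling `run_pipeline()` from the project root would then be roughly equivalent to running the four `make` commands by hand.

```python
import subprocess

# Assumed order: the order in which the targets are listed in the README.
STAGES = ("data", "train", "eval", "inference")


def run_pipeline(stages=STAGES, runner=subprocess.run):
    """Invoke `make <stage>` for each stage and return the stages that ran."""
    completed = []
    for stage in stages:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit status, so later stages never run after a failure.
        runner(["make", stage], check=True)
        completed.append(stage)
    return completed
```
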
The `clean` target removes the outputs of previous runs.

```bash
make clean
```

Performance metrics are stored in the `performance.json` file inside the `results` directory.

```json
{
  "rouge1": 0.79689240266461,
  "rouge2": 0.7606140631154827,
  "rougeL": 0.7733855633904199,
  "rougeLsum": 0.7734703253159519
}
```

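To inspect these scores programmatically, a small helper along the following lines could be used. It is only a sketch: the path `results/performance.json` is assumed to be relative to the project root, and `load_metrics` and `format_metrics` are hypothetical names rather than functions from the project.

```python
import json
from pathlib import Path


def load_metrics(path):
    """Read the ROUGE scores from a JSON file into a dict of floats."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def format_metrics(metrics):
    """Render each score as a percentage with two decimals, one per line."""
    return "\n".join(
        f"{name}: {score * 100:.2f}" for name, score in sorted(metrics.items())
    )


if __name__ == "__main__":
    # Assumed location, relative to the project root.
    metrics_path = Path("results") / "performance.json"
    if metrics_path.exists():
        print(format_metrics(load_metrics(metrics_path)))
```
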
The predictions for the evaluation file are written to `eval_results.csv`, which has the following layout.

| original  | compressed | predictions |
|-----------|------------|-------------|
| sentence1 | compress1  | prediction1 |
| sentence2 | compress2  | prediction2 |
| ...       | ...        | ...         |

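
The CSV can also be summarized with the standard library. The sketch below assumes the file lives at `results/eval_results.csv` and has exactly the three columns `original`, `compressed`, and `predictions`; `load_rows` and `summarize_predictions` are hypothetical helpers, and the two statistics (exact-match rate against `compressed`, and the mean word-level ratio of prediction length to original length) are illustrative choices, not metrics reported by the project.

```python
import csv
from pathlib import Path


def load_rows(path):
    """Read the CSV into a list of dicts keyed by the header row."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def summarize_predictions(rows):
    """Return row count, exact-match rate, and mean compression ratio."""
    total = 0
    exact = 0
    ratio_sum = 0.0
    for row in rows:
        original_words = row["original"].split()
        prediction = row["predictions"].strip()
        total += 1
        if prediction == row["compressed"].strip():
            exact += 1
        # Skip the ratio for empty source sentences to avoid dividing by zero.
        if original_words:
            ratio_sum += len(prediction.split()) / len(original_words)
    if total == 0:
        return {"rows": 0, "exact_match": 0.0, "compression_ratio": 0.0}
    return {
        "rows": total,
        "exact_match": exact / total,
        "compression_ratio": ratio_sum / total,
    }


if __name__ == "__main__":
    # Assumed location, relative to the project root.
    csv_path = Path("results") / "eval_results.csv"
    if csv_path.exists():
        print(summarize_predictions(load_rows(csv_path)))
```
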
### References:

1. https://github.com/google-research-datasets/sentence-compression
2. https://huggingface.co/docs/transformers/en/tasks/summarization

### Note:

Download the trained checkpoint from this Google Drive folder: [checkpoint](https://drive.google.com/drive/folders/1yrl0VtmM9BtT4aU2Z5vLs6doz35MMxvM?usp=drive_link)
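
Once downloaded, the checkpoint can presumably be loaded with the `transformers` auto classes. The following is a minimal sketch under several assumptions: the downloaded folder (here the hypothetical path `checkpoint`) contains both the model weights and the tokenizer files, PyTorch is installed, and the model was fine-tuned with the `summarize: ` prefix used in the Hugging Face summarization guide listed in the references. `add_task_prefix` and `compress` are illustrative names, not part of the project.

```python
def add_task_prefix(sentences, prefix="summarize: "):
    """Prepend the (assumed) T5 task prefix to every input sentence."""
    return [prefix + sentence.strip() for sentence in sentences]


def compress(sentences, checkpoint_dir, max_new_tokens=64):
    """Generate one compressed sentence per input sentence."""
    # Imported lazily so add_task_prefix works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint_dir)
    inputs = tokenizer(
        add_task_prefix(sentences),
        return_tensors="pt",
        padding=True,
        truncation=True,
    )
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


# Example call (needs the downloaded checkpoint at the assumed path):
# compress(["A long sentence that should be shortened."], "checkpoint")
```
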