---
datasets:
- ai4anshu/sentence-compression
language:
- en
metrics:
- rouge
library_name: transformers
pipeline_tag: summarization
---

## Getting Started

### Installation

1. Create the conda environment:

```
conda env create --name NAME --file=environment.yaml
```

The project is built around several scripts that follow a typical machine learning workflow: data preparation, model training, evaluation, and inference. The `google/t5-small` model was trained on the dataset above for `10` epochs. Inference was then run on the evaluation data, and the performance metrics and evaluation results were stored in the `results` subdirectory of the `project` directory. (Illustrative sketches of these steps are included under Examples below.)

A Makefile is included so that the Python scripts can be run individually with the following commands:

```bash
make data
make train
make eval
make inference
```

`make run` runs the entire project end to end:

```bash
make run
```

`make clean` cleans up previous runs:

```bash
make clean
```

Performance metrics are stored in the `performance.json` file inside the `results` directory:

```json
{
    "rouge1": 0.79689240266461,
    "rouge2": 0.7606140631154827,
    "rougeL": 0.7733855633904199,
    "rougeLsum": 0.7734703253159519
}
```

In addition, `eval_results.csv` contains the predictions for the evaluation file:

| original | compressed | predictions |
|-----------|------------|-------------|
| sentence1 | compress1 | prediction1 |
| sentence2 | compress2 | prediction2 |
| : | : | : |

### References:

1. https://github.com/google-research-datasets/sentence-compression
2. https://huggingface.co/docs/transformers/en/tasks/summarization

### Note:

Download the trained checkpoint from the following Drive link: [checkpoint](https://drive.google.com/drive/folders/1yrl0VtmM9BtT4aU2Z5vLs6doz35MMxvM?usp=drive_link)
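### Examples:

For reference, the training step (`make train`) roughly corresponds to the sketch below. The dataset column names are borrowed from the `eval_results.csv` headers above, and the split name and all hyperparameters other than the `10` epochs are assumptions, not the project's actual settings.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# The "original"/"compressed" columns and the "train" split are
# assumptions about the dataset schema.
dataset = load_dataset("ai4anshu/sentence-compression")
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def preprocess(batch):
    # T5 is conventionally prompted with a task prefix.
    inputs = tokenizer(
        ["summarize: " + s for s in batch["original"]],
        max_length=128,
        truncation=True,
    )
    labels = tokenizer(text_target=batch["compressed"], max_length=64, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = dataset.map(preprocess, batched=True)

args = Seq2SeqTrainingArguments(
    output_dir="project/checkpoint",  # hypothetical output path
    num_train_epochs=10,              # matches the 10 epochs stated above
    learning_rate=2e-5,               # assumed hyperparameter
    per_device_train_batch_size=16,   # assumed hyperparameter
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```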
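The four metrics in `performance.json` match the keys reported by the `evaluate` library's ROUGE implementation, so the evaluation step can be reproduced along these lines:

```python
import evaluate

# Requires the rouge_score package to be installed.
rouge = evaluate.load("rouge")

# In the project, predictions come from the model and references from the
# "compressed" column; toy strings are used here for illustration.
scores = rouge.compute(
    predictions=["the fox jumped over the dog"],
    references=["the fox jumped over the lazy dog"],
)
print(scores)  # keys: rouge1, rouge2, rougeL, rougeLsum
```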
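After downloading the checkpoint from the Note above, inference is a standard `transformers` pipeline call. A minimal sketch follows; the local checkpoint path and the `summarize:` task prefix are assumptions and may differ from the project's scripts.

```python
from transformers import pipeline

# Point this at the folder downloaded from the Drive link (path assumed).
compressor = pipeline("summarization", model="./checkpoint")

sentence = (
    "The quick brown fox, which had been lurking near the henhouse all "
    "morning, finally jumped over the lazy dog."
)

result = compressor("summarize: " + sentence, max_length=32)
print(result[0]["summary_text"])
```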