---
datasets:
- ai4anshu/sentence-compression
language:
- en
metrics:
- rouge
library_name: transformers
pipeline_tag: summarization
---
## Getting Started
### Installation
1. Create the conda environment (replace `NAME` with the environment name of your choice):
```bash
conda env create --name NAME --file=environment.yaml
```
The project is organized around several scripts that follow a typical machine learning workflow: data preparation, model training, evaluation, and inference. The `google/t5-small` model was trained on the `ai4anshu/sentence-compression` dataset for `10` epochs. Inference was then run on the evaluation data, and the performance metrics and evaluation results were stored in the `results` subdirectory of the `project` directory.
A Makefile is included so that each Python script can be run separately with the following commands.
```bash
make data
make train
make eval
make inference
```
The `run` target executes the entire project in one step.
```bash
make run
```
The `clean` target removes the output of previous runs.
```bash
make clean
```
Performance metrics are stored in the `performance.json` file inside the `results` directory.
```json
{
"rouge1": 0.79689240266461,
"rouge2": 0.7606140631154827,
"rougeL": 0.7733855633904199,
"rougeLsum": 0.7734703253159519
}
```
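
To inspect these scores programmatically, the file can be read with the standard library. The following is a minimal sketch, not part of the project's scripts; the helper names `load_metrics` and `as_percentages` are hypothetical, and the path you pass in depends on where your run wrote `performance.json`.

```python
import json
from pathlib import Path


def load_metrics(path):
    """Read the ROUGE scores written by the evaluation step (hypothetical helper)."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def as_percentages(metrics, digits=2):
    """Convert scores in [0, 1] to rounded percentages for easier reading."""
    return {name: round(100 * score, digits) for name, score in metrics.items()}
```

For example, `as_percentages(load_metrics("project/results/performance.json"))` would return the four ROUGE values as percentages, assuming the file lives at that path.
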
The evaluation step also writes `eval_results.csv`, which contains the model's predictions for the evaluation file.
| original | compressed | predictions |
|-----------|------------|-------------|
| sentence1 | compress1 | prediction1 |
| sentence2 | compress2 | prediction2 |
| : | : | : |
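
The predictions file can be summarized with a few lines of Python. This sketch assumes the CSV header matches the table above (`original`, `compressed`, `predictions`); the function name `summarize_predictions` and the two statistics it reports (exact-match rate and mean length ratio) are illustrative choices, not metrics produced by the project.

```python
import csv


def summarize_predictions(path):
    """Compute simple statistics over eval_results.csv (hypothetical helper).

    Assumes the columns `original`, `compressed`, and `predictions`.
    """
    total = 0
    exact = 0
    ratio_sum = 0.0
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            original_len = len(row["original"].split())
            if original_len == 0:
                continue  # skip empty source sentences
            total += 1
            # Count predictions identical to the reference compression.
            exact += row["predictions"].strip() == row["compressed"].strip()
            # Word-level length of the prediction relative to the source.
            ratio_sum += len(row["predictions"].split()) / original_len
    if total == 0:
        return {"rows": 0, "exact_match": 0.0, "mean_length_ratio": 0.0}
    return {
        "rows": total,
        "exact_match": exact / total,
        "mean_length_ratio": ratio_sum / total,
    }
```

A mean length ratio well below 1 indicates that the predictions are, on average, shorter than the source sentences, which is the expected behavior for a compression model.
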
### References:
1. https://github.com/google-research-datasets/sentence-compression
2. https://huggingface.co/docs/transformers/en/tasks/summarization
### Note:
Download the trained checkpoint from this Google Drive folder: [checkpoint](https://drive.google.com/drive/folders/1yrl0VtmM9BtT4aU2Z5vLs6doz35MMxvM?usp=drive_link)
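
Once downloaded, the checkpoint could be loaded with `transformers` roughly as follows. This is a sketch under several assumptions: that the folder holds a standard `save_pretrained`-style checkpoint including the tokenizer files, that inputs were prefixed with `summarize: ` during training (as in the Hugging Face summarization guide referenced above), and that 64 new tokens is enough for a compressed sentence. The names `build_input`, `compress`, and `checkpoint_dir` are hypothetical.

```python
def build_input(sentence, prefix="summarize: "):
    """Prepend the task prefix (assumed, not confirmed by the project)."""
    return prefix + sentence.strip()


def compress(sentence, checkpoint_dir, max_new_tokens=64):
    """Generate a compressed sentence from a local checkpoint directory."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint_dir)
    inputs = tokenizer(build_input(sentence), return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

If the project's own inference script uses a different prefix or generation settings, prefer those; `make inference` remains the supported way to reproduce the reported results.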