---
license: mit
title: Text Summarization
sdk: streamlit
emoji: 🔥
colorFrom: blue
colorTo: purple
---

# Summarization

This project is a machine learning pipeline for natural language processing, focused on text summarization. It contains scripts and modules for training and evaluating models on your own data.

## Description
This repository contains sample code that demonstrates how to train a model for text summarization. The main focus is a basic project template from which the trained model can be deployed smoothly and used for inference.

## Frameworks used
* PyTorch
* Transformers

## Project Structure

* `pipeline`
  This directory contains the code for the main data pipeline.
  - `training_pipeline.py`: Code for the training pipeline.
  - `inference_pipeline.py`: Code for the inference pipeline.

* `steps`
  This directory includes the individual steps involved in the data pipeline.
  - `evaluation.py`: Code for evaluating the model.
  - `ingest_data.py`: Code for ingesting data into the pipeline.
  - `preprocess.py`: Data preprocessing code.
  - `model_train.py`: Model training code.

* `utils`
  This directory contains utility functions used throughout the project.
  - `utils.py`: General utility functions.

* `run_pipeline.py`
  This script is the entry point for running the entire data pipeline.

* `Dockerfile`
  The Dockerfile for creating a Docker image for this project.

* `requirements.txt`
  The Python dependencies for this project.

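The structure above suggests a flow of ingest, preprocess, train, and evaluate. The sketch below shows one way an entry point such as `run_pipeline.py` might chain those steps. It is a minimal illustration only: the function names simply mirror the file names in `steps`, their signatures are assumptions, and the "model" is a trivial first-sentence placeholder rather than a PyTorch/Transformers model.

```python
from typing import Callable, Dict, List, Tuple

Example = Dict[str, str]  # hypothetical record: {"document": ..., "summary": ...}


def ingest_data() -> List[Example]:
    """Stand-in for steps/ingest_data.py: return raw document/summary pairs."""
    return [
        {"document": "  The cat sat on the mat. It was warm.  ",
         "summary": "A cat sat on a mat."},
        {"document": "Rain fell all day.   The streets flooded.",
         "summary": "Heavy rain flooded streets."},
    ]


def preprocess(examples: List[Example]) -> List[Example]:
    """Stand-in for steps/preprocess.py: collapse runs of whitespace."""
    return [{key: " ".join(value.split()) for key, value in ex.items()}
            for ex in examples]


def model_train(examples: List[Example]) -> Callable[[str], str]:
    """Stand-in for steps/model_train.py.

    Assumption: a real step would fine-tune a Transformers model here.
    This placeholder ignores the data and returns a first-sentence summarizer.
    """
    def summarize(text: str) -> str:
        return text.split(". ")[0].rstrip(".") + "."
    return summarize


def evaluation(model: Callable[[str], str], examples: List[Example]) -> float:
    """Stand-in for steps/evaluation.py.

    Toy metric: mean fraction of reference-summary words that also
    appear in the predicted summary.
    """
    scores = []
    for ex in examples:
        predicted = set(model(ex["document"]).lower().split())
        reference = ex["summary"].lower().split()
        scores.append(sum(word in predicted for word in reference) / len(reference))
    return sum(scores) / len(scores)


def run_training_pipeline() -> Tuple[Callable[[str], str], float]:
    """Chain the steps in order and return the model with its score."""
    raw = ingest_data()
    clean = preprocess(raw)
    model = model_train(clean)
    score = evaluation(model, clean)
    return model, score


if __name__ == "__main__":
    model, score = run_training_pipeline()
    print(f"word-overlap score: {score:.2f}")
    print(model("First sentence here. Second sentence here."))
```

Keeping each step as a plain function with explicit inputs and outputs makes it straightforward to swap the placeholder for a real training step, or to reuse `preprocess` and the trained model in a separate inference pipeline.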



## License
This project is licensed under the MIT License - see the LICENSE file for details.