---
language:
- en
- fr
thumbnail: "https://www.warpy.io/_next/static/media/tgshell_icon.4fa45b6d.svg"
tags:
- text-generation
- t5
license: "apache-2.0"
datasets:
- wikitext
- bookcorpus
metrics:
- perplexity
- bleu
base_model: "t5"
---

# Terminal Generative Shell (tgs) Model

The TGS model implements NL2Bash: a natural-language interface to the Linux Bash shell.



## Citation

If you use this software, please cite it as follows:

```bibtex
@inproceedings{LinWZE2018:NL2Bash,
  author = {Xi Victoria Lin and Chenglong Wang and Luke Zettlemoyer and Michael D. Ernst},
  title = {NL2Bash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation {LREC} 2018, Miyazaki (Japan), 7-12 May, 2018.},
  year = {2018}
}
```

## Description

This project builds on the NL2Bash dataset from the [nl2bash](https://github.com/TellinaTool/nl2bash) GitHub repository. The objective is a semantic parser that translates natural-language commands into executable Bash commands using deep learning techniques.
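The dataset pairs a natural-language description with the Bash command it describes. A minimal sketch of loading such a file, where the file name and the `"invocation"`/`"cmd"` field names are assumptions for illustration (check the actual JSON shipped with the repository):

```python
import json

# Toy entries standing in for the real dataset; each entry pairs a
# natural-language invocation with its Bash command.
sample = {
    "1": {"invocation": "List all files, including hidden ones.",
          "cmd": "ls -a"},
    "2": {"invocation": "Print the current working directory.",
          "cmd": "pwd"},
}
with open("nl2bash_sample.json", "w") as f:
    json.dump(sample, f)

# Load the JSON file and flatten it into (invocation, command) pairs.
with open("nl2bash_sample.json") as f:
    data = json.load(f)

pairs = [(entry["invocation"], entry["cmd"]) for entry in data.values()]
print(f"Loaded {len(pairs)} natural-language/Bash pairs")
```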

## Installation

### Prerequisites

- Python 3.6+
- PyTorch
- Transformers library
- PyTorch Lightning
- scikit-learn
- SentencePiece

### Setup

Clone the repository and install the required packages:

```bash
git clone https://github.com/your-repo/nl2bash.git
cd nl2bash
pip install -r requirements.txt
```

## Usage

1. **Import Libraries**  
   Import all the necessary libraries including PyTorch, Transformers, and PyTorch Lightning.

2. **Load Data**  
   Load the NL2Bash dataset JSON file and inspect its structure.

3. **Preprocess Data**  
   Convert the data into a suitable format for training, including tokenization.

4. **Model Initialization**  
   Initialize the T5 model and tokenizer, and set up the necessary configurations.

5. **Training**  
   Train the model using PyTorch Lightning with specified callbacks and checkpointing.

6. **Validation and Testing**  
   Validate and test the model on the NL2Bash dataset.

7. **Model Inference**  
   Use the trained model to translate natural language commands to Bash commands.
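The steps above hinge on casting each pair as a text-to-text example for T5 (step 3). A minimal sketch of that formatting; the exact task prefix used by this project is an assumption made for illustration:

```python
def to_t5_example(invocation: str, cmd: str,
                  prefix: str = "translate English to Bash: ") -> dict:
    """Format one NL/Bash pair as a text-to-text training example.

    T5 is conditioned with task prefixes; the prefix string here is
    an assumption, not necessarily what the released model uses.
    """
    return {"source": prefix + invocation.strip(), "target": cmd.strip()}

example = to_t5_example(
    "Prints process tree of a current process with id numbers and parent processes.",
    "pstree -p -s $$",
)
print(example["source"])
print(example["target"])
```

The `source` string is what gets tokenized as the encoder input, and `target` becomes the decoder labels.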

## Example

Here's a quick example to get you started:

```python
from transformers import T5Tokenizer
from your_module import NL2BashModel, generate_answer

# Load the fine-tuned checkpoint and its tokenizer (the model is
# T5-based, so a T5 tokenizer class is assumed here).
model = NL2BashModel.load_from_checkpoint('path_to_checkpoint')
tokenizer = T5Tokenizer.from_pretrained('path_to_tokenizer')

question = "Prints process tree of a current process with id numbers and parent processes."
answer = generate_answer(question, model, tokenizer)
print(answer)
```
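The example relies on a `generate_answer` helper from the project. A minimal sketch of what such a helper might look like for a Hugging Face seq2seq model; the task prefix, beam width, and length cap are assumptions, not the project's actual settings:

```python
def generate_answer(question, model, tokenizer, max_length=64):
    """Translate a natural-language request into a Bash command.

    Assumes a Hugging Face-style seq2seq model/tokenizer pair (e.g. T5).
    The "translate English to Bash: " prefix is an assumption and must
    match whatever prefix the checkpoint was trained with.
    """
    inputs = tokenizer("translate English to Bash: " + question,
                       return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_length=max_length, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```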

## Training Analysis

The T5 model was fine-tuned for the NL2Bash task. The training process showed the following characteristics:

![Training and validation loss curves](https://github.com/warpy-ai/tgs-model/blob/main/plot.png?raw=true)

- Training Loss: Demonstrated a consistent decrease over time, indicating effective learning and adaptation to the training data.
- Validation Loss: Also decreased, suggesting good generalization to unseen data.
- Stability: The training process was stable, without significant fluctuations in loss values.
- Overfitting: No evidence of overfitting was observed, as both training and validation losses decreased concurrently.

This analysis provides confidence in the model's ability to learn and generalize from the NL2Bash dataset effectively.

### Bias Analysis

tgs_model `v0.1.0` inherits the biases of the NL2Bash dataset, which contains a disproportionately large number of `find` commands relative to other utilities.

![NL2Bash utility distribution](https://camo.githubusercontent.com/3d88e1e35b0b2774ad407760a6718e7975e99ede72d687c8149ee100a8c6a62f/687474703a2f2f766963746f7269616c696e2e6e65742f696d672f6769746875622f6e6c32626173682d7574696c6974792d64697374322e706e67)

For the `v0.2.0` model, we will treat NL2Bash as a biased dataset and train on the [NL2Bash-2](https://github.com/TellinaTool/nl2bash-2) dataset instead, which offers a more balanced distribution over a larger set of commands.
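This kind of skew is straightforward to measure. A minimal sketch that counts the leading utility of each command (toy commands stand in for the real dataset; pipelines and subshells are ignored for simplicity):

```python
from collections import Counter

def head_utility(cmd: str) -> str:
    """Return the first token of a Bash command, a rough proxy for
    the utility it invokes (ignores pipelines, sudo, subshells)."""
    return cmd.strip().split()[0]

# Toy commands standing in for the real dataset.
cmds = [
    "find . -name '*.py'",
    "find / -type d -empty",
    "ls -la",
    "grep -r 'foo' .",
]
freq = Counter(head_utility(c) for c in cmds)
print(freq.most_common(3))
```

On the real NL2Bash data, a histogram like this is what surfaces the dominance of `find`.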

## Contributing

Contributions to improve NL2Bash are welcome. Please read `CONTRIBUTING.md` for guidelines on how to contribute.

## License

This project is licensed under the Apache 2.0 License - see the `LICENSE` file for details.