added readme in code
- code/README.md +4 -0
code/README.md
ADDED
@@ -0,0 +1,4 @@
+ ```myocr.py``` is responsible for scraping all the writings of Mahatma Gandhi.
+
+
+ ```data_preprocessing.py``` cleans the data and prepares a file that is ready to be fed into the GPT-2 fine-tuning pipeline. The code uses a threshold of 200: any paragraph with more than 200 token IDs is split in half, recursively.
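The recursive halving described for `data_preprocessing.py` can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the names `split_paragraph` and `whitespace_token_count` are hypothetical, the split point (the middle word) is an assumption, and a whitespace word count stands in for the GPT-2 tokenizer so the example runs without extra dependencies.

```python
from typing import Callable, List

MAX_TOKENS = 200  # threshold stated in the README


def whitespace_token_count(text: str) -> int:
    """Stand-in token counter (assumption): one token per whitespace-separated word."""
    return len(text.split())


def split_paragraph(
    paragraph: str,
    count_tokens: Callable[[str], int] = whitespace_token_count,
    max_tokens: int = MAX_TOKENS,
) -> List[str]:
    """Recursively halve a paragraph until each piece has at most max_tokens tokens."""
    words = paragraph.split()
    # Stop when the piece fits, or when it can no longer be divided.
    if count_tokens(paragraph) <= max_tokens or len(words) < 2:
        return [paragraph]
    mid = len(words) // 2  # assumed split point: the middle word
    left = " ".join(words[:mid])
    right = " ".join(words[mid:])
    return (
        split_paragraph(left, count_tokens, max_tokens)
        + split_paragraph(right, count_tokens, max_tokens)
    )


# A 500-word paragraph is halved twice: 500 -> 250 + 250 -> four pieces of 125.
pieces = split_paragraph(" ".join(["word"] * 500))
print([whitespace_token_count(p) for p in pieces])  # [125, 125, 125, 125]
```

To count real GPT-2 token IDs instead, a different `count_tokens` callable could be passed in; the recursion itself would stay the same.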