0xhaz committed on
Commit
8c1aa42
1 Parent(s): 351ed70

Step-by-step instructions for model init

Files changed (1): 📋 BUOD_ Setup.md ADDED (+126 -0)

# 📋 BUOD: Text Summarization Model for the Filipino Language Documentation and Initialization

[![Model:distilBART](https://img.shields.io/badge/model-distilBART-green)](https://huggingface.co/jamesesguerra/distilbart-cnn-12-6-finetuned-1.3.1) [![Model:Bert2Bert](https://img.shields.io/badge/model-bert2bert-green)](https://huggingface.co/0xhaz/bert2bert-cnn_dailymail-fp16-finetuned-1.0.0) ![Last Updated](https://img.shields.io/badge/last%20updated%3A-031923-lightgrey)

Authors: [James Esguerra](https://huggingface.co/jamesesguerra), [Julia Avila](), [Hazielle Bugayong](https://huggingface.co/0xhaz)

> Foreword: This research was done in two parts: gathering the data, and running the transformer models,
> namely distilBART and bert2bert. Below is the step-by-step process of the experimentation of the study.

## 📚 Steps

- 📝 **Gathering the data**
- 🔧 **Initializing the transformer models; fine-tuning of the models:**
  - via Google Colab
  - via Google Colab (Local runtime)
  - via Jupyter Notebook

## 📝 Gathering data

An [article scraper](https://github.com/jamesesguerra/article_scraper) was used in this experimentation to gather bodies of text from various news sites. The data gathered was used to pre-train and fine-tune the models in the next step. The repository also includes instructions on how to use the article scraper.

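The scraper repository documents its own usage; purely as an illustration of the kind of scraping step involved (the URL, the `<p>`-tag heuristic, and the output file below are hypothetical, not the actual `article_scraper` API), a minimal sketch might look like this:

```python
# Minimal illustrative sketch of gathering article bodies for a summarization corpus.
# NOTE: the URL, the <p>-tag extraction heuristic, and the output file are hypothetical
# examples; refer to the article_scraper repository for its actual, supported usage.
import requests
from bs4 import BeautifulSoup

def scrape_article_body(url: str) -> str:
    """Fetch a news article and return its paragraph text as a single string."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    return "\n".join(paragraphs)

if __name__ == "__main__":
    # Hypothetical article URL; replace with a real news-site URL.
    text = scrape_article_body("https://example.com/sample-news-article")
    with open("articles.txt", "a", encoding="utf-8") as f:
        f.write(text + "\n")
```
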
## 🔧 Initialization of transformer models

#### via Google Colab

Two models, distilBART and bert2bert, were used to compare abstractive text summarization performance. The notebooks can be found here:
- [distilBART](https://colab.research.google.com/drive/1Lv78nHqQh2I7KaFkUzWsn_MXsyP_PP1I?authuser=3#scrollTo=moK3d7mTQ1v-)
- [bert2bert](https://colab.research.google.com/drive/1Lv78nHqQh2I7KaFkUzWsn_MXsyP_PP1I?authuser=3#scrollTo=moK3d7mTQ1v-)

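The fine-tuned checkpoints linked in the badges above are hosted on the Hugging Face Hub, so once the notebooks have produced them they can be loaded for inference with the `transformers` library. A minimal sketch, assuming the distilBART checkpoint name from the badge; the sample text and generation lengths are illustrative only:

```python
# Minimal inference sketch using the fine-tuned distilBART checkpoint from the badge above.
# The bert2bert checkpoint (0xhaz/bert2bert-cnn_dailymail-fp16-finetuned-1.0.0) can be tried
# the same way; max_length/min_length below are example values, not the study's settings.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="jamesesguerra/distilbart-cnn-12-6-finetuned-1.3.1",
)

article = "..."  # a Filipino news article body gathered by the scraper
summary = summarizer(article, max_length=128, min_length=30, do_sample=False)
print(summary[0]["summary_text"])
```
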
#### via Google Colab Local Runtime

##### Dependencies
- Jupyter Notebook
- Anaconda
- _Optional:_ CUDA Toolkit for Nvidia, requires an account to install
- Tensorflow

##### Installing dependencies
Create an Anaconda environment. This environment can also be used for TensorFlow, which links your GPU to Google Colab's local runtime:

```sh
conda create -n tf-gpu
conda activate tf-gpu
```

##### Optional Step: GPU Utilization (if you are using an external GPU)

Next, install the **CUDA toolkit**. This is the version that was used in this experiment; you may find a more compatible version for your hardware:
```sh
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
```
Then, upgrade pip and install TensorFlow:
```sh
pip install --upgrade pip
pip install "tensorflow<2.11" --user
```

Now, check whether TensorFlow can see your GPU.
Type in the terminal:
```sh
python
```
Next, type the following in the Python interpreter to verify:
```python
import tensorflow as tf
tf.test.is_built_with_cuda()
```

If it returns `True`, TensorFlow was built with CUDA support and you have successfully initialized the environment with your external GPU. If not, you may follow the tutorials found here:

- CUDA Toolkit tutorial [here](https://medium.com/geekculture/install-cuda-and-cudnn-on-windows-linux-52d1501a8805)
- Creating an Anaconda environment [step-by-step](https://stackoverflow.com/questions/51002045/how-to-make-jupyter-notebook-to-run-on-gpu)
- Installing TensorFlow locally using [this tutorial](https://www.tensorflow.org/install/pip#windows-native_1)

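`tf.test.is_built_with_cuda()` only confirms that the installed TensorFlow build was compiled with CUDA support. As an additional sanity check (a minimal sketch using TensorFlow's standard device-listing API), you can also confirm that the GPU device itself is visible:

```python
# Additional check: list the GPU devices TensorFlow can actually see.
# An empty list means the CUDA toolkit/cuDNN setup above is not being picked up.
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
print(gpus)                        # e.g. [PhysicalDevice(name='/physical_device:GPU:0', ...)]
print("GPU available:", len(gpus) > 0)
```
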
##### Connecting to a Google Colab Local Runtime
To connect to a Google Colab local runtime, [this tutorial](https://research.google.com/colaboratory/local-runtimes.html) was used.

First, install Jupyter Notebook (if you haven't) and enable server permissions:
```sh
pip install jupyter_http_over_ws
jupyter serverextension enable --py jupyter_http_over_ws
```
Next, start and authenticate the server:
```sh
jupyter notebook --NotebookApp.allow_origin='https://colab.research.google.com' --port=8888 --NotebookApp.port_retries=0
```
You can now copy the token URL and paste it into Google Colab to connect to the local runtime.

#### Running the notebook using Jupyter Notebook
##### Dependencies
- Jupyter Notebook
- Anaconda
- _Optional:_ CUDA Toolkit for Nvidia, requires an account to install
- Tensorflow

Download the notebooks and save them in your chosen directory.
Create an environment where you can run the notebooks via Anaconda:
```sh
conda create -n env
conda activate env
```
**Note:** You may also opt to install the CUDA toolkit and TensorFlow in this environment.
Next, run the notebooks via Jupyter Notebook:

```sh
jupyter notebook
```
##### After you're done
Deactivate the environment and disable the server extension using the following commands in your console:

```sh
conda deactivate
```
```sh
jupyter serverextension disable --py jupyter_http_over_ws
```
## 🔗 Additional Links / Directory
Here are some links to resources and/or references.

| Name | Link |
| ------ | ------ |
| Ateneo Social Computing Lab | https://huggingface.co/ateneoscsl |