
πŸ“‹ BUOD: Text Summarization Model for the Filipino Language Documentation and Initialization

Models: distilBART, Bert2Bert | Authors: James Esguerra, Julia Avila, Hazielle Bugayong

Foreword: This research was done in two parts: gathering the data and running the transformer models, namely distilBART and bert2bert. Below is the step-by-step process of the experimentation in this study:

πŸ“š Steps

  • πŸ“ Gathering the data
  • πŸ”§ Initializing the transfomer models; fine-tuning of the models: -- via Google Colab -- via Google Colab (Local runtime) -- via Jupyter Notebook

πŸ“ Gathering data

An article scraper that can gather bodies of text from various news sites was used in this experimentation. The data gathered was used to pre-train and fine-tune the models in the next step. This section also includes instructions on how to use the article scraper.

πŸ”§ Initialization of transformer models

via Google Colab

Two models, distilBART and bert2bert, were used to compare abstractive text summarization performance. They can be found here:

via Google Colab Local Runtime

Dependencies
  • Jupyter Notebook
  • Anaconda
  • Optional: CUDA Toolkit for Nvidia, requires an account to install
  • Tensorflow
Installing dependencies

Create an Anaconda environment. This environment can also be used for TensorFlow, which links your GPU to Google Colab's local runtime:

conda create -n tf-gpu
conda activate tf-gpu
Optional Step: GPU Utilization (if you are using an external GPU)

Next, install the CUDA toolkit. The version below is the one used in this experiment; you may find a more compatible version for your hardware:

conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
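
On Linux, the TensorFlow install guide additionally suggests pointing the dynamic linker at the conda-installed CUDA libraries whenever the environment is activated. This is a sketch assuming a default conda setup; paths may differ on your machine:

```shell
# Run once inside the activated tf-gpu environment (Linux only).
# Makes the conda-installed CUDA/cuDNN libraries visible to TensorFlow
# every time the environment is activated.
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' \
    >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
```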

Then, upgrade pip and install tensorflow:

pip install --upgrade pip
pip install "tensorflow<2.11" --user

Now, check whether TensorFlow has been configured to use the GPU. Type in the terminal:

python

Next, type the following to verify:

import tensorflow as tf
tf.test.is_built_with_cuda()            # True if this TensorFlow build has CUDA support
tf.config.list_physical_devices('GPU')  # should list your GPU if TensorFlow can see it

If it returns True, you have successfully initialized the environment with your external GPU. If not, you may follow the tutorials found here:

Connecting to a Google Colab Local Runtime

To connect this to a Google Colab local runtime, this tutorial was used.

First, install the jupyter_http_over_ws extension (if you haven't) and enable it as a server extension:

pip install jupyter_http_over_ws
jupyter serverextension enable --py jupyter_http_over_ws

Next, start and authenticate the server:

jupyter notebook --NotebookApp.allow_origin='https://colab.research.google.com' --port=8888  --NotebookApp.port_retries=0

You can now copy the token URL and paste it into Google Colab when connecting to a local runtime.
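
When it starts, the server prints a URL of the form `http://localhost:8888/?token=...`, and it is this whole URL, token included, that Colab needs. As an illustration (with a made-up token), the token is just a query parameter on that URL:

```python
# Illustrative only: the token below is made up. Jupyter prints a URL like
# this at startup; paste the full URL into Colab's local-runtime dialog.
from urllib.parse import urlparse, parse_qs

url = "http://localhost:8888/?token=abc123"        # example startup URL
token = parse_qs(urlparse(url).query)["token"][0]  # the token query parameter
print(token)
```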

Running the notebook using Jupyter Notebook

Dependencies
  • Jupyter Notebook
  • Anaconda
  • Optional: CUDA Toolkit for Nvidia, requires an account to install
  • Tensorflow

Download the notebooks and save them in your chosen directory. Then create an environment in which to run the notebooks via Anaconda:

conda create -n env
conda activate env

You may also opt to install the CUDA toolkit and TensorFlow in this environment, following the steps above. Next, run the notebooks via Jupyter Notebook:

jupyter notebook
After you're done

Deactivate the environment and disable the server extension using the following commands in your console:

conda deactivate
jupyter serverextension disable --py jupyter_http_over_ws

πŸ”— Additional Links/ Directory

Here are some links to resources and references.

| Name | Link |
| --- | --- |
| Ateneo Social Computing Lab | https://huggingface.co/ateneoscsl |