---
license: apache-2.0
datasets:
- bengsoon/volve_alpaca
language:
- en
base_model:
- Meta/Meta-Llama-3-8B
pipeline_tag: summarization
tags:
- oil-and-gas
- energy
- drilling
---
# DriLLM Summarizer
## Background
This model is a fine-tune of [Meta/Meta-Llama-3-8B](https://huggingface.co/Meta/Meta-Llama-3-8B), trained on the [Volve DDR dataset](https://huggingface.co/datasets/bengsoon/volve_alpaca) in the Alpaca prompt format using [Axolotl](https://github.com/axolotl-ai-cloud/axolotl).
The goal was an LLM that understands the nuances of drilling operations and can write the 24-hour summary of a Daily Drilling Report (DDR) from its hourly activity entries.
## How to use
### Sample Colab
This [Google Colab notebook](https://colab.research.google.com/drive/10Txp14M-yeJG3hRAB8U2ydPrWFE1bypW?usp=sharing) shows how to get started with the model.
### Recommended template for DriLLM-Summarizer
``` python
TEMPLATE = """<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:
"""
```
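Both placeholders are filled with Python's `str.format`. Below is a minimal sketch of assembling the `{input}` block from structured activity records. The `format_hourly_events` and `build_prompt` helpers, the `(start, end, description)` tuple layout, and the short placeholder instruction are hypothetical; only the `HH:MM - HH:MM: description` line format is taken from the sample input in the inference example.

``` python
# Alpaca-style template, copied from the recommended template above.
TEMPLATE = """<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:
"""


def format_hourly_events(activities):
    """Hypothetical helper: one 'HH:MM - HH:MM: description' line per (start, end, description) tuple."""
    return "".join(f"{start} - {end}: {text}\n" for start, end, text in activities)


def build_prompt(instruction, activities):
    """Hypothetical helper: fill the template with the instruction and the formatted hourly events."""
    return TEMPLATE.format(instruction=instruction, input=format_hourly_events(activities))


activities = [
    ("00:00", "11:00", "Packed equipment and prepared for backload."),
    ("11:00", "17:00", "Cleaned and tidied offices and workspace."),
]
# Placeholder instruction; use the full INSTRUCTION text from the inference example in practice.
prompt = build_prompt("Prepare the 24-hour summary.", activities)
print(prompt)
```

Keeping activities as tuples until the last step makes it easy to sort or filter them by time before they are flattened into the prompt.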
### Inference using the Transformers pipeline
The code below was tested on Google Colab with the free T4 GPU.
``` python
import torch
import transformers

model_id = "bengsoon/DriLLM-Summarizer"

# Load the model in bfloat16 and let Accelerate place it on the available device(s)
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Alpaca-style prompt template used for fine-tuning
TEMPLATE = """<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:
"""

# Task description passed as {instruction}
INSTRUCTION = """You are a Rig Supervisor working at an oil and gas offshore drilling operation. \
Your company is currently on a drilling campaign and you are the on-site Drilling Engineer (DE). \
As a DE, one of your jobs is to oversee the operations at the drilling rigs. As such, you know the ins and outs of the operation, down to the hourly activities. \
Every day, activities are recorded either by the Driller, Mud Logger, MWD / LWD engineer or the Drilling Operations Coordinator throughout the day. \
As a DE representative for your company, you are required to prepare the 24-hour summary for the Daily Drilling Report (DDR) based on the hourly activities reported. \
You must always maintain the language of report along with the terminologies and mnemonics of the Drilling Engineer. \
Given the following activities for well XX, please prepare the 24-hour summary for the Daily Drilling Report (DDR). \
Only return the 24-hour summary, and nothing else.
"""

# Hourly DDR activities passed as {input}, one "HH:MM - HH:MM: description" line each
hourly_events = """00:00 - 11:00: Packed equipment and prepared for backload. Cleaned drillfloor and cantilever.
11:00 - 17:00: Performed are inspection with barge engineer. Cleaned and tidied offices and workspace. Demobilized all personell. End of operation
"""

# Named `prompt` to avoid shadowing Python's built-in input()
prompt = TEMPLATE.format(instruction=INSTRUCTION, input=hourly_events)
# max_new_tokens caps the summary length; 256 is an arbitrary choice
output = pipeline(prompt, max_new_tokens=256)

# generated_text holds the prompt plus the completion; keep only what follows "### Response:"
print("Response: ", output[0]["generated_text"].split("### Response:")[1].strip())
# > Response: Packed equipment and prepared for backload. Cleaned drillfloor and cantilever. Performed are inspection with barge engineer. Cleaned and tidyied offices and workspaces.
```
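The one-line `split("### Response:")[1]` raises an `IndexError` if the marker is missing, and it keeps anything the model generates after the summary. Below is a slightly more defensive post-processing sketch. `extract_summary` is a hypothetical helper, the sample string is made up, and cutting at the next `### ` header assumes that any run-on text would start a new Alpaca section (the sample output above does not show this happening).

``` python
def extract_summary(generated_text, marker="### Response:"):
    """Return the text after the first `marker`, cut at the next '### ' header."""
    # Split once on the response marker; `sep` is empty if the marker is missing.
    _, sep, response = generated_text.partition(marker)
    if not sep:
        return ""
    # Drop any further Alpaca section the model may have started generating.
    next_header = response.find("### ")
    if next_header != -1:
        response = response[:next_header]
    return response.strip()


# Made-up generation with an extra trailing section, for illustration only.
sample = (
    "### Input:\n00:00 - 11:00: Packed equipment.\n"
    "### Response:\nPacked equipment and prepared for backload.\n"
    "### Instruction:\nIgnore this trailing section."
)
print(extract_summary(sample))  # Packed equipment and prepared for backload.
```

With the pipeline from the inference example, this would be called as `extract_summary(output[0]["generated_text"])`. Alternatively, passing `return_full_text=False` to the pipeline call makes `generated_text` contain only the completion, without the prompt.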
### Quantized model
If GPU memory is limited, you can load the model with 8-bit quantization instead (this requires the `bitsandbytes` package):
``` python
import torch
import transformers
from transformers import BitsAndBytesConfig

model_id = "bengsoon/DriLLM-Summarizer"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={
        "torch_dtype": torch.bfloat16,  # dtype for the parts of the model that stay unquantized
        "quantization_config": BitsAndBytesConfig(load_in_8bit=True),  # load weights in 8-bit
    },
    device_map="auto",
)
```
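A back-of-the-envelope estimate of the weight memory shows why 8-bit loading helps. The sketch below assumes roughly 8.0e9 parameters (the nominal size of Llama 3 8B) and ignores activations, the KV cache, and framework overhead, so real usage is higher. For reference, a T4 has 16 GB of memory.

``` python
# Rough weight-only memory estimate for an ~8B-parameter model.
# Assumption: 8.0e9 parameters; activations, KV cache and overhead are ignored.
NUM_PARAMS = 8.0e9

BYTES_PER_PARAM = {
    "bfloat16": 2.0,  # 16-bit weights (default load above)
    "int8": 1.0,      # 8-bit weights (load_in_8bit=True)
}


def weight_memory_gb(num_params, bytes_per_param):
    """Return weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype:>8}: ~{weight_memory_gb(NUM_PARAMS, nbytes):.0f} GB")
```

By this estimate the bfloat16 weights alone fill a T4, while the 8-bit weights leave roughly half of its memory free for generation.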