DriLLM-Summarizer / README.md

Add colab link

593f8ac verified 4 months ago

3.92 kB

	---
	license: apache-2.0
	datasets:
	- bengsoon/volve_alpaca
	language:
	- en
	base_model:
	- Meta/Meta-Llama-3-8B
	pipeline_tag: summarization
	tags:
	- oil-and-gas
	- energy
	- drilling
	---
	# DriLLM Summarizer

	## Background
	This is a fine-tuned model from [Meta/Meta-Llama-3-8B](https://huggingface.co/Meta/Meta-Llama-3-8B). The model was fine-tuned with [Volve DDR dataset](https://huggingface.co/datasets/bengsoon/volve_alpaca) using the Alpaca template, using [Axolotl](https://github.com/axolotl-ai-cloud/axolotl).

	The motivation behind this model was to fine-tune an LLM that is capable of understanding the nuances of the Drilling Operations and provide 24-hour summarizations based on the inputs from Daily Drilling Reports hourly activities.

	## How to use
	### Sample Colab
	Here's a [Google colab notebook](https://colab.research.google.com/drive/10Txp14M-yeJG3hRAB8U2ydPrWFE1bypW?usp=sharing) where you can get started with using the model

	### Recommended template for DriLLM-Summarizer:
	``` python
	TEMPLATE = """<\|begin_of_text\|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

	### Instruction:
	{instruction}


	### Input:
	{input}


	### Response:

	"""
	```

	### Inferencing using Transformers Pipeline
	The code below was tested on a Google colab (with the free T4 GPU).

	``` python
	import transformers
	import torch

	model_id = "bengsoon/DriLLM-Summarizer"

	pipeline = transformers.pipeline(
	"text-generation", model=model_id, model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto"
	)

	TEMPLATE = """<\|begin_of_text\|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

	### Instruction:
	{instruction}


	### Input:
	{input}


	### Response:

	"""

	INSTRUCTION = """You are a Rig Supervisor working at an oil and gas offshore drilling operation. \
	Your company is currently on a drilling campaign and you are the on-site Drilling Engineer (DE). \
	As a DE, one of your jobs is to oversee the operations at the drilling rigs. As such, you know the ins and outs of the operation, down to the hourly activities. \
	Every day, activities are recorded either by the Driller, Mud Logger, MWD / LWD engineer or the Drilling Operations Coordinator throughout the day. \
	As a DE representative for your company, you are required to prepare the 24-hour summary for the Daily Drilling Report (DDR) based on the hourly activities reported. \
	You must always maintain the language of report along with the terminologies and mnemonics of the Drilling Engineer. \
	Given the following activities for well XX, please prepare the 24-hour summary for the Daily Drilling Report (DDR). \
	Only return the 24-hour summary, and nothing else.
	"""

	hourly_events = """00:00 - 11:00: Packed equipment and prepared for backload. Cleaned drillfloor and cantilever.
	11:00 - 17:00: Performed are inspection with barge engineer. Cleaned and tidied offices and workspace. Demobilized all personell. End of operation
	"""

	input = TEMPLATE.format(instruction=INSTRUCTION, input=hourly_events)

	output = pipeline(input)

	print("Response: ", output[0]["generated_text"].split("### Response:")[1].strip())
	# > Response: Packed equipment and prepared for backload. Cleaned drillfloor and cantilever. Performed are inspection with barge engineer. Cleaned and tidyied offices and workspaces.
	```

	### Quantized model
	If you are facing GPU constraints, you can try to load it with 8-bit quantization

	``` python
	from transformers import BitsAndBytesConfig

	pipeline = transformers.pipeline(
	"text-generation",
	model=model_id,
	model_kwargs = {
	"torch_dtype": torch.bfloat16,
	"quantization_config": BitsAndBytesConfig(load_in_8bit=True), # Uncomment to use 8-bit quantization,
	},
	device_map="auto"
	)
	```