---
license: apache-2.0
language:
- en
tags:
- materials science
- mechanics
- solids
- MechGPT
- scientific AI
- machine learning
- generative AI
---

# Language Modeling Strategies for Mechanics and Materials (MeLM)

For centuries, researchers have sought ways to connect disparate areas of knowledge. While early scientists and engineers (e.g., Galileo, da Vinci, and others) were often scholars across fields, specialization later took hold. Now, with the advent of AI, we can explore deep relationships across areas that venture to connect technical disciplines (e.g., mechanics and chemistry) or general domains of knowledge (e.g., failure mechanics and art). Here we propose a workflow to develop a fine-tuned Large Language Model (LLM), exemplified for a subset of knowledge in materials failure and multiscale modeling, and discuss its application in various use cases. The modeling strategy includes the use of general-purpose LLMs to extract question-answer pairs from raw data, followed by fine-tuning an LLM. The resulting MechGPT LLM is used in a series of computational experiments to explore its capacity for knowledge retrieval, language tasks, hypothesis generation, and connecting knowledge across disparate areas of science. We further explore the use of LLMs to generate ontological knowledge graphs, or ologs, to elucidate mechanistic, interpretable graph structures that provide explanatory insights, frameworks for new research questions, and visual representations of knowledge. This work shows the potential of LLMs to complement the way we model problems in mechanics and materials, enabling faster, more efficient, and more accurate research and engineering. The flexible multi-stage training strategy is transferable and offers a path to obtaining fine-tuned models in other fields of mechanics. Three versions of MechGPT are discussed, featuring sizes from 13 billion to 70 billion parameters and reaching context lengths of more than 10,000 tokens.
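A minimal sketch of the question-answer distillation step described above is shown below; the model choice, prompt wording, and generation settings are illustrative assumptions, not the exact pipeline used to train MechGPT.

```python
# Minimal sketch of the QA-pair distillation step: a general-purpose
# instruct LLM converts a raw text chunk into a question-answer pair
# that can later serve as fine-tuning data. Model choice and prompt
# wording are illustrative assumptions, not the exact MechGPT pipeline.
from transformers import pipeline

generator = pipeline("text-generation",
                     model="Open-Orca/OpenOrca-Platypus2-13B",
                     device_map="auto")

def distill_qa_pair(chunk: str) -> str:
    prompt = ("Read the following text and formulate one insightful question "
              "about its content, followed by a detailed answer.\n\n"
              f"Text: {chunk}\n\nQuestion and answer:")
    return generator(prompt, max_new_tokens=256, do_sample=True,
                     temperature=0.7, return_full_text=False)[0]["generated_text"]
```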


This repository also features code for the multi-modal mechanics language model, MeLM, applied to solve various nonlinear forward and inverse problems; it can deal with a set of instructions, numbers, and microstructure data. The framework is applied to various examples including bio-inspired hierarchical honeycomb design, carbon nanotube mechanics, and protein unfolding. In spite of the flexible nature of the model, which allows us to easily incorporate diverse materials, scales, and mechanical features, the model performs well across disparate forward and inverse tasks. Based on an autoregressive attention model, MeLM effectively represents a large multi-particle system consisting of hundreds of millions of neurons, where the interaction potentials are discovered through graph-forming self-attention mechanisms that are then used to identify relationships from emergent structures, while taking advantage of synergies discovered in the training data. We show that the model can solve complex degenerate mechanics design problems and determine novel material architectures across a range of hierarchical levels, providing an avenue for materials discovery and analysis. To illustrate broader possibilities, we outline a human-machine interactive MechGPT model, here trained on a set of 1,103 Wikipedia articles related to mechanics, showing how the general framework can be used not only to solve forward and inverse problems but also for complex language tasks like summarization, generation of new research concepts, and knowledge extraction. Looking beyond the demonstrations reported in the paper, we discuss other opportunities in applied mechanics and general considerations about the use of large language models in modeling, design, and analysis that can span a broad spectrum of material properties, from mechanical and thermal to optical and electronic.

## MechGPT model: mechanics-focused LLM foundation model for knowledge retrieval, natural language tasks, hypothesis generation, and connecting knowledge across disparate areas

### Load quantized model using PEFT/LoRA adapter

```python
import torch
from peft import PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)

# Base model and fine-tuned PEFT/LoRA adapter
model_name = 'Open-Orca/OpenOrca-Platypus2-13B'
FT_model_name = 'MechGPT-13b_v106C'
peft_model_id = f'{FT_model_name}'

# 4-bit NF4 quantization for memory-efficient loading
bnb_config4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the quantized base model
model_base = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=bnb_config4bit,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
model_base.config.use_cache = False

# Attach the LoRA adapter weights
model = PeftModel.from_pretrained(model_base, peft_model_id)

# Tokenizer setup
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
```

### Inference

```python
device = 'cuda'

def generate_response(text_input="Mechanics is a powerful discipline with many applications, such as ",
                      num_return_sequences=1,
                      temperature=0.4,  # higher temperature -> more creative output
                      max_new_tokens=128,
                      num_beams=1,
                      top_k=50,
                      top_p=0.9,
                      repetition_penalty=1.0,
                      eos_token_id=2,
                      verbatim=False,
                      ):
    # Tokenize the prompt without adding special tokens
    inputs = tokenizer.encode(text_input, add_special_tokens=False, return_tensors='pt')
    if verbatim:
        print("Length of input, tokenized: ", inputs.shape)
    # Sample a completion from the model
    with torch.no_grad():
        outputs = model.generate(input_ids=inputs.to(device),
                                 max_new_tokens=max_new_tokens,
                                 temperature=temperature,
                                 num_beams=num_beams,
                                 top_k=top_k,
                                 top_p=top_p,
                                 num_return_sequences=num_return_sequences,
                                 eos_token_id=eos_token_id,
                                 do_sample=True,
                                 repetition_penalty=repetition_penalty,
                                 )
    # Decode only the newly generated tokens, stripping the prompt
    return tokenizer.batch_decode(outputs[:, inputs.shape[1]:].detach().cpu().numpy(),
                                  skip_special_tokens=True)
```
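For interactive use (e.g., a `gradio` demo), tokens can also be streamed as they are generated using `TextIteratorStreamer`. The helper below is a minimal sketch with an illustrative name, reusing the `model`, `tokenizer`, and `device` defined above:

```python
# Minimal streaming variant (helper name stream_response is illustrative):
# generate() runs in a background thread while tokens are consumed as they arrive.
from threading import Thread
from transformers import TextIteratorStreamer

def stream_response(text_input, max_new_tokens=128, temperature=0.3):
    inputs = tokenizer.encode(text_input, add_special_tokens=False,
                              return_tensors='pt').to(device)
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True,
                                    skip_special_tokens=True)
    generation_kwargs = dict(input_ids=inputs, streamer=streamer,
                             max_new_tokens=max_new_tokens,
                             do_sample=True, temperature=temperature)
    # generate() blocks, so run it in a background thread
    Thread(target=model.generate, kwargs=generation_kwargs).start()
    for new_text in streamer:
        print(new_text, end='', flush=True)
```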

### Prompt template

```python
# Single-turn `OpenChat Llama2 V1` prompt format
# (tokenize() stands in for tokenizer.encode())
tokenize("You are MechGPT.<|end_of_turn|>User: Hello<|end_of_turn|>Assistant:")

# Multi-turn `OpenChat Llama2 V1` prompt format
tokenize("You are MechGPT.<|end_of_turn|>User: Hello<|end_of_turn|>Assistant: Hi<|end_of_turn|>User: How are you today?<|end_of_turn|>Assistant:")

# Example query using the single-turn template
generate_response(text_input="You are MechGPT.<|end_of_turn|>User: How does hyperelastic softening affect crack speed in brittle materials?<|end_of_turn|>Assistant:",
                  max_new_tokens=128,
                  temperature=0.3,  # value used to modulate the next-token probabilities
                  num_beams=1,
                  top_k=50,
                  top_p=0.9,
                  num_return_sequences=1,
                  eos_token_id=[2, 32000],
                  )
```
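For longer conversations, the multi-turn template can be assembled programmatically from a chat history. The helper below is a convenience sketch, not part of the released code:

```python
# Assemble an `OpenChat Llama2 V1` prompt from a chat history
# (helper name and structure are illustrative)
def build_prompt(system, history, user_msg):
    prompt = f"{system}<|end_of_turn|>"
    for user, assistant in history:
        prompt += f"User: {user}<|end_of_turn|>Assistant: {assistant}<|end_of_turn|>"
    prompt += f"User: {user_msg}<|end_of_turn|>Assistant:"
    return prompt

# Reproduces the multi-turn example above
prompt = build_prompt("You are MechGPT.", [("Hello", "Hi")], "How are you today?")
```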

### Dataset

See: https://huggingface.co/datasets/lamm-mit/MechanicsMaterials
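The dataset can be loaded directly with the `datasets` library; the `"train"` split name below is an assumption, so check the dataset card for the actual schema:

```python
# Load the fine-tuning dataset from the Hugging Face Hub
# (the "train" split name is an assumption; see the dataset card)
from datasets import load_dataset

dataset = load_dataset("lamm-mit/MechanicsMaterials", split="train")
print(dataset)     # inspect features and number of rows
print(dataset[0])  # look at a single example
```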