---
base_model:
  - leaBroe/HeavyBERTa
  - leaBroe/LightGPT
pipeline_tag: translation
---

# Heavy2Light

Heavy2Light is a seq2seq model that generates light chain antibody sequences from corresponding heavy chain inputs. It uses HeavyBERTa as the encoder and LightGPT as the decoder, and is fine-tuned on paired antibody chain data from the OAS and PLAbDab databases. The model relies on Adapters for parameter-efficient fine-tuning. You can either download the full model weights and adapter from this repository, or directly use the Heavy2Light adapter available in its dedicated directory on Hugging Face.
For more information, please visit our GitHub repository.

## How to use the model

```python
from transformers import EncoderDecoderModel, AutoTokenizer, GenerationConfig
from adapters import init

model_path = "leaBroe/Heavy2Light"
subfolder_path = "heavy2light_final_checkpoint"

# Load the full encoder-decoder model (HeavyBERTa encoder, LightGPT decoder)
model = EncoderDecoderModel.from_pretrained(model_path)

tokenizer = AutoTokenizer.from_pretrained(model_path, subfolder=subfolder_path)

# Enable adapter support, then load and activate the Heavy2Light adapter
init(model)
adapter_name = model.load_adapter("leaBroe/Heavy2Light_adapter", set_active=True)
model.set_active_adapters(adapter_name)

generation_config = GenerationConfig.from_pretrained(model_path)

# Example input heavy chain sequence
heavy_seq = "QLQVQESGPGLVKPSETLSLTCTVSGASSSIKKYYWGWIRQSPGKGLEWIGSIYSSGSTQYNPALGSRVTLSVDTSQTQFSLRLTSVTAADTATYFCARQGADCTDGSCYLNDAFDVWGRGTVVTVSS"

inputs = tokenizer(
    heavy_seq,
    padding="max_length",
    truncation=True,
    max_length=250,
    return_tensors="pt",
)

# Sample a light chain sequence conditioned on the heavy chain
generated_seq = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    num_return_sequences=1,
    output_scores=True,
    return_dict_in_generate=True,
    generation_config=generation_config,
    bad_words_ids=[[4]],
    do_sample=True,
    temperature=1.0,
)

generated_text = tokenizer.decode(
    generated_seq.sequences[0],
    skip_special_tokens=True,
)

print("Generated light sequence:", generated_text)
```
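Depending on the tokenizer's vocabulary, the decoded output may contain whitespace between individual residues. If that is the case for your setup, a small post-processing step recovers the raw amino-acid string. This is a minimal sketch under that assumption; `clean_sequence` is a hypothetical helper name, not part of this repository:

```python
def clean_sequence(decoded: str) -> str:
    """Remove all whitespace from a decoded sequence, joining
    per-residue tokens into one contiguous amino-acid string."""
    return "".join(decoded.split())

# Illustrative input only (not actual model output):
print(clean_sequence("D I Q M T Q S P"))  # DIQMTQSP
```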