monsoon-nlp committed on
Commit
4a226bb
1 Parent(s): 3bb7c4d

Update README.md

Files changed (1)
  1. README.md +43 -0
README.md CHANGED
@@ -41,6 +41,49 @@ Write information about the nucleotide sequence.
  Information about location in the kaniwa chromosome: >lcl|Cp5
  Information about location in the kaniwa chromosome: >lcl|Cp5
  ```
 
+ ## Usage
+
+ ### Basic inference
+
+ ```python
+ from peft import AutoPeftModelForCausalLM
+ from transformers import AutoTokenizer
+
+ model = AutoPeftModelForCausalLM.from_pretrained("monsoon-nlp/llama3-biotokenpretrain-kaniwa", load_in_4bit=True).to("cuda")
+ tokenizer = AutoTokenizer.from_pretrained("monsoon-nlp/llama3-biotokenpretrain-kaniwa")
+ tokenizer.pad_token = tokenizer.eos_token # pad fix
+
+ qed = "∎" # from math symbols, used in pretraining
+ sequence = "".join([(qed + nt) for nt in "GCCTATAGTGTGTAGCTAATGAGCCTAGGTTATCGACCCTAATCT"])
+
+ # prompt text before and after the sequence; example values following the prompt format shown above
+ prefix = "Write information about the nucleotide sequence.\n"
+ annotation = "\nInformation about location in the kaniwa chromosome: "
+
+ inputs = tokenizer(f"{prefix}{sequence}{annotation}", return_tensors="pt")
+ outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=50)
+ sample = tokenizer.batch_decode(outputs, skip_special_tokens=False)[0]
+ ```
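+
+ `sample` contains the full decoded prompt plus the model's continuation. A small follow-up sketch (not part of the original snippet) for keeping only the newly generated annotation text:
+
+ ```python
+ # drop the prompt tokens and decode only the generated continuation
+ generated = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+ print(generated)
+ ```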
+
+ ### LoRA finetuning on a new task
+
+ ```python
+ from trl import SFTTrainer
+ from unsloth import FastLanguageModel
+
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name = "monsoon-nlp/llama3-biotokenpretrain-kaniwa",
+     max_seq_length = 7_000, # max 6,000 bp for AgroNT tasks
+     dtype = None,
+     load_in_4bit = True,
+     resize_model_vocab = 128260, # includes biotokens
+ )
+ tokenizer.pad_token = tokenizer.eos_token # pad fix
+
+ trainer = SFTTrainer(
+     model = model,
+     tokenizer = tokenizer,
+     ...
+ )
+ ```
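+
+ The `...` above stands for the remaining `SFTTrainer` arguments. A minimal sketch of how they might be filled in, assuming a Hugging Face `datasets` dataset with a `"text"` column formatted in the biotoken prompt style (the toy dataset and hyperparameters below are illustrative only, not this model's training setup):
+
+ ```python
+ from datasets import Dataset
+ from transformers import TrainingArguments
+
+ # toy one-example dataset; format your own data to match the pretraining prompt
+ qed = "∎"
+ example = (
+     "Write information about the nucleotide sequence.\n"
+     + "".join(qed + nt for nt in "GCCTATAGTGTGTAGCTAATGAGCCTAGGTTATCGACCCTAATCT")
+     + "\nInformation about location in the kaniwa chromosome: >lcl|Cp5"
+ )
+ dataset = Dataset.from_dict({"text": [example]})
+
+ trainer = SFTTrainer(
+     model = model,
+     tokenizer = tokenizer,
+     train_dataset = dataset,
+     dataset_text_field = "text",
+     max_seq_length = 7_000,
+     args = TrainingArguments(
+         per_device_train_batch_size = 1,
+         num_train_epochs = 1,
+         output_dir = "outputs",
+     ),
+ )
+ trainer.train()
+ ```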
+
+
  This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 