monsoon-nlp committed on
Commit 1b20ca9
1 Parent(s): 0fed510

Update README.md

Files changed (1)
  1. README.md +19 -12
README.md CHANGED
@@ -7,26 +7,35 @@ base_model: monsoon-nlp/tinyllama-mixpretrain-quinoa-sciphi
  model-index:
  - name: tinyllama-mixpretrain-uniprottune
    results: []
+ datasets:
+ - monsoon-nlp/greenbeing-proteins
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # tinyllama-mixpretrain-uniprottune

- This model is a fine-tuned version of [monsoon-nlp/tinyllama-mixpretrain-quinoa-sciphi](https://huggingface.co/monsoon-nlp/tinyllama-mixpretrain-quinoa-sciphi) on the None dataset.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
+ This is an adapter for the [monsoon-nlp/tinyllama-mixpretrain-quinoa-sciphi](https://huggingface.co/monsoon-nlp/tinyllama-mixpretrain-quinoa-sciphi)
+ model, finetuned on the GreenBeing dataset's finetuning split (minus maize/corn/*Zea*, which I held out for evaluation).
+
+ ## Usage
+
+ ```python
+ from peft import AutoPeftModelForCausalLM
+ from transformers import AutoTokenizer
+
+ # this model
+ model = AutoPeftModelForCausalLM.from_pretrained("monsoon-nlp/tinyllama-mixpretrain-uniprottune").to("cuda")
+ # base model for the tokenizer
+ tokenizer = AutoTokenizer.from_pretrained("monsoon-nlp/tinyllama-mixpretrain-quinoa-sciphi")
+
+ inputs = tokenizer("<sequence> Subcellular locations:", return_tensors="pt")
+ outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=50)
+ print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
+ ```
+
+ Inference Notebook: https://colab.research.google.com/drive/1UTavcVpqWkp4C_GkkS_HxDQ0Orpw43iu?usp=sharing
+
+ It seems unreliable on the *Zea* proteins: it gives a lot of the same answers for Subcellular locations.

  ## Training procedure

@@ -42,8 +51,6 @@ The following hyperparameters were used during training:
  - lr_scheduler_warmup_steps: 10
  - num_epochs: 1

- ### Training results
-

  ### Framework versions
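
If you prefer adapter-free inference alongside the Usage example in the card, a PEFT adapter like this can usually be merged into the base weights. A minimal sketch, assuming this is a LoRA-style adapter (the kind `merge_and_unload()` supports); the output path is only an example.

```python
from peft import AutoPeftModelForCausalLM

# Load the adapter on top of its base model, then fold the adapter weights
# into the base weights so plain transformers can load the result later.
model = AutoPeftModelForCausalLM.from_pretrained("monsoon-nlp/tinyllama-mixpretrain-uniprottune")
merged = model.merge_and_unload()

# Save the merged weights locally (example path); push_to_hub also works.
merged.save_pretrained("tinyllama-uniprottune-merged")
```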
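
The evaluation note above (unreliable on *Zea* proteins) comes from holding out maize/corn during finetuning. A rough sketch of reproducing that split with the `datasets` library; the split name `finetuning` follows the card text, and the `species` column name is a guess rather than a confirmed field of monsoon-nlp/greenbeing-proteins.

```python
from datasets import load_dataset

# GreenBeing proteins; split name follows the card text ("finetuning split").
ds = load_dataset("monsoon-nlp/greenbeing-proteins", split="finetuning")

# Hold out maize/corn (genus Zea) for evaluation, finetune on the rest.
# "species" is an assumed column name; check the dataset card for the real schema.
train_ds = ds.filter(lambda row: "Zea" not in row["species"])
eval_ds = ds.filter(lambda row: "Zea" in row["species"])

print(len(train_ds), "training examples;", len(eval_ds), "held-out Zea examples")
```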