---
library_name: transformers
license: mit
language:
- en
---

# Rolema 7B

Rolema 7B is a large language model that runs effectively under 4-bit quantization.
It is built on top of Google's Gemma-7B model as its backbone.

### Model Description

This is the model card of a 🤗 Transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Developed by:** Min Si Thu
- **Model type:** Text Generation Large Language Model
- **Language(s) (NLP):** English
- **License:** MIT

### How to use

Installing Libraries

```bash
pip install -U bitsandbytes
pip install -U transformers
pip install -U peft
pip install -U accelerate
pip install -U trl
pip install -U datasets
```
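
Before loading the model, it can help to confirm that a CUDA device is visible, since 4-bit bitsandbytes quantization requires a GPU. A minimal sanity check (not part of the original card, just an illustrative sketch):

```python
import torch

# 4-bit bitsandbytes quantization requires a CUDA-capable GPU.
assert torch.cuda.is_available(), "No CUDA device found; 4-bit loading will not work."
print(torch.cuda.get_device_name(0))
```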

Code Implementation

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model = "google/gemma-7b-it"
adapter_model = "jojo-ai-mst/rolema-7b-it"

# 4-bit NF4 quantization config for the base model (Gemma 7B-it)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the quantized base model; device_map="auto" places it on the available GPU(s).
# Note: calling .to("cuda") on a 4-bit bitsandbytes model is not supported.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the Rolema PEFT adapter on top of the base model
model = PeftModel.from_pretrained(model, adapter_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

inputs = tokenizer("How to learn programming", return_tensors="pt").to(model.device)

outputs = model.generate(input_ids=inputs["input_ids"], max_new_tokens=1000)
print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0])
```
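
Since the base model is the instruction-tuned gemma-7b-it, prompts are typically wrapped in Gemma's chat template. The snippet below is a minimal sketch, assuming the adapter keeps the base tokenizer's chat template, and it reuses `model` and `tokenizer` from the code above:

```python
# Format the prompt with the tokenizer's chat template before generating.
messages = [{"role": "user", "content": "How to learn programming"}]
prompt_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,   # append the assistant turn marker
    return_tensors="pt",
).to(model.device)

outputs = model.generate(prompt_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][prompt_ids.shape[-1]:], skip_special_tokens=True))
```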