---
license: bigcode-openrail-m
---
Note: The adapter and related GLaDOS code are licensed under Apache 2.0; however, the base model is licensed under bigcode-openrail-m. Because this adapter uses the base model, you must still comply with the bigcode-openrail-m license.
As such, I have marked bigcode-openrail-m as the license for this model, since it _effectively_ is.


GLaDOS speaks Markdown!

Usage

To use this model, first go to the BigCode StarCoder model page on Hugging Face and accept its license. Then create an access token for your account and paste it into the code below.
```python
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Setup Model
path = "JamesConley/glados_starcoder"
token = "YOUR_HF_TOKEN"  # Replace with your Hugging Face access token
config = PeftConfig.from_pretrained(path)
base_model_path = config.base_model_name_or_path
model = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.float16, device_map="auto", use_auth_token=token)
model = PeftModel.from_pretrained(model, path, device_map="auto")

# Setup Tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_path, truncation_side="left", use_auth_token=token)

# Encode Tokens
text = """User :\nWrite a python function that trains a classifier on data loaded from a csv.\nGLaDOS :\n"""
input_ids = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048).input_ids

# Move tokens to GPU (the same device the float16 model was loaded onto)
input_ids = input_ids.to("cuda")

# Perform Inference
with torch.no_grad():
    with torch.cuda.amp.autocast():
        gen_tokens = model.generate(
            input_ids=input_ids, max_new_tokens=256
        )

# Decode Tokens
gen_text = tokenizer.batch_decode(gen_tokens)
print(gen_text[0])
```
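
The example prompt above uses a `User :` / `GLaDOS :` turn format, and the decoded output echoes that prompt before the reply. The sketch below shows one way to build such prompts and trim the decoded text down to the reply. It assumes single-turn prompts in exactly that format, StarCoder's `<|endoftext|>` end-of-sequence token, and that a new `User :` line marks the end of the reply. `build_prompt` and `extract_reply` are hypothetical helpers and are not part of this repository.

```python
# Hypothetical helpers (not part of this repository), assuming the
# "User :\n...\nGLaDOS :\n" prompt format used in the example above.


def build_prompt(user_message: str) -> str:
    """Wrap a single user message in the User/GLaDOS turn format."""
    return f"User :\n{user_message}\nGLaDOS :\n"


def extract_reply(decoded: str, prompt: str, eos_token: str = "<|endoftext|>") -> str:
    """Return only the generated reply from a decoded generate() output.

    Decoder-only models echo the prompt, so it is sliced off first. If the
    decoded text does not start with the exact prompt (e.g. after truncation),
    fall back to everything after the first "GLaDOS :" marker.
    """
    if decoded.startswith(prompt):
        reply = decoded[len(prompt):]
    else:
        reply = decoded.split("GLaDOS :\n", 1)[-1]
    # Assumption: the reply ends at EOS or where the model starts a new user turn.
    for stop in (eos_token, "\nUser :"):
        idx = reply.find(stop)
        if idx != -1:
            reply = reply[:idx]
    return reply.strip()
```

With the usage code above, `extract_reply(gen_text[0], text, tokenizer.eos_token)` would return only the generated answer, without the prompt or trailing special tokens.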

Result

Note: The backslash (`\`) characters are not part of the model output. I added them to keep the model card from rendering the nested code block incorrectly.
```
User :
Write a python function that trains a classifier on data loaded from a csv.
GLaDOS :
Here is an example of a function that trains a classifier on data loaded from a CSV file:

\```
python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

def train_classifier(data_file, target_column, test_size=0.2):
    # Load the data from the CSV file
    data = pd.read_csv(data_file)

    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(data.drop(target_column, axis=1),
                                                        data[target_column],
                                                        test_size=test_size)

    # Train the classifier
    clf = RandomForestClassifier()
    clf.fit(X_train, y_train)

    # Return the trained classifier and the test set predictions
    return clf, clf.predict(X_test)

\```
This function takes in the following arguments:

* `data_file`: the path to the CSV file containing the data
* `target_column`: the name of the column in the CSV file that contains the target variable
```
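
Backslash escapes are not processed inside fenced code blocks, which is why the backslashes above show up literally and are copied along with the output. If the card renderer follows CommonMark fenced-code rules, an alternative is to wrap the raw output in an outer fence with more backticks than any run inside it. A closing fence must be at least as long as the opening fence, so the inner three-backtick fences then stay literal. Below is a minimal sketch under that assumption. `fence_block` is a hypothetical helper, not part of this repository.

```python
import re


def fence_block(content: str, info: str = "") -> str:
    """Wrap `content` in a Markdown fenced code block it cannot close early.

    Assumes CommonMark semantics: a closing fence needs at least as many
    backticks as the opening fence, so an outer fence longer than any
    backtick run in `content` keeps nested ``` fences as literal text.
    """
    longest = max((len(run) for run in re.findall(r"`+", content)), default=0)
    fence = "`" * max(3, longest + 1)
    return f"{fence}{info}\n{content}\n{fence}"
```

For the output above, this would produce a four-backtick outer fence, so the model's own three-backtick fences could be pasted unmodified.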

Markdown-rendered output:


User :
Write a python function that trains a classifier on data loaded from a csv.
GLaDOS :
Here is an example of a function that trains a classifier on data loaded from a CSV file:

```
python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

def train_classifier(data_file, target_column, test_size=0.2):
    # Load the data from the CSV file
    data = pd.read_csv(data_file)

    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(data.drop(target_column, axis=1),
                                                        data[target_column],
                                                        test_size=test_size)

    # Train the classifier
    clf = RandomForestClassifier()
    clf.fit(X_train, y_train)

    # Return the trained classifier and the test set predictions
    return clf, clf.predict(X_test)

```
This function takes in the following arguments:

* `data_file`: the path to the CSV file containing the data
* `target_column`: the name of the column in the CSV file that contains the target variable