Note : The adapter and related GLaDOS code are licensed under Apache 2.0; however, the base model is licensed under bigcode-openrail-m. Since this adapter utilizes the base model, you must still adhere to the openrail license. As such, I have marked openrail as the license for this model, since that is effectively what applies.
GLaDOS speaks Markdown!
Usage
To use this model, you must first navigate to the bigcode starcoder model page and accept its license, then create an access token for your account and update the code below with it.
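If you would rather not paste the token directly into the script, one alternative (a minimal sketch, assuming the `huggingface_hub` package is installed) is to log in once and let `from_pretrained` pick the credential up automatically:

```python
# Sketch: authenticate once with the Hugging Face Hub instead of passing the token
# to every from_pretrained call. The token string below is a placeholder.
from huggingface_hub import login

login(token="hf_your_token_here")
```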
```python
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Setup Model
path = "JamesConley/glados_starcoder"
token = "PUT YOUR TOKEN HERE"  # replace with your Hugging Face access token
config = PeftConfig.from_pretrained(path)
base_model_path = config.base_model_name_or_path
model = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.float16, device_map="auto", use_auth_token=token)
model = PeftModel.from_pretrained(model, path, device_map="auto")

# Setup Tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_path, truncation_side="left", use_auth_token=token)

# Encode Tokens
text = """User :\nWrite a python function that trains a classifier on data loaded from a csv.\nGLaDOS :\n"""
input_ids = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048).input_ids

# Move tokens to GPU (uncomment when running on a CUDA device)
# input_ids = input_ids.to("cuda")

# Perform Inference
with torch.no_grad():
    with torch.cuda.amp.autocast():
        gen_tokens = model.generate(
            input_ids=input_ids, max_new_tokens=256
        )

# Decode Tokens
gen_text = tokenizer.batch_decode(gen_tokens)
```
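`batch_decode` returns a list with one decoded string per generated sequence, so with the single prompt above you can inspect the completion like this:

```python
# gen_text is a list with one entry per input sequence; print the first completion.
print(gen_text[0])
```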
Result Note : The backslash (\) characters are not actually in the model output; I have added them to prevent the model card from rendering the code block incorrectly.
```
User :
Write a python function that trains a classifier on data loaded from a csv.
GLaDOS :
Here is an example of a function that trains a classifier on data loaded from a CSV file:
\```
python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
def train_classifier(data_file, target_column, test_size=0.2):
    # Load the data from the CSV file
    data = pd.read_csv(data_file)
    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(data.drop(target_column, axis=1),
                                                        data[target_column],
                                                        test_size=test_size)
    # Train the classifier
    clf = RandomForestClassifier()
    clf.fit(X_train, y_train)
    # Return the trained classifier and the test set predictions
    return clf, clf.predict(X_test)
\```
This function takes in the following arguments:
* `data_file`: the path to the CSV file containing the data
* `target_column`: the name of the column in the CSV file that contains the target variable
```
Markdown rendered output :
User : Write a python function that trains a classifier on data loaded from a csv. GLaDOS : Here is an example of a function that trains a classifier on data loaded from a CSV file:
```
python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
def train_classifier(data_file, target_column, test_size=0.2):
    # Load the data from the CSV file
    data = pd.read_csv(data_file)
    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(data.drop(target_column, axis=1),
                                                        data[target_column],
                                                        test_size=test_size)
    # Train the classifier
    clf = RandomForestClassifier()
    clf.fit(X_train, y_train)
    # Return the trained classifier and the test set predictions
    return clf, clf.predict(X_test)
```
This function takes in the following arguments:
* `data_file`: the path to the CSV file containing the data
* `target_column`: the name of the column in the CSV file that contains the target variable
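For reference, a hypothetical call to the generated `train_classifier` function could look like the following; the `data.csv` path and `label` column name are placeholders chosen for illustration, not part of the model output:

```python
# Hypothetical usage of the generated function; "data.csv" and "label" are placeholders.
clf, test_predictions = train_classifier("data.csv", target_column="label")
print(test_predictions[:10])
```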