Note : The adapter and related GLaDOS code are licensed under Apache 2.0; however, the base model is licensed under bigcode-openrail-m. Since this adapter utilizes the base model, you must still adhere to the openrail license. As such, I have marked openrail as the license for this model, since that is effectively what applies.
GLaDOS speaks Markdown!
Usage
To use this model, you must first navigate to the bigcode starcoder model page and accept its license, then create an access token for your account and update the code below with it.
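If you would rather not paste the token directly into the script, one alternative (a minimal sketch, assuming the `huggingface_hub` package is installed) is to log in once and let `from_pretrained` pick the credential up automatically:

```python
# Sketch: authenticate once with the Hugging Face Hub instead of passing the token
# to every from_pretrained call. The token string below is a placeholder.
from huggingface_hub import login

login(token="hf_your_token_here")
```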
```python
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Setup Model
path = "JamesConley/glados_starcoder"
token = "PUT YOUR TOKEN HERE"  # replace with your Hugging Face access token
config = PeftConfig.from_pretrained(path)
base_model_path = config.base_model_name_or_path
model = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.float16, device_map="auto", use_auth_token=token)
model = PeftModel.from_pretrained(model, path, device_map="auto")

# Setup Tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_path, truncation_side="left", use_auth_token=token)

# Encode Tokens
text = """User :\nWrite a python function that trains a classifier on data loaded from a csv.\nGLaDOS :\n"""
input_ids = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048).input_ids

# Move tokens to GPU (uncomment when running on a CUDA device)
# input_ids = input_ids.to("cuda")

# Perform Inference
with torch.no_grad():
    with torch.cuda.amp.autocast():
        gen_tokens = model.generate(
            input_ids=input_ids, max_new_tokens=256
        )

# Decode Tokens
gen_text = tokenizer.batch_decode(gen_tokens)
```
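`batch_decode` returns a list with one decoded string per generated sequence, so with the single prompt above you can inspect the completion like this:

```python
# gen_text is a list with one entry per input sequence; print the first completion.
print(gen_text[0])
```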
Result Note : The backslash (\) characters are not actually in the model output; I have added them to prevent the model card from rendering the code block incorrectly.
```
User :
Write a python function that trains a classifier on data loaded from a csv.
GLaDOS :
Here is an example of a function that trains a classifier on data loaded from a CSV file:
\```
python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
def train_classifier(data_file, target_column, test_size=0.2):
    # Load the data from the CSV file
    data = pd.read_csv(data_file)
    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(data.drop(target_column, axis=1),
                                                        data[target_column],
                                                        test_size=test_size)
    # Train the classifier
    clf = RandomForestClassifier()
    clf.fit(X_train, y_train)
    # Return the trained classifier and the test set predictions
    return clf, clf.predict(X_test)
\```
This function takes in the following arguments:
* `data_file`: the path to the CSV file containing the data
* `target_column`: the name of the column in the CSV file that contains the target variable
```
Markdown rendered output :
User : Write a python function that trains a classifier on data loaded from a csv. GLaDOS : Here is an example of a function that trains a classifier on data loaded from a CSV file:
```
python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
def train_classifier(data_file, target_column, test_size=0.2):
    # Load the data from the CSV file
    data = pd.read_csv(data_file)
    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(data.drop(target_column, axis=1),
                                                        data[target_column],
                                                        test_size=test_size)
    # Train the classifier
    clf = RandomForestClassifier()
    clf.fit(X_train, y_train)
    # Return the trained classifier and the test set predictions
    return clf, clf.predict(X_test)
```
This function takes in the following arguments:
* `data_file`: the path to the CSV file containing the data
* `target_column`: the name of the column in the CSV file that contains the target variable
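For reference, a hypothetical call to the generated `train_classifier` function could look like the following; the `data.csv` path and `label` column name are placeholders chosen for illustration, not part of the model output:

```python
# Hypothetical usage of the generated function; "data.csv" and "label" are placeholders.
clf, test_predictions = train_classifier("data.csv", target_column="label")
print(test_predictions[:10])
```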