---
license: afl-3.0
language:
- en
widget:
- text: <question>What's my name?<answer>
  example_title: Who am I?
- text: <question>How to make a campfire<answer>
  example_title: Tutorial
---
# Supervised Finetuning demonstration

Models are finetuned on generated conversations curated from the Open Assistant project.
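For reference, inference uses the same `<question>...<answer>` prompt template shown in the widget examples above. Below is a minimal single-answer generation sketch, assuming the same local checkpoint path as the reranking example in the next section:

```python
# Minimal single-answer generation with the <question>/<answer> template.
# The checkpoint path is an assumption about a local finetuned checkpoint.
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "facebook/galactica-1.3b-base-finetuned/checkpoint-1000"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).eval().half().cuda()

prompt = "<question>How to make a campfire<answer>"
inputs = tokenizer(prompt, return_tensors="pt").to(0)
inputs.pop("token_type_ids", None)  # drop if the tokenizer emits them

output = model.generate(**inputs, do_sample=True, top_k=60, max_length=220)
text = tokenizer.decode(output[0])
# keep only the generated answer: drop the prompt and the end-of-text marker
answer = text.split("<answer>", 1)[1].split("<question>")[0].replace("<|endoftext|>", "").strip()
print(answer)
```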
## Mixing reward model with sampling

We can use a reward model to rank sampled answers and keep the best one, as in this example:
```python
import torch
from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-1.3b-base-finetuned/checkpoint-1000")
model = AutoModelForCausalLM.from_pretrained("facebook/galactica-1.3b-base-finetuned/checkpoint-1000").eval().half().cuda()

reward_name = "theblackcat102/electra-large-reward-model"
rank_model, rank_tokenizer = AutoModelForSequenceClassification.from_pretrained(reward_name), AutoTokenizer.from_pretrained(reward_name)
rank_model = rank_model.eval().half().cuda()

questions = ["<question>How do I make a resume?<answer>"]
full_results = {}
total_scores = 0

for question in questions:
    inputs = tokenizer(question, return_tensors="pt", padding=True).to(0)
    if 'token_type_ids' in inputs:
        inputs.pop('token_type_ids')
    # sample many candidate answers for the same prompt
    outputs = model.generate(**inputs, do_sample=True,
        top_k=60,
        max_length=220,
        num_return_sequences=80,
        early_stopping=True
    )
    print(question)
    results = []
    for beam_output in outputs:
        output = tokenizer.decode(beam_output, truncate_before_pattern=[r"\n\n^#", "^'''", "\n\n\n"])
        # strip the prompt and any trailing special tokens from the sample
        question, answer = output.split('<answer>', maxsplit=1)
        answer = answer.split('<question>')[0].replace('<|endoftext|>', '').lstrip().split('<answer>')[0]
        # score the (question, answer) pair with the reward model,
        # keeping the inputs on the same GPU as the reward model
        rank_inputs = rank_tokenizer(question, answer, return_tensors="pt",
                                     padding=True, max_length=512, truncation=True).to(0)
        with torch.no_grad():
            score = rank_model(**rank_inputs).logits[0].cpu().detach()
        results.append((answer, score, output))
    full_results[question] = results
    # rank candidates by reward score, highest first
    sorted_result = sorted(results, key=lambda x: x[1], reverse=True)
    total_scores += sorted_result[0][1].item()
    print('score', sorted_result[0][1].item())
    print('-----Best rank-----')
    print(sorted_result[0][0])
    print('-------------------')
```
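Scoring 80 samples one at a time keeps the loop simple but slow. Since the reward model accepts padded batches, all candidates for a question can be scored in a single forward pass. A sketch under that assumption; the helper `score_candidates` is our own name, not part of either model's API:

```python
import torch

def score_candidates(rank_model, rank_tokenizer, question, answers, device=0):
    # Hypothetical helper: one batched forward pass over all (question, answer)
    # pairs instead of one reward-model call per sample.
    rank_inputs = rank_tokenizer(
        [question] * len(answers), answers,
        return_tensors="pt", padding=True, max_length=512, truncation=True,
    ).to(device)
    with torch.no_grad():
        scores = rank_model(**rank_inputs).logits.squeeze(-1)
    return scores.cpu()

# usage with the variables from the loop above:
# scores = score_candidates(rank_model, rank_tokenizer, question, [a for a, _, _ in results])
# best_answer = results[scores.argmax().item()][0]
```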
Check out the Weights & Biases report for training details.

Thanks to BASIC Lab for the compute resources. BASIC Lab is an academic research lab focusing on multi-modality learning and data mining.