---
license: afl-3.0
language:
  - en
widget:
  - text: <question>What's my name?<answer>
    example_title: Who am I?
  - text: <question>How to make a campfire<answer>
    example_title: Tutorial
---

# Supervised Finetuning demonstration

Models are finetuned on generated conversations curated from Open Assistant.
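
The model is prompted with `<question>` and `<answer>` markers, as in the widget examples in the metadata above. A minimal generation sketch, assuming the same checkpoint path used in the ranking example below (the `model_name` variable and decoding settings are illustrative):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Checkpoint path follows the ranking example below
model_name = "facebook/galactica-1.3b-base-finetuned/checkpoint-1000"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Prompts end with <answer> so the model continues with the answer text
prompt = "<question>How to make a campfire<answer>"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, do_sample=True, top_k=60, max_length=220)
print(tokenizer.decode(outputs[0]))
```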

## Mixing reward model with sampling

We can use the reward model to rank sampled candidate answers and keep the best one, as in this example:

```python
import torch
from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer, AutoModelForCausalLM

# Finetuned generator model
tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-1.3b-base-finetuned/checkpoint-1000")
model = AutoModelForCausalLM.from_pretrained("facebook/galactica-1.3b-base-finetuned/checkpoint-1000").eval().half().cuda()

# Reward model used to score (question, answer) pairs
reward_name = "theblackcat102/electra-large-reward-model"
rank_model, rank_tokenizer = AutoModelForSequenceClassification.from_pretrained(reward_name), AutoTokenizer.from_pretrained(reward_name)
rank_model = rank_model.eval().half().cuda()

full_results = {}
total_scores = 0

questions = ["<question>How do I make a resume?<answer>"]
for question in questions:
    inputs = tokenizer(question, return_tensors="pt", padding=True).to(0)
    if 'token_type_ids' in inputs:
        inputs.pop('token_type_ids')
    # Sample many candidate answers for the same prompt
    outputs = model.generate(**inputs, do_sample=True,
        top_k=60,
        max_length=220,
        num_return_sequences=80,
        early_stopping=True
    )
    print(question)

    results = []
    for beam_output in outputs:
        output = tokenizer.decode(beam_output, truncate_before_pattern=[r"\n\n^#", "^'''", "\n\n\n"])
        question, answer = output.split('<answer>', maxsplit=1)
        # Keep only the first answer span and strip special tokens
        answer = answer.split('<question>')[0].replace('<|endoftext|>', '').lstrip().split('<answer>')[0]
        # Both models live on the same GPU, so reward inputs go to device 0 as well
        rank_inputs = rank_tokenizer(question, answer, return_tensors="pt", padding=True, max_length=512, truncation=True).to(0)
        with torch.no_grad():
            score = rank_model(**rank_inputs).logits[0].cpu().detach()
        results.append((answer, score, output))
    full_results[question] = results
    # Highest reward score first
    sorted_result = sorted(results, key=lambda x: x[1], reverse=True)
    total_scores += sorted_result[0][1].item()
    print('score', sorted_result[0][1].item())
    print('-----Best rank-----')
    print(sorted_result[0][0])
    print('-------------------')
```
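
This is a best-of-n strategy: sample many candidates, then keep the one the reward model scores highest. Scoring candidates one at a time is slow; if GPU memory allows, the reward model can score them in batches instead. A minimal sketch of this idea (the `rank_answers` helper and its batch size are illustrative, not part of the original example):

```python
import torch

def rank_answers(question, answers, rank_model, rank_tokenizer, batch_size=16):
    """Score (question, answer) pairs with the reward model, best first."""
    scores = []
    for i in range(0, len(answers), batch_size):
        batch = answers[i:i + batch_size]
        # Tokenize the question paired with each candidate answer
        inputs = rank_tokenizer(
            [question] * len(batch), batch,
            return_tensors="pt", padding=True, truncation=True, max_length=512,
        ).to(rank_model.device)
        with torch.no_grad():
            logits = rank_model(**inputs).logits.squeeze(-1)
        scores.extend(logits.float().cpu().tolist())
    # Highest reward score first
    return sorted(zip(answers, scores), key=lambda x: x[1], reverse=True)
```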

Check out the Weights & Biases report for training details.

Thanks to BASIC Lab for the compute resources. BASIC Lab is an academic research lab focusing on multi-modality learning and data mining.