---
license: afl-3.0
language:
- en
widget:
- text: >-
    You are a helpful chatbot name StanWhat's my name?
  example_title: Who am I?
- text: >-
    You are a helpful chatbot name StanHow to make a campfire
  example_title: Tutorial
datasets:
- Dahoas/instruct-synthetic-prompt-responses
- openai/webgpt_comparisons
- squad_v2
- samsum
- allenai/soda
- xsum
- multi_news
- cnn_dailymail
- scitldr
- billsum
pipeline_tag: text-generation
tags:
- finance
---

# Supervised Finetuning demonstration

Models are finetuned on conversations generated and curated through the [Open Assistant](https://github.com/LAION-AI/Open-Assistant) project.

**This model was finetuned for only 2,000 iterations and is uploaded for ease of sharing only.**

# Mixing reward model with sampling

We can use a reward model to rank sampled answers and keep the highest-scoring one, as in this example. Note: the dialogue special tokens were stripped when this card was rendered as HTML, so the `<human>` and `<bot>` markers below are assumptions based on the Open Assistant conversation format, not confirmed by this card; substitute the tokens the model was actually trained with.

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

tokenizer = AutoTokenizer.from_pretrained("theblackcat102/galactica-1.3b-v2")
model = AutoModelForCausalLM.from_pretrained("theblackcat102/galactica-1.3b-v2").eval().half().cuda()

reward_name = "OpenAssistant/reward-model-deberta-v3-large"
rank_model = AutoModelForSequenceClassification.from_pretrained(reward_name).eval().half().cuda()
rank_tokenizer = AutoTokenizer.from_pretrained(reward_name)

# Assumed dialogue markers (see note above); not confirmed by this card.
HUMAN_TOKEN = "<human>"
BOT_TOKEN = "<bot>"

questions = [
    "You are a helpful chatbot call Agnes" + HUMAN_TOKEN + "How do I make a resume?" + BOT_TOKEN
]

for question in questions:
    inputs = tokenizer(question, return_tensors="pt", padding=True).to(0)
    if "token_type_ids" in inputs:
        inputs.pop("token_type_ids")
    # Sample many candidate completions for the same prompt
    outputs = model.generate(
        **inputs,
        do_sample=True,
        top_k=60,
        max_length=220,
        num_return_sequences=80,
        early_stopping=True,
    )
    print(question)
    results = []
    for beam_output in outputs:
        output = tokenizer.decode(beam_output)
        # Split the decoded text back into the prompt and the sampled answer
        prompt_text, answer = output.split(BOT_TOKEN, maxsplit=1)
        answer = answer.split(HUMAN_TOKEN)[0].replace("<|endoftext|>", "").lstrip().split(BOT_TOKEN)[0]
        # Score the (question, answer) pair with the reward model
        rank_inputs = rank_tokenizer(
            prompt_text,
            answer,
            return_tensors="pt",
            padding=True,
            max_length=512,
            truncation=True,
        ).to(0)
        with torch.no_grad():
            score = rank_model(**rank_inputs).logits[0].cpu()
        results.append((answer, score, output))
    # Highest reward score first
    sorted_result = sorted(results, key=lambda x: x[1], reverse=True)
    print("score", sorted_result[0][1].item())
    print("-----Best rank-----")
    print(sorted_result[0][0])
    print("-------------------")
```

This work is done under the [Open Assistant](https://github.com/LAION-AI/Open-Assistant) initiative, which aims to democratize open-source AI assistants. Feel free to join the Discord and contribute on GitHub!

Thanks to [BASIC Lab](https://basiclab.lab.nycu.edu.tw/Yummy/index.html#) for compute resources. BASIC Lab is an academic research lab focusing on multi-modality learning and data mining.
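
# Quick generation without ranking

If you just want to sample from the SFT model without the reward model, a minimal sketch follows. It reuses the same assumed `<human>`/`<bot>` dialogue markers described above, which are not confirmed by this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("theblackcat102/galactica-1.3b-v2")
model = AutoModelForCausalLM.from_pretrained("theblackcat102/galactica-1.3b-v2").eval().half().cuda()

# Assumed dialogue markers, as in the ranking example above.
prompt = "You are a helpful chatbot name Stan<human>What's my name?<bot>"

inputs = tokenizer(prompt, return_tensors="pt").to(0)
with torch.no_grad():
    output = model.generate(**inputs, do_sample=True, top_k=60, max_length=220)
print(tokenizer.decode(output[0]))
```

Drawing a single sample like this is much cheaper than the 80-sample ranking loop above, at the cost of answer quality.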