DeciDPObyBB - a 7B DeciLM Fine-tune Using DPO
Built by fine-tuning DeciLM-7B-Instruct with Direct Preference Optimization (DPO) on the Intel Orca DPO Pairs dataset.
Created by bhaiyabot.
Built for research and learning purposes!
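DPO trains the model directly on preference pairs: for each prompt it pushes up the log-probability of the chosen answer relative to a frozen reference model, and pushes down the rejected one. As a rough illustration (not this model's training code), here is a minimal numeric sketch of the per-pair DPO loss, with made-up example log-probabilities:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)), where the margin
    compares how much the policy favors the chosen answer over the rejected
    one, relative to the frozen reference model."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# With no preference shift (policy == reference), the loss is -log(0.5) = ln 2;
# once the policy favors the chosen answer more than the reference does, it drops.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))   # exactly ln 2
print(dpo_loss(-10.0, -12.0, -10.5, -11.0))   # smaller: preferences have shifted
```

The `beta` hyperparameter controls how far the policy may drift from the reference; in practice libraries such as TRL compute these sequence log-probabilities from the model and reference model for you.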
Usage:

```python
from transformers import AutoTokenizer, pipeline

# Set to this repo's Hub id or a local checkpoint path
new_model = "path-or-hub-id-of-this-model"

tokenizer = AutoTokenizer.from_pretrained(new_model)
# The repo contains custom model code, so trust_remote_code is required
generator = pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer,
    trust_remote_code=True,
)

user_input = "Your question here"
message = [
    {"role": "system", "content": "You are a very helpful assistant chatbot that thinks step by step"},
    {"role": "user", "content": user_input},
]
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)
sequences = generator(
    prompt,
    do_sample=True,
    temperature=1,
    num_beams=5,
    max_length=1000,
    pad_token_id=tokenizer.eos_token_id,
)
print(sequences[0]["generated_text"])
```
@misc{DeciFoundationModels,
  title = {DeciLM-7B-instruct},
  author = {DeciAI Research Team},
  year = {2023},
  url = {https://huggingface.co/Deci/DeciLM-7B-instruct},
}
@misc{rafailov2023direct,
  title = {Direct Preference Optimization: Your Language Model is Secretly a Reward Model},
  author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Stefano Ermon and Christopher D. Manning and Chelsea Finn},
  year = {2023},
  eprint = {2305.18290},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG}
}
More details to come soon.