flant5-dolly-xxl caveats

This model is flan-t5-xxl fine-tuned on databricks-dolly-15k. It was fine-tuned in mid-April 2023 by Jack Hessel, a Research Scientist at AI2. Here's a screenshot of a demo I made with it:

fine-tuning details

Fine-tuning was done with the t5x library on a v3-128 TPU. Input/output sequence lengths were set to 512/256. Roughly 5K fine-tuning steps were taken with an effective batch size of ~320, and the model was checkpointed every 100 steps. Adafactor was used with a learning rate of 0.0007, and the checkpoint with the best balance of validation BLEU/loss was selected (a rough transformers-based sketch of a comparable setup appears after the snippet below). I also added very crude support for newline inputs and outputs for t5. Here's how to process the inputs and outputs to get newlines; my apologies for the crude hack!

# assumes tokenizer and model are already loaded (see the loading sketch below)
text_in = ' '.join(text_in.replace('\n', ' <_LB_> ').replace('\t', ' <_T_> ').split())
inputs = tokenizer(text_in, return_tensors="pt")
outputs = model.generate(inputs["input_ids"], max_new_tokens=256, temperature=temp, top_p=top_p, do_sample=True)
out = tokenizer.decode(outputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
# map the partially-detokenized sentinels back into newlines/tabs
out = out.replace(' _LB_> ', '\n').replace(' _T_> ', '\t').replace('_LB_>', '\n').replace('_T_> ', '\t').replace('_T_>', '\t')
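
For completeness, here is a minimal end-to-end sketch of how the snippet above might be used. This is an assumption-laden example rather than official usage: the repo id is a placeholder for wherever this checkpoint lives on the Hub, and temp/top_p are arbitrary sampling parameters.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "<this-repo-id>"  # placeholder: substitute the actual Hub path of this checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

temp, top_p = 0.7, 0.95  # arbitrary sampling parameters

text_in = "Generate a poem about large language models.\nMake it rhyme."
text_in = ' '.join(text_in.replace('\n', ' <_LB_> ').replace('\t', ' <_T_> ').split())
inputs = tokenizer(text_in, return_tensors="pt").to(model.device)
outputs = model.generate(inputs["input_ids"], max_new_tokens=256, temperature=temp, top_p=top_p, do_sample=True)
out = tokenizer.decode(outputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
out = out.replace(' _LB_> ', '\n').replace(' _T_> ', '\t').replace('_LB_>', '\n').replace('_T_> ', '\t').replace('_T_>', '\t')
print(out)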
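
The run described above used t5x on TPU, so the following is only a rough, hypothetical sketch of how the same hyperparameters (~5K steps, effective batch ~320, Adafactor at 0.0007, 512/256 sequence lengths, checkpoints every 100 steps) could be approximated with the transformers Seq2SeqTrainer; the prompt template and hardware assumptions are guesses, not what was actually used.

from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer, DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xxl")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xxl")

def to_sentinels(s):
    # same crude newline/tab encoding as above
    return ' '.join(s.replace('\n', ' <_LB_> ').replace('\t', ' <_T_> ').split())

def preprocess(ex):
    # the exact prompt template is not specified in this card; this is a guess
    prompt = ex["instruction"] + ("\n" + ex["context"] if ex["context"] else "")
    enc = tokenizer(to_sentinels(prompt), max_length=512, truncation=True)
    enc["labels"] = tokenizer(text_target=to_sentinels(ex["response"]), max_length=256, truncation=True)["input_ids"]
    return enc

train = load_dataset("databricks/databricks-dolly-15k")["train"].map(preprocess)

args = Seq2SeqTrainingArguments(
    output_dir="flant5-dolly-xxl",
    max_steps=5000,                  # ~5K fine-tuning steps
    per_device_train_batch_size=4,
    gradient_accumulation_steps=80,  # 4 * 80 ~= effective batch of 320 (single device)
    learning_rate=7e-4,              # Adafactor at 0.0007
    optim="adafactor",
    save_steps=100,                  # checkpoint every 100 steps
    logging_steps=100,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

Note that an 11B-parameter model will not actually train this way on a single GPU without something like DeepSpeed or model parallelism; treat the arguments above purely as a summary of the hyperparameters.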

why?

The purpose of fine-tuning the model was to better understand how flan-t5 performs when simply fine-tuned on a human-request-style instruction tuning corpus.

Observations:

  • It's now possible to make more creative requests, e.g., "Generate a poem about large language models".
  • Accuracy is variable: the model regularly and confidently outputs falsehoods.
  • The model's outputs are not consistent: if you sample more than once for the same prompt, you will often get contradictory outputs (see the sketch after this list).
  • Chain-of-thought prompting kind of works, and prompts like "you are an ethical and honest language model" affect the outputs to some extent.
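
To illustrate the consistency point, here is a tiny sketch, assuming tokenizer and model are loaded as in the earlier example; the question is just an arbitrary example, not a benchmark.

# sample the same prompt several times; the answers often disagree with each other
question = "Who invented the telescope, and in what year?"
enc = tokenizer(question, return_tensors="pt").to(model.device)
for i in range(3):
    ids = model.generate(enc["input_ids"], max_new_tokens=64, do_sample=True, temperature=0.7, top_p=0.95)
    print(f"sample {i}:", tokenizer.decode(ids[0], skip_special_tokens=True))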

why release?

While it is fairly easy to get most (un-RLHFed) language models to output offensive text, there's something jarring about playing with a language model where you can make potentially offensive imperative requests. I hope this model can help folks explore instruction-tuned models and further understand the importance of safeguards.
