“Introducing the first open source, instruction-following LLM”

#16
by dfurman - opened

Why make false claims in the announcement of the model? Google has been doing this for years; the Flan paper came out in 2021. It seriously cuts your credibility among folks who have been in the field since before 2023.

I think the claim is specifically about "for commercial use" in the blog, if you read the rest of it. I agree this is an arguable claim, depending on one's view of what clearly is or isn't usable commercially. Also, is FLAN an instruction-following model? I think this is also really comparing to other, much more similar models derived from LLaMA and OpenAI output.

^ https://huggingface.co/google/flan-t5-xxl

Google's Flan-T5 models are Apache 2.0 (commercial use OK), were released in 2022 (a decade ago in AI years), and the datasets employed are open-sourced right in the HF repo (unlike Dolly).

Databricks org

Yeah I agree, see my edit above.

Yes, FLAN is an instruction-following model - check out the main picture in the FLAN repo.

Anecdotally, flan-t5-xl (not even the biggest one) is doing better than this 12b dolly model on the instruction-following prompts I typically test (yes/no reasoning-type questions).

[attachment: Screenshot 2023-04-13 at 8.56.41 AM.png]
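For concreteness, the sort of spot check I mean is just zero-shot prompting through `transformers`. A minimal sketch (the model name is real; the prompt is an illustrative example, not one of my actual test prompts):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# flan-t5-xl is a seq2seq (text-to-text) model; load it the standard way.
# device_map="auto" requires the accelerate package.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-xl", torch_dtype=torch.bfloat16, device_map="auto"
)

# A made-up yes/no reasoning prompt of the sort described above.
prompt = "Answer yes or no: if all birds can fly and a penguin is a bird, can a penguin fly?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```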

I think of it as text-to-text, and yeah, it was also trained on instruction-following tasks among other things. I agree with you more than I don't, personally. There is a useful claim here, that this is more openly usable in a way that a bunch of LLaMA derivatives are not; that part isn't weird. But the framing of this seems over-broad. I'll put this again to the people in charge of that messaging.

That checks out!

Hey Daniel, I’m one of the creators of Dolly and wanted to share some of our thinking on this.

Flan-T5 is really powerful, and the Flan dataset particularly so. The thing I observe when using it for tasks like open-ended content generation is that it's very terse. I've never been able to get it to write a multi-paragraph letter, for example.

My hypothesis is that this reflects the composition of the underlying completion dataset, which, as you mention, is composed of a lot of benchmark-style responses, e.g., rate the sentiment, reply with categories or selections from a multiple-choice list. To me it seems great on understanding-oriented problems, but not really performant for text generation broadly, which is one of the characteristics I think about when I think of instruction following.

Secondly, I agree, and we debated how to say this without using a million hyphens, but we worked to communicate that the Dolly dataset is the first human-generated open instruction-tuning corpus specifically designed to elicit this behavior. To the best of my (admittedly limited) knowledge there are other corpora like OIG, Flan, P3, Super-NaturalInstructions, etc., but they are all either synthetic in the style of Self-Instruct, scraped from the web (as in the case of much of the Flan data), or governed by restrictive licenses.

The last thing we want is to ruin a good time by claiming something that's not true, and this is a big part of why we go to lengths to emphasize, for example, that the model isn't state of the art. That said, at least for now I do believe this is a first, but like any reasonable scientist I remain open to new information as it becomes available.

Thanks for your interest in the project, and hope you find it interesting and useful.

Take care,
Mike

Thanks for your response, Mike. Sounds like you improved an aspect of instruction models, which is a little different than claiming to be the first.

Is there a hosted version of this model for testing?

Databricks org

Not right now, but you can just load and use the model in Python as per the model card example.
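Roughly, per the model card, that looks like this (a minimal sketch; assumes `transformers` and `accelerate` are installed and a GPU with enough memory for a 12b model):

```python
import torch
from transformers import pipeline

# Load dolly-v2-12b with the custom instruction-following pipeline shipped
# in the repo (hence trust_remote_code=True).
generate_text = pipeline(
    model="databricks/dolly-v2-12b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

res = generate_text("Explain the difference between nuclear fission and fusion.")
print(res[0]["generated_text"])
```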

Databricks org

(also, please don't attach comments to unrelated conversations)

@dfurman I can confirm what you said above: on some tasks, Flan performed better than Dolly. In my case, Flan is enough, and I only use it for QA and IR.
I'm on my way to fine-tuning it for a closed domain :)

srowen changed discussion status to closed
