Dolphin 3.0 with FIM

#8 opened by ranguna

Hello!

Awesome work on this and your other recent models; they're my go-to models for almost everything at the moment.

I can't wait for Dolphin 3.0, especially the agent and function-calling features. But I'd like to ask: would it also be possible to include a FIM (fill-in-the-middle) capability?

FIM would be phenomenal for coding, where you can give the full context of a file and generate code for a specific position within that file.

Stable Code from Stability AI includes this feature as well; here's the section about their training data: https://huggingface.co/stabilityai/stable-code-3b#training-dataset.
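
From what I can tell, FIM-trained models learn sentinel tokens that rearrange the document so the middle can still be generated left to right. A rough sketch of how such a prompt gets assembled (the token names follow the convention stable-code-3b uses; worth verifying against its tokenizer):

# Sketch: assembling a FIM prompt with prefix/suffix/middle sentinel tokens.
prefix = "def fib(n):\n    "                          # code before the cursor
suffix = "\n    return fib(n - 1) + fib(n - 2)"       # code after the cursor
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
# The model then generates the missing middle as an ordinary continuation.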

Thanks!

Hi! You can actually do this using LMQL.
Take a look at their docs.
It also becomes far less expensive in terms of token generation.
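
For anyone unfamiliar, an LMQL query interleaves static prompt text with holes the model fills in, roughly like this (classic standalone syntax; the model name is just an illustration):

argmax
    "Q: What is the capital of France?\n"
    "A: [ANSWER]"
from
    "local:gpt2"
where
    len(TOKENS(ANSWER)) < 10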

First time I've heard of this; it looks pretty cool.

Will give it a try, thanks!

ranguna changed discussion status to closed

So I took a look at LMQL, and I'm not quite sure how it would substitute for FIM. Looking at the explanation at minute 11:25 of https://play.vidyard.com/oS4f5GD7utU5smxYkcwBCU.html?, if I were to use the following query:

argmax
  "{code before cursor}"
  "[FIM]"
  "{code after cursor}"
...

The model would actually only receive the first static part of the prompt (the code before the cursor) and not the full prompt (the code before and after the cursor).
If you have some time, could you explain how you would use LMQL to do FIM-like inference?

LMQL seems good for building structured output, like generating JSON, or even function calling by constraining the output to the function names, but it doesn't seem too useful for FIM, unless I've missed something.
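
For example, constraining the output to a fixed set of function names would look something like this, if I understood the set-constraint pattern from their docs correctly (syntax not verified; model name illustrative):

argmax
    "User request: book a flight to Paris\n"
    "Function to call: [FN]"
from
    "local:gpt2"
where
    FN in set(["book_flight", "get_weather", "send_email"])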

ranguna changed discussion status to open

The point is: is the content around the FIM chunk static or not?
If it isn't static, then your only options are fine-tuning or few-shot prompting.

Yep, the content around the FIM is static. For the coding example, it would be the content before and after the cursor. For example:

def function_that_prints[FIM]():
  print("hello")

In this case, the model would return "_hello" or something similar.

I would try combining few-shot with this:

https://lmql.ai/blog/

Never tried it myself, though; my use cases are JSON processing in general: I parse out what I need from the JSON and combine it using Python.

What about restating it as a JSON parsing problem and using Python to merge the result?

I did exactly that; I parsed the model response in every possible way to see if there is a parsable substring, and it works in 97% of cases even on very complex contexts.
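
Something along these lines, heavily simplified (the real version tries a few more recovery strategies):

import json

def extract_json(text):
    # Return the first parsable JSON object found in text, else None.
    # Simplified sketch: try every '{' ... '}' span, longest match first.
    for i, ch in enumerate(text):
        if ch != "{":
            continue
        for j in range(len(text), i + 1, -1):
            if text[j - 1] != "}":
                continue
            try:
                return json.loads(text[i:j])
            except json.JSONDecodeError:
                continue
    return None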

However, Dolphin 3.0 for agents sounds amazing :)

Great hehehe... so I am not crazy at all.
And yeah, it will be amazing having strong agent capabilities.

We are just trying to push the models further with new stuff besides data. That's why it is taking longer. Stay tuned.

Hmmm... this all makes sense for generating structured responses, such as JSON: we can analyse the logits to pick the highest-scoring token that keeps the response a valid JSON document. But each inference step only has context of the previous content, not the following content, at least from what I understood.

I could do few-shot and include a few examples in the prompt, but that would limit the context I can provide to the model, since some of it would be used up by the examples. Moreover, I'm not sure it would be as performant as a model actually trained with FIM data.
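
Concretely, I imagine the few-shot version would look something like this, where the <PRE>/<SUF>/<MID> markers are just ad-hoc prompt text rather than trained special tokens, which is exactly why I doubt it would match a real FIM-trained model:

# Hedged sketch of few-shot FIM prompting for a model without FIM training.
few_shot_prompt = (
    # One worked example showing the expected in-fill behaviour...
    "<PRE>def add(a, b):\n    return <SUF>\n\nprint(add(1, 2))<MID>a + b\n"
    # ...then the actual request, ending right where the middle span begins.
    '<PRE>def function_that_prints<SUF>():\n  print("hello")<MID>'
)
# Generate from few_shot_prompt and stop at the next "<PRE>" marker.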

It's OK if you don't see value in this type of training; I can try creating my own dataset and fine-tuning on my side.

About restating it as a JSON problem, potentially something like this could make sense:

{
  "code-before-cursor": "{code before the cursor}",
  "code-after-cursor": "{code after the cursor}",
  "code-in-the-middle": "[FIM]"
}
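
Merging the result back would then just be string concatenation, something like:

import json

# Sketch: merge a JSON-restated FIM response back into source code.
# The response string below is an illustrative model output, not real data.
response = '''{
  "code-before-cursor": "def function_that_prints",
  "code-after-cursor": "():\\n  print(\\"hello\\")",
  "code-in-the-middle": "_hello"
}'''
fields = json.loads(response)
completed_code = (
    fields["code-before-cursor"]
    + fields["code-in-the-middle"]
    + fields["code-after-cursor"]
)
print(completed_code)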

But yeah, I'm not sure it would be good compared to a model that is actually trained on FIM data.

I'll give this a try soon and compare it against the Stable Code 3B model.
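
For reference, based on the FIM tokens shown on the stable-code-3b model card, the comparison run would look roughly like this (untested sketch; generation settings are illustrative):

from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code may be needed depending on your transformers version.
tokenizer = AutoTokenizer.from_pretrained("stabilityai/stable-code-3b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("stabilityai/stable-code-3b", trust_remote_code=True)

# Prefix = code before the cursor, suffix = code after the cursor.
prompt = '<fim_prefix>def function_that_prints<fim_suffix>():\n  print("hello")<fim_middle>'
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8, do_sample=False)
# Decode only the newly generated middle span.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))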

My opinion is that this is way too specific for general-purpose use; it would just increase the model's SFT complexity. If I were you, I would do further SFT on Dolphin for your specific needs.

fernandofernandes changed discussion status to closed

Thanks for the feedback. I'll also give Stable Code a run; it might be a better fit specifically for code completion.

Thanks again!
