Sentdex posted an update Feb 20
Working through the Reddit dataset, one thing that occurs to me is that we pretty much always train LLMs on conversations between two parties, like Bot/Human or Instruction/Response.

It seems far more common with internet data to have multi-speaker/group discussions with a dynamic number of speakers. That also seems more realistic to the real world, and it requires a bit more understanding to model.
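As a rough illustration of the multi-speaker setup (this is just a sketch; the tag format, speaker names, and sample thread are all made up, not anything from the dataset):

```python
# Flatten a multi-speaker thread into a single training string.
# Speaker tags are assigned dynamically as new participants appear,
# so the format is not limited to a fixed Bot/Human pair.

def flatten_thread(turns):
    """turns: list of (speaker_name, text) tuples."""
    tags = {}  # speaker_name -> dynamically assigned tag like <|speaker_0|>
    lines = []
    for name, text in turns:
        if name not in tags:
            tags[name] = f"<|speaker_{len(tags)}|>"
        lines.append(f"{tags[name]} {text}")
    return "\n".join(lines)

thread = [
    ("alice", "Anyone tried training on group chats?"),
    ("bob", "Yes, speaker tokens help."),
    ("carol", "How many speakers did you support?"),
    ("bob", "It was dynamic, not fixed."),
]
print(flatten_thread(thread))
```

Returning speakers reuse their tag, so the number of distinct tags grows with the number of participants rather than being fixed in advance.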

Is there some research into this? I have some ideas for how I'd like to implement it, but I wonder whether some work has already been done here.

Aya-101 can help

I was thinking exactly the same thing when ChatGPT first came out! I have run some minor experiments with causal language modeling by having a fixed number of users/speakers and then instruct fine-tuning the base/foundational model. "Dynamic number of speakers" sounds interesting, though! Maybe there is a clever way to inject new tokens into the vocabulary to achieve this.
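One minimal version of "injecting new tokens into the vocabulary" might look like the following. This is a framework-free sketch: the vocab dict, the list-of-lists embedding table, and the random init are all assumptions standing in for what a real library call (e.g. a resize-embeddings step) would do:

```python
import random

def add_speaker_token(vocab, embeddings, token, dim=4):
    """Append a new speaker token to the vocabulary and grow the
    embedding table by one small randomly initialized row. Real
    frameworks do the equivalent when the vocab is resized."""
    if token in vocab:
        return vocab[token]  # token already known; reuse its id
    vocab[token] = len(vocab)
    embeddings.append([random.gauss(0.0, 0.02) for _ in range(dim)])
    return vocab[token]

# Start from a fixed two-party vocab, then extend it on the fly.
vocab = {"<|user|>": 0, "<|bot|>": 1}
embeddings = [[0.0] * 4, [0.0] * 4]

idx = add_speaker_token(vocab, embeddings, "<|speaker_2|>")
print(idx, len(embeddings))
```

Adding the same token twice returns the existing id, so new speakers can be registered lazily as they first appear in a thread.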

Would love to contribute to this initiative.

I would imagine a method similar to Mistral's router could work (an RL policy rewarding equal distribution between models).

Also, there's this paper from before the LLM craze that might be helpful. Would be interesting to see it implemented with more powerful language models:

https://arxiv.org/pdf/1907.05507.pdf

I've had "best" results mushing everything into a single context window with a single "final"/"next" answer. I think I remember @teknium saying they often do that, and they may have published that research, but I can't speak for them; I just remember them saying that and feeling validated :-)
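That "everything in one context window, predict the final answer" setup could be sketched like this (the field names and the `speaker:` line format are made up for illustration):

```python
def make_example(turns):
    """Pack all but the last message into one context string and use
    the last message as the training target (the "final"/"next" answer)."""
    *history, (last_speaker, last_text) = turns
    context = "\n".join(f"{name}: {text}" for name, text in history)
    context += f"\n{last_speaker}:"  # cue the model to answer as the final author
    return {"context": context, "target": f" {last_text}"}

turns = [
    ("alice", "What dataset are you using?"),
    ("bob", "Reddit threads."),
    ("alice", "Multi-speaker ones?"),
    ("bob", "Yes, with any number of participants."),
]
example = make_example(turns)
print(example["context"])
print("TARGET:", example["target"])
```

The upside is that any number of speakers fits the same format; the model only ever has to produce the next reply given the whole thread so far.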