Conversation derails after a certain number of tokens (?)

#20 by mindplay

Just came here to say, wow, this model is extremely good - I was quite surprised at the helpfulness of this rather small model!

However, I've noticed that after a certain number of turns it seems to cut off abruptly - and if you attempt to continue the conversation, it goes completely off the rails! It switches from helpful and objective to being all like "haha! nope!" and using a whole bunch of emoji.

I tried to delete the end of the conversation and resume, but this appears to happen consistently after a certain number of turns/tokens.

I'm otherwise extremely surprised and impressed with its ability to explain some rather complex and exotic programming topics I was asking about!

Really promising stuff. :-)

I noticed the same thing.
It started talking Spanish after answering my prompt.

Hi, can you please provide a short snippet showing how you used StarChat as a conversation bot? I deployed StarCoder on SageMaker using the deployment script from HF. With that sample code it only does autocomplete - how do I use it like a chatbot?

There's a chat/demo here:

https://huggingface.co/spaces/HuggingFaceH4/starchat-playground
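
I haven't deployed it on SageMaker myself, but the important part is the prompt template: the model was fine-tuned on dialogues wrapped in special tokens, so you have to reproduce that wrapping at inference time. A rough sketch with transformers (model name and sampling settings are what I remember from the starchat-beta model card - treat them as assumptions, not gospel):

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceH4/starchat-beta")

# StarChat was fine-tuned on dialogues wrapped in <|system|>, <|user|>,
# <|assistant|> and <|end|> tokens, so chat prompts must use the same format.
prompt = (
    "<|system|>\nYou are a helpful coding assistant.<|end|>\n"
    "<|user|>\nHow do I reverse a list in Python?<|end|>\n"
    "<|assistant|>"
)

output = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.2,
    top_p=0.95,
    # Stop at the end-of-turn token instead of running on.
    eos_token_id=pipe.tokenizer.convert_tokens_to_ids("<|end|>"),
)
print(output[0]["generated_text"])
```

To keep a conversation going, append the assistant's reply (plus the next user turn, each terminated with `<|end|>`) to the prompt and generate again.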

By the way, I understand now why the conversation derails - from my limited understanding, this happens with all models once the conversation exceeds the maximum context length the model was trained on. Some interfaces (such as ChatGPT) work around this internally, either by truncating or summarizing the conversation behind the scenes as its length approaches the limit.
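
For the simple truncation approach, something like this is enough (a hypothetical sketch - the helper name and token budget are made up, and you'd pass in the model's real tokenizer):

```python
# Hypothetical sketch: keep the system message plus as many of the most
# recent turns as fit, dropping the oldest ones once the running token
# count approaches the model's context limit.
def truncate_history(turns, tokenizer, max_tokens=7000):
    """turns[0] is the system message; the rest are alternating user/assistant turns."""
    system, rest = turns[0], turns[1:]
    budget = max_tokens - len(tokenizer.encode(system))
    kept = []
    # Walk backwards from the newest turn, keeping whatever still fits.
    for turn in reversed(rest):
        cost = len(tokenizer.encode(turn))
        if cost > budget:
            break
        kept.append(turn)
        budget -= cost
    return [system] + list(reversed(kept))
```

The summarization variant just replaces the dropped turns with a model-generated summary instead of discarding them outright.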

I wonder when we'll start to see implementations of this:

https://github.com/ggerganov/llama.cpp/discussions/2936

It looks relatively simple, and it supposedly solves the conversation-length issue by letting the model discard tokens that fall out of a recent window, instead of breaking down when the context fills up. It also apparently speeds up the model by 2-4x!
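
As I understand it, the policy is roughly: keep the first few tokens of the KV cache (they act as "attention sinks") plus a recent sliding window, and evict everything in between. A toy illustration - all names here are mine, not llama.cpp's actual API:

```python
# Toy illustration of the eviction policy, not llama.cpp's real API.
def evict_kv_cache(cache, n_sink=4, n_recent=2044):
    """cache: per-token KV entries, oldest first."""
    if len(cache) <= n_sink + n_recent:
        return cache
    # Keeping the initial "sink" tokens is the key trick: naively dropping
    # them is what makes plain truncation degenerate into gibberish.
    return cache[:n_sink] + cache[-n_recent:]
```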

David Shapiro talks about it in this video:

https://www.youtube.com/watch?v=5XaJQKgL9Hs&t=8s&pp=ygULbG0taW5maW5pdGU%3D

I'm not sure why we're not seeing implementations of this everywhere yet - it sounds like a slam dunk. 🙂
