Multi-round conversation support/example code

by Xenova HF staff - opened 5 days ago

5 days ago

Hi there! Do you have any example code for multi-round conversations where the past key values from the first generation are passed as inputs to the second round? The demo doesn't seem to showcase this, so maybe it's a limitation of the model?

Note: I'm mostly concerned about image+text-to-text, since I believe I read somewhere that you don't support interleaved image-to-text (making multi-round conversations unsupported).

I look forward to your response!

CharlesCXK

DeepSeek org 5 days ago

Hello, our model currently does not support interleaved multimodal generation (which refers to having both text and images in a single response), but it does support multi-turn dialogue functionality (for example, uploading an image and then asking multiple questions). This is not implemented in the current demo, but if you're interested, you can refer to the multi-turn dialogue part in https://huggingface.co/spaces/deepseek-ai/DeepSeek-VL-7B/blob/main/app.py and modify it accordingly.

Xenova

4 days ago

edited 4 days ago

Thanks @CharlesCXK for the response! I am looking for an example of multi-turn dialogue (uploading an image and asking multiple questions) that passes the generated past_key_values into the second turn, so that the image embeddings don't need to be recomputed on the second turn (similar to this, but with images as input instead of generating images). I can't seem to find that in the link you sent.
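
To make the request concrete, here is a minimal toy sketch of the caching idea in plain Python. It is not DeepSeek's code or API: `ToyCausalLayer`, its `forward` signature, and the random vectors standing in for image and text embeddings are all hypothetical. It only illustrates that, with causal attention, feeding the second turn together with the cached keys/values from the first turn should give the same outputs as recomputing the whole sequence, so the image prefix would not need to be re-encoded.

```python
import math
import random


def attend(query, keys, values):
    """Single-head softmax attention of one query over all given keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


class ToyCausalLayer:
    """Hypothetical one-layer causal attention with fixed random projections."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def make_matrix():
            return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

        self.wq, self.wk, self.wv = make_matrix(), make_matrix(), make_matrix()

    @staticmethod
    def _project(matrix, vector):
        return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

    def forward(self, embeddings, past_key_values=None):
        """Process new positions only; earlier positions come from the cache."""
        if past_key_values is None:
            keys, values = [], []
        else:
            # Copy so the caller's cache is not mutated.
            keys, values = list(past_key_values[0]), list(past_key_values[1])
        outputs = []
        for emb in embeddings:
            keys.append(self._project(self.wk, emb))
            values.append(self._project(self.wv, emb))
            # Causal: each position sees the cache plus positions up to itself.
            outputs.append(attend(self._project(self.wq, emb), keys, values))
        return outputs, (keys, values)


rng = random.Random(1)
dim = 4
# Stand-ins for "image embeddings + first question + first answer" (assumed sizes).
first_turn = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(6)]
# Stand-ins for the second question's token embeddings.
second_turn = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]

layer = ToyCausalLayer(dim)

# Turn 1: process the full prefix once and keep its keys/values.
_, prefix_cache = layer.forward(first_turn)

# Turn 2: feed only the new positions plus the cache from turn 1.
incremental, cache = layer.forward(second_turn, past_key_values=prefix_cache)

# Reference: recompute everything from scratch.
full, _ = layer.forward(first_turn + second_turn)

matches = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    for row_a, row_b in zip(incremental, full[-len(second_turn):])
    for a, b in zip(row_a, row_b)
)
print("cached second turn matches full recomputation:", matches)
```

In a real model the same pattern would apply per layer and per head, and the cached prefix would include the positions produced from the image embeddings. Whether the model's `generate` path exposes this directly is exactly what I'm asking about.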
