Token size limit

#22
by gebaltso - opened

Hello, I would like to ask what the size limit of the prompt tokens is in SD3. Is it 2 x 77, or did I misunderstand? Thanks in advance.

For now it's 77, and this applies to all three text encoders. There's a PR to raise the limit for the T5 encoder only, which can go as high as 512, but for the CLIP ones it will still be 77.
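For reference, here's a minimal sketch of what that looks like from the user side once the T5 limit is raised. It assumes a diffusers build where StableDiffusion3Pipeline exposes a max_sequence_length argument for the T5 branch (the behavior that PR adds); the CLIP encoders stay at 77 either way.

import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a very long prompt with well over 77 tokens ...",
    # Raises the token window for the T5 branch only; both CLIP
    # encoders still truncate at 77 tokens internally.
    max_sequence_length=512,
    num_inference_steps=28,
).images[0]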

hi @gebaltso do you mean 77 for the prompt + 77 for the negative prompt?
According to the code it should be 77, but on my side it truncates after 75 and I don't know why.

The real usable tokens are 75; the other two slots are taken by the BOS and EOS tokens. Also, 2 x 77 means that each CLIP model uses 77 tokens, and since there are two of them, you get 2 x 77.
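You can check the 75 + 2 split with a standalone CLIP tokenizer. The public CLIP-L checkpoint below is just a stand-in for the tokenizers bundled with SD3:

from transformers import CLIPTokenizer

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
print(tok.model_max_length)  # 77

ids = tok("a photo of a cat", truncation=True).input_ids
# The first id is BOS and the last is EOS, so only 75 of the
# 77 positions are left for the prompt itself.
print(ids[0] == tok.bos_token_id, ids[-1] == tok.eos_token_id)  # True True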

Because the example prompts have more than 77 tokens, I previously modified diffusers to support long T5 prompts of up to 512 tokens.
But unfortunately this space is rarely used by anyone 😂
https://huggingface.co/spaces/vilarin/sd3m-long

This almost works:

from compel import Compel, ReturnedEmbeddingsType

# `pipeline` is an already-loaded pipeline; `prompt`, `negative_prompt`,
# `num_inference_steps`, and `num_images_per_prompt` are defined elsewhere.
compel = Compel(
    truncate_long_prompts=False,
    tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2],
    text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
    requires_pooled=[False, True],
)

conditioning, pooled = compel(prompt)
negative_embed, negative_pooled = compel(negative_prompt)
# Long prompts can produce embeddings of different lengths, so pad the
# positive and negative tensors to match before calling the pipeline.
[conditioning, negative_embed] = compel.pad_conditioning_tensors_to_same_length(
    [conditioning, negative_embed]
)

images = pipeline(
    output_type="pil",
    num_inference_steps=num_inference_steps,
    num_images_per_prompt=num_images_per_prompt,
    width=512,
    height=512,
    prompt_embeds=conditioning,
    pooled_prompt_embeds=pooled,
    negative_prompt_embeds=negative_embed,
    negative_pooled_prompt_embeds=negative_pooled,
).images
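Note that this wires up only the two CLIP encoders and their tokenizers, which is compel's SDXL-style setup; SD3's third text encoder (T5) never sees the long prompt, which is presumably why it only almost works.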
