Textual inversion: Are the imagenet templates fixed?

by xalex - opened

The imagenet templates for objects all talk about a photo, which may not be ideal for training on drawn objects and other things that do not come from photos.

Are the templates fixed (e.g. does CLIP expect exactly these strings), or can one just change them or add a few, like "a picture of {}", "a drawing of {}", and so on?
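For concreteness, the change being asked about would look roughly like the sketch below. It assumes the templates are plain Python format strings, with one sampled at random per training example. The names `PHOTO_TEMPLATES`, `EXTRA_TEMPLATES`, and `build_prompt` are made up for illustration and are not taken from any training script, and the photo templates shown are only a small illustrative subset. Whether training works as well with the extra templates is exactly the open question.

```python
import random

# Illustrative subset of photo-centric templates (not the full list).
PHOTO_TEMPLATES = [
    "a photo of a {}",
    "a cropped photo of the {}",
    "a close-up photo of a {}",
]

# The additions proposed in the question, for subjects that are not photos.
EXTRA_TEMPLATES = [
    "a picture of {}",
    "a drawing of {}",
]

ALL_TEMPLATES = PHOTO_TEMPLATES + EXTRA_TEMPLATES


def build_prompt(placeholder: str, templates: list[str], rng: random.Random) -> str:
    """Pick one template at random and fill in the placeholder token."""
    return rng.choice(templates).format(placeholder)


# Hypothetical placeholder token; a seeded RNG keeps the choice reproducible.
prompt = build_prompt("<my-sketch>", ALL_TEMPLATES, random.Random(0))
```

Each call returns one caption such as a filled-in "a drawing of ..." string, so the only thing that changes relative to the photo-only setup is the pool of strings being sampled from.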

I'm also wondering this.
