
Virtual Try-On Diffusion API

Summary

Virtual Try-On Diffusion [VTON-D] by Texel.Moda is a custom diffusion-based pipeline for fast and flexible multi-modal virtual try-on. Clothing, avatar and background can each be specified by a reference image or a text prompt, which enables clothing transfer, avatar replacement, fashion image generation and other virtual try-on tasks. Check out the demo on Hugging Face to try the API in a user-friendly way.

Consuming the API

The API is exposed through the RapidAPI Hub, which manages API subscriptions, API keys and payments. Please refer to the RapidAPI Documentation to get started.

Generally, in order to use the API you need to perform the following steps:

  • Create a RapidAPI.com account.
  • Navigate to the API page and subscribe to a suitable pricing plan. We also provide a free BASIC plan with 100 API requests per month.
  • Use the obtained RapidAPI key to authenticate (via the X-RapidAPI-Key header) and use the API from any programming language or tool you like.

Example API call using cURL:

curl --request POST \
--url https://try-on-diffusion.p.rapidapi.com/try-on-file \
--header 'Content-Type: multipart/form-data' \
--header 'x-rapidapi-host: try-on-diffusion.p.rapidapi.com' \
--header 'x-rapidapi-key: <RapidAPI Key>' \
--form clothing_image=@1.jpg \
--form avatar_image=@2.jpg

For a simple Python client implementation please see the Hugging Face demo application source.
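For orientation, here is a minimal sketch of the same request as the cURL example, using only the Python standard library. It is not the demo application's client: the helper names (`encode_multipart`, `load_image`, `build_try_on_file_request`) and the extension-to-content-type map are assumptions made for illustration.

```python
import urllib.request
import uuid
from pathlib import Path

API_HOST = "try-on-diffusion.p.rapidapi.com"

# Assumption: the content type is derived from the file extension.
CONTENT_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
}


def encode_multipart(fields, files):
    """Encode text fields and (filename, bytes, content_type) files
    as a multipart/form-data body. Returns (body, content_type_header)."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    for name, (filename, data, content_type) in files.items():
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        )
        chunks.append(head.encode() + data + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def load_image(path):
    """Read a local image and return (filename, bytes, content_type)."""
    path = Path(path)
    return path.name, path.read_bytes(), CONTENT_TYPES[path.suffix.lower()]


def build_try_on_file_request(api_key, clothing_path, avatar_path):
    """Build (but do not send) a POST /try-on-file request."""
    body, content_type = encode_multipart(
        {},
        {
            "clothing_image": load_image(clothing_path),
            "avatar_image": load_image(avatar_path),
        },
    )
    return urllib.request.Request(
        f"https://{API_HOST}/try-on-file",
        data=body,
        method="POST",
        headers={
            "Content-Type": content_type,
            "X-RapidAPI-Host": API_HOST,
            "X-RapidAPI-Key": api_key,
        },
    )


# Usage (sends a real request that counts against your plan):
# request = build_try_on_file_request("<RapidAPI Key>", "1.jpg", "2.jpg")
# with urllib.request.urlopen(request, timeout=60) as response:
#     Path("result.jpg").write_bytes(response.read())
```

Building the request separately from sending it keeps the encoding step easy to inspect without spending API calls.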

Try-On Endpoints

The Try-On API consists of two endpoints that differ only in the method of passing reference images:

  • POST /try-on-file - takes reference images as uploaded files in the request body (using multipart/form-data).

  • POST /try-on-url - takes reference images as image URLs in POST parameters.

All image requirements, behavior and status codes are the same for both endpoints; choose the one that best suits your application architecture.
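As an illustration of that choice, the sketch below picks the endpoint from the kind of references supplied. The function names are hypothetical, and the rule that URLs and local files cannot be mixed in one call is an assumption that follows from each endpoint accepting only one kind of reference.

```python
from urllib.parse import urlparse


def is_url(value):
    """True if the value looks like an http(s) URL."""
    parsed = urlparse(str(value))
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def choose_endpoint(clothing=None, avatar=None, background=None):
    """Return (endpoint_path, image_fields) for the given references.

    Each reference is either a URL or a local file path; omitted
    references are left out of the returned fields.
    """
    references = {
        "clothing_image": clothing,
        "avatar_image": avatar,
        "background_image": background,
    }
    given = {name: value for name, value in references.items() if value}
    kinds = {is_url(value) for value in given.values()}
    if len(kinds) > 1:
        raise ValueError("pass either URLs or local files, not both")
    if kinds == {True}:
        # The URL endpoint uses the same names with a "_url" suffix.
        return "/try-on-url", {f"{name}_url": value for name, value in given.items()}
    return "/try-on-file", given
```

With no image references at all (prompts only), this sketch falls back to /try-on-file; either endpoint should serve that case equally well.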

Try-On Input Parameters

All input parameters for the try-on endpoints are currently optional. Images and prompts serve as additional generation conditions and can be combined. Below is a short parameter summary; extended information on certain parameters follows in the sections after it.

List of input parameters for the POST /try-on-file endpoint:

| Parameter | Description | Required |
|---|---|---|
| clothing_image | Clothing reference image in JPEG, PNG or WEBP format, maximum file size is 12 MB. | No |
| clothing_prompt | Text prompt for clothing, can be used instead of an image. Compel weighting syntax is supported. Example: red sleeveless mini dress | No |
| avatar_image | Avatar image in JPEG, PNG or WEBP format, maximum file size is 12 MB. | No |
| avatar_sex | Avatar sex, either "male" or "female". Will be detected automatically, if left empty or omitted. Will enforce certain avatar sex if specified. | No |
| avatar_prompt | Text prompt for the avatar, can be used instead of an image or with image to modify the avatar. Compel weighting syntax is supported. Example: a gentleman with beard and mustache | No |
| background_image | Optional background reference image in JPEG, PNG or WEBP format, maximum file size is 12 MB. Original avatar background is preserved if background is not specified. | No |
| background_prompt | Optional background text prompt. Original avatar background is preserved if background is not specified. Example: in an autumn park | No |
| seed | Seed for image generation. Default is -1 (random seed). Actual seed will also be output in the "X-Seed" response header. Example: 42 | No |

List of input parameters for the POST /try-on-url endpoint:

| Parameter | Description | Required |
|---|---|---|
| clothing_image_url | Clothing reference image URL. Image should be in JPEG, PNG or WEBP format, maximum file size is 12 MB. | No |
| clothing_prompt | Text prompt for clothing, can be used instead of an image. Compel weighting syntax is supported. Example: red sleeveless mini dress | No |
| avatar_image_url | Avatar image URL. Image should be in JPEG, PNG or WEBP format, maximum file size is 12 MB. | No |
| avatar_sex | Avatar sex, either "male" or "female". Will be detected automatically, if left empty or omitted. Will enforce certain avatar sex if specified. | No |
| avatar_prompt | Text prompt for the avatar, can be used instead of an image or with image to modify the avatar. Compel weighting syntax is supported. Example: a gentleman with beard and mustache | No |
| background_image_url | Optional background reference image URL. Image should be in JPEG, PNG or WEBP format, maximum file size is 12 MB. Original avatar background is preserved if background is not specified. | No |
| background_prompt | Optional background text prompt. Original avatar background is preserved if background is not specified. Example: in an autumn park | No |
| seed | Seed for image generation. Default is -1 (random seed). Actual seed will also be output in the "X-Seed" response header. Example: 42 | No |
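A small sketch of how the /try-on-url parameters could be assembled on the client side. The function `build_url_fields` is hypothetical; it only mirrors the parameter list above, drops empty values and rejects an `avatar_sex` other than the two documented ones. How the fields are encoded on the wire is left out on purpose, since that is not specified here.

```python
def build_url_fields(
    clothing_image_url=None,
    clothing_prompt=None,
    avatar_image_url=None,
    avatar_sex=None,
    avatar_prompt=None,
    background_image_url=None,
    background_prompt=None,
    seed=-1,
):
    """Return a dict of non-empty POST fields for /try-on-url."""
    if avatar_sex not in (None, "", "male", "female"):
        raise ValueError('avatar_sex must be "male", "female" or empty')
    if not isinstance(seed, int):
        raise TypeError("seed must be an integer (-1 means a random seed)")
    candidates = {
        "clothing_image_url": clothing_image_url,
        "clothing_prompt": clothing_prompt,
        "avatar_image_url": avatar_image_url,
        "avatar_sex": avatar_sex,
        "avatar_prompt": avatar_prompt,
        "background_image_url": background_image_url,
        "background_prompt": background_prompt,
    }
    # All parameters are optional, so empty ones are simply omitted.
    fields = {name: value for name, value in candidates.items() if value not in (None, "")}
    fields["seed"] = str(seed)
    return fields
```

For example, `build_url_fields(clothing_prompt="red sleeveless mini dress", avatar_image_url="https://example.com/avatar.jpg", seed=42)` yields three fields: the prompt, the avatar URL and the seed as a string (the example URL is a placeholder).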

Clothing image

For best results clothing reference images should meet a number of requirements:

  • File format: JPEG, PNG or WEBP
  • Maximum file size: 12 MB
  • Minimum image size: 256x256
  • Recommended image size: 768x1024 and above
  • For best results clothing should be worn by a person or displayed on a ghost mannequin. Some flat lay clothing photos might work too, but currently it's not guaranteed.
  • Single person in the image (though multiple persons might also work)
  • Frontal photo, though some degree of rotation is fine
  • Good lighting conditions and high image quality as it directly affects the result
  • Minimal occlusion by hair, hands or accessories

To summarize: the better the clothing image, the better the final result.
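The format and file size requirements can be checked locally before uploading. The sketch below is one way to do it with the standard library, by looking at the file signature; the helper names are hypothetical, and reading "12 MB" as 12 * 1024 * 1024 bytes is an assumption. The image size limits (256x256 minimum) are not checked here because that requires decoding the image.

```python
# Assumption: "12 MB" is interpreted as 12 * 1024 * 1024 bytes.
MAX_FILE_SIZE = 12 * 1024 * 1024


def detect_format(data):
    """Return "JPEG", "PNG" or "WEBP" based on the file signature, else None."""
    if data[:3] == b"\xff\xd8\xff":
        return "JPEG"
    if data[:8] == b"\x89PNG\r\n\x1a\n":
        return "PNG"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "WEBP"
    return None


def check_reference_image(data):
    """Return a list of problems found in the raw image bytes (empty if none)."""
    problems = []
    if detect_format(data) is None:
        problems.append("unsupported file format (expected JPEG, PNG or WEBP)")
    if len(data) > MAX_FILE_SIZE:
        problems.append("file is larger than 12 MB")
    return problems


# Usage:
# from pathlib import Path
# print(check_reference_image(Path("1.jpg").read_bytes()))
```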

Examples of good clothing images:

Clothing prompt

Instead of a clothing image you can use a text prompt to describe the garment. Short and clear prompts work best. Additionally, Compel weighting syntax is supported to increase or decrease the weight of certain tokens. Examples:

  • a sheer blue sleeveless mini dress
  • a beige woolen sweater and white pleated skirt
  • a black leather jacket and dark blue slim-fit jeans
  • a floral pattern blouse and leggings
  • a colorful+++ t-shirt and black shorts
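The weighting markers in the last example can also be generated programmatically. The helper below is a hypothetical convenience, not part of the API: it appends `+` or `-` signs in the Compel style and wraps anything other than a single plain word in parentheses so the weight applies to the whole phrase.

```python
def weight(term, level):
    """Return the term with Compel-style weighting.

    level > 0 appends that many "+" signs, level < 0 that many "-" signs.
    Terms that are not a single alphanumeric word are parenthesized.
    """
    if level == 0:
        return term
    marks = ("+" if level > 0 else "-") * abs(level)
    if term.isalnum():
        return f"{term}{marks}"
    return f"({term}){marks}"


# Usage:
prompt = f"a {weight('colorful', 3)} t-shirt and black shorts"
print(prompt)
```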

Avatar image

Avatar images should also meet some requirements:

  • File format: JPEG, PNG or WEBP
  • Maximum file size: 12 MB
  • Minimum image size: 256x256
  • Recommended image size: 768x1024 and above
  • Single person in the image (though multiple persons might also work)
  • Frontal photo, though some degree of rotation is fine
  • Good lighting conditions and high image quality

Examples of good avatar images:

Avatar prompt

Instead of an avatar image you can use a text prompt to describe the person. Short and clear prompts work best. Additionally, Compel weighting syntax is supported to increase or decrease the weight of certain tokens. Examples:

  • a beautiful blond girl with long hair
  • a cute redhead girl with freckles
  • a (plus size)++ female model wearing sunglasses
  • a fit man with dark beard and blue eyes
  • a gentleman with beard and mustache

Background image

Background images are used only to extract high-level background features: they serve as a reference and are not reproduced as the exact background. Below are the basic image requirements:

  • File format: JPEG, PNG or WEBP
  • Maximum file size: 12 MB
  • Recommended image size: 256x256 and above

Examples of background images:

Background prompt

Instead of a background image you can use a text prompt to describe the background. Short and clear prompts work best. Additionally, Compel weighting syntax is supported to increase or decrease the weight of certain tokens. Examples:

  • in an autumn park
  • in front of a brick wall
  • on an ocean beach with (palm trees)++
  • in a shopping mall
  • in a modern office

Additional notes

We use the "same-crop" approach for clothing and avatar images: both images are cropped in roughly the same way (using pose estimation), so the model does not have to add too much new information (e.g. invent lower-body clothing). So, if you use a photo of upper-body clothing only, the result will be cropped the same way regardless of the avatar image (and the other way around):

Clothing Image Avatar Image Result Image

Try-On Output

Response codes

The HTTP status code serves as the high-level response status. A successful API call returns HTTP code 200, and the response body contains the resulting JPEG image with a maximum size of 768x1024 pixels. The response also carries the "X-Seed" header with the actual seed used for image generation (for reproducibility). Any other status code indicates an unsuccessful request; see the table below for additional details:

| Response Code | Content-Type | Headers | Description | Example |
|---|---|---|---|---|
| 200 | image/jpeg | X-Seed: {seed} | Successful API call. Response body contains the resulting image in JPEG format. | |
| 400 | application/json | | Bad request: at least one of request parameters is invalid. Response body should contain additional error details in JSON format. | { "detail": "Invalid upload file type: application/x-zip-compressed" } |
| 403 | application/json | | Indicates authentication issue (e.g. invalid API key). | |
| 422 | application/json | | Request validation error. Response body should contain error details in JSON format. | { "detail": [ { "loc": [ "string", 0], "msg": "string", "type": "string" } ] } |
| 429 | | | Too many requests. Might be triggered by the RapidAPI proxy in case of reaching maximum request rate or API call limit. | |
| 500 | | | Indicates an internal server error, might not have any details. | |
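One way a client might act on these status codes is sketched below. `TryOnError` and `parse_try_on_response` are hypothetical names; the function takes the status code, headers and raw body from whatever HTTP library you use, so it makes no network calls itself. It assumes the "X-Seed" header holds an integer.

```python
import json


class TryOnError(Exception):
    """Raised for any non-200 response; detail may be None."""

    def __init__(self, status, detail):
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail


def parse_try_on_response(status, headers, body):
    """Return (jpeg_bytes, seed) for HTTP 200, raise TryOnError otherwise."""
    lowered = {key.lower(): value for key, value in headers.items()}
    if status == 200:
        seed = lowered.get("x-seed")
        return body, (int(seed) if seed is not None else None)
    detail = None
    # 400 and 422 responses should carry a JSON body with a "detail" entry;
    # 429 and 500 might have no usable body at all.
    if "application/json" in lowered.get("content-type", ""):
        try:
            detail = json.loads(body).get("detail")
        except (ValueError, AttributeError):
            detail = None
    raise TryOnError(status, detail)
```

Storing the returned seed next to the image makes it possible to repeat a generation later by passing the same seed and inputs.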

NSFW content

We use an NSFW content checker to ensure we don't output inappropriate images. If potential NSFW content is detected in the generated image, the API will return HTTP status code 400 with a corresponding error message in the JSON response.

Use Cases and Recipes

Our Virtual Try-On API offers a flexible way to specify clothing, avatar and background, which makes it possible to not only perform a classic task of virtual try-on, but also generate entirely new images or alter existing images in some interesting aspects. Feel free to try and explore!

In all the examples below all unmentioned inputs are assumed to be empty.

Image-based virtual try-on

The most common use case is to transfer clothing from one photo (e.g. from a product page) to another photo (e.g. user avatar) while maintaining the avatar and the background.

Clothing Image Avatar Image Result Image

Image-based virtual try-on with background

Additionally, it's possible to replace the avatar background with a reference image or a text prompt.

Clothing Image Avatar Image Background Image Result Image

And with a text prompt for the background:

| Clothing Image | Avatar Image | Background Prompt | Result Image |
|---|---|---|---|
| (image) | (image) | in front of a snowy mountain | (image) |

Avatar from a text prompt

It's possible to replace the person in the clothing image with an avatar described in a text prompt. The background will change as well and will be random if not specified:

| Clothing Image | Avatar Prompt | Background Prompt | Result Image |
|---|---|---|---|
| (image) | a beautiful blond girl with long hair | | (image) |
| (image) | a gentleman with a long beard and mustache | near a fireplace | (image) |

You may also experiment with avatar prompts for more interesting results:

| Clothing Image | Avatar Prompt | Background Prompt | Result Image |
|---|---|---|---|
| (image) | (iron man mask)+++ | in the Sahara Desert | (image) |

Creating diverse product images

If you have a clothing image on a ghost mannequin (flat lay photo might work too), you can generate product images with avatars and backgrounds of your choice:

| Clothing Image | Avatar Prompt | Background Image | Result Image |
|---|---|---|---|
| (image) | a beautiful blond girl with long hair | (image) | (image) |
| (image) | a gentleman with beard and mustache | (image) | (image) |

Clothing from a text prompt

Similarly, you can specify clothing with a text prompt while providing an avatar image:

| Clothing Prompt | Avatar Image | Result Image |
|---|---|---|
| a sheer blue sleeveless mini dress | (image) | (image) |
| a colorful t-shirt and black shorts | (image) | (image) |

Modifying clothing

It's possible to modify clothing to some extent using a clothing image and a clothing prompt simultaneously:

| Clothing Image | Clothing Prompt | Avatar Image | Result Image |
|---|---|---|---|
| (image) | (long sleeves)+++ | (image) | (image) |
| (image) | shorts+++ | (image) | (image) |

Modifying avatar's body

If you use the same image for both clothing and avatar and also provide an avatar prompt, it's possible to change the avatar's body proportions. Note that stronger changes may require additional term weighting.

| Clothing Image | Avatar Image | Avatar Prompt | Result Image |
|---|---|---|---|
| (image) | (image) | a (plus size)+ woman | (image) |
| (image) | (image) | a (muscular bodybuilder)+++++ | (image) |

Txt2Img

As our diffusion model was fine-tuned to produce people wearing various clothing, it can better follow a clothing prompt and output realistic people and garments:

| Clothing Prompt | Avatar Prompt | Background Prompt | Result Image |
|---|---|---|---|
| a paisley pattern purple shirt and beige chinos | a fit man with dark beard | plain white background | (image) |
| a white polka dot pattern dress | a beautiful petite blond woman | on a yacht | (image) |

Other creative possibilities

If you specify the same image for clothing and avatar while providing a background prompt (or background image) you can replace the background in a creative way:

| Clothing Image | Avatar Image | Background Prompt | Result Image |
|---|---|---|---|
| (image) | (image) | on a snowy mountain top | (image) |

It's also possible to use a combination of clothing image, clothing prompt, avatar image and a background to add some accessories:

| Clothing Image | Clothing Prompt | Avatar Image | Background Image | Result Image |
|---|---|---|---|---|
| (image) | a (light brown purse)+++ | (image) | (image) | (image) |

Performance

Typically, one try-on request is processed in 5-10 seconds (depending on the type of conditions), excluding network latency. To reduce network overhead you might want to compress your images (e.g. as JPEG) before sending them to the API. Please note that under high demand processing time might increase because requests are queued, though we constantly monitor our GPU cluster capacity and scale it as needed.
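Since a request can take several seconds and the RapidAPI proxy may answer with status 429 when the request rate is exceeded, a client might wrap its calls in a simple retry loop. The sketch below is an assumption about reasonable client behavior, not something the API prescribes: `call_with_retry`, the attempt count and the delays are all illustrative. Note that retrying does not help when the 429 comes from an exhausted monthly call limit.

```python
import time


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a status other than 429.

    send must return a (status, payload) tuple. Between attempts the
    delay doubles: base_delay, 2 * base_delay, 4 * base_delay, ...
    The last result is returned even if it is still a 429.
    """
    for attempt in range(max_attempts):
        status, payload = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, payload
        sleep(base_delay * 2 ** attempt)


# Usage, with a stand-in for a real HTTP call:
# status, payload = call_with_retry(lambda: (200, b"...jpeg bytes..."))
```

Passing `sleep` as a parameter keeps the function testable without real waiting.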

Known Issues and Limitations

Like any generative model, our models are not perfect (though we constantly work on improvements):

  • Currently, we do not fully support flat lay clothing images. Some might work, but that's not guaranteed.
  • Prompt following might not be perfect, especially for long and sophisticated prompts. Prefer simpler and more straightforward prompts whenever possible. Also be explicit (e.g. use the word "plain" if you need something of solid color). Additionally, Compel weighting can be used to increase the weight of certain tokens.
  • As usual, generative models struggle with hands, fingers and toes, though we try to mitigate it to a certain extent.
  • Currently, we do not support trying on a single garment, only the full look.
  • Hats and sunglasses are not currently transferred, but we are working on it.
  • Backgrounds might lack some clarity as currently we focus more on clothing.
  • In case of a specified background a hairstyle might slightly change.
  • Body shape of the avatar might change towards smaller sizes.

Changelog

The changelog below contains major API updates focusing on new features and other improvements.

  • 2024-12-15: New API release brings support for clothing on ghost mannequins and (partially) flat lay clothing photos.

  • 2024-11-07: Initial public API release.