PlatVR-dpo - Hermes 2 Pro - Mistral 7B

Image generated by Copilot Designer.

Model Details

This model is part of the EVIDENT framework, designed to enhance the creative process in generating background images for virtual reality sets. It interprets user instructions to generate and modify prompts for text-to-image models. This is the DPO version of the model; SFT and KTO versions are also available.

The demo integrates a diffusion model to test prompt-image alignment, along with mechanisms for user feedback and iterative prompt refinement, with the aim of enhancing user creativity and satisfaction.

The instruction categories are:

  • Addition: Includes new elements or features.
  • Condensation: Summarizes the description.
  • Modification: Alters specific aspects of the description to change the scene.
  • Rearrangement: Reorders sentences within the description.
  • Removal: Eliminates specific details from the description.
  • Rephrase: Rewrites parts of the description.
  • Scene Change: Switches the overall context of the description.

The model outputs English, but other languages can be used as input (quality depends on how many tokens of the given language were seen during pre-training).

Model Description

Developed as part of the EVIDENT framework, this model leverages a large language model fine-tuned on synthetic preference data to generate and refine text prompts for creating virtual reality backgrounds.

The goal of the DPO process is for the model to learn, from preference data, the specific descriptive style present in the dataset.
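
For orientation, the per-example objective that DPO is generally understood to minimize can be sketched as below. This is an illustration of the standard DPO loss, not the training code used for this model. The function name `dpo_loss`, the `beta` value, and the log-probability inputs are assumptions made for the example.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities.

    beta=0.1 is an illustrative default, not a value reported in this card.
    """
    # How much more the policy favors each response than the frozen reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

The loss falls as the policy assigns relatively more probability to the preferred (chosen) description than to the rejected one, which is how the preferred descriptive style would be reinforced.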

  • Developed by: ITG
  • Model type: Text-to-Text for Image Prompt Generation
  • Language(s) (NLP): English
  • License: Apache 2.0
  • Finetuned from model: Hermes 2 Pro - Mistral 7B

Uses

Prompt Format

It uses ChatML as the prompt format.

Here is the original prompt that was used in the fine-tuning process:

<|im_start|>system
As an AI assistant dedicated to refining and adjusting prompts for image generation, your primary task involves interpreting and applying user-specific modifications to enhance the original prompt. Your modifications may include:

Additions: Introducing new elements or features to enrich the context, such as weather conditions or additional objects, aiming to enable the AI to interpret and generate more complex and detailed prompts.
Condensations: Summarizing longer descriptions into more concise forms without losing essential meaning, aiming at generating relevant images from shorter prompts.
Modifications: Altering specific details within the descriptions to change the scene.
Rearrangement: Changing the order of sentences or phrases to test the AI's context understanding and narrative flow.
Removal: Eliminating redundant or non-essential information to clarify the prompt.
Rephrase: Rewriting sentences or phrases to convey the same meaning using different words or structures.
Scene Change: Altering the setting or background to create a completely new context.
Your goal is to skillfully adapt the new prompt in line with the user's precise directives, ensuring the essence of their vision is captured—all while maintaining responses exclusively in English, regardless of the original prompt's language.

It is crucial that the revised prompt strictly adheres to the user's intent, incorporating their specified changes with precision. Additionally, ensure the new prompt does not suggest alterations that imply dynamics or qualities unsuitable for visual representation, such as smell, scent, or sound, which cannot be captured in an image.

Your role is to ensure the prompt is optimized for image generation, clearly reflecting the user's adjustments while respecting these guidelines, with a consistent use of English for all responses. The focus should be on creating a vivid, static depiction that stays true to the conceptual and aesthetic requirements set forth by the user, communicated effectively in English.

Remember, the new prompt must not contain references to smell, scent, or sound, which cannot be captured in an image.

Below is the original prompt that you will meticulously refine:
{original_prompt}<|im_end|>
<|im_start|>user
{instruction}<|im_end|>
<|im_start|>assistant

Notes

  • {original_prompt}: The previous prompt that the system returned to the user.

  • {instruction}: The instruction that the user gives to the system to modify the previous model response.

  • Note: On the first iteration, {original_prompt} is the user's input and {instruction} is the generic 'Enhance the original prompt.'
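
The first-iteration rule in these notes can be captured in a small helper. This is a sketch only: `build_chatml_prompt` and `DEFAULT_INSTRUCTION` are hypothetical names introduced for illustration, and `system_text` is assumed to hold the full system message shown above (everything up to "Below is the original prompt that you will meticulously refine:").

```python
DEFAULT_INSTRUCTION = "Enhance the original prompt."


def build_chatml_prompt(system_text, original_prompt, instruction=None):
    """Assemble a ChatML prompt in the layout used during fine-tuning."""
    # First iteration: no user instruction yet, so fall back to the generic one.
    if instruction is None:
        instruction = DEFAULT_INSTRUCTION
    return (
        "<|im_start|>system\n"
        f"{system_text}\n"
        f"{original_prompt}<|im_end|>\n"
        "<|im_start|>user\n"
        f"{instruction}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
```

On the first call, `original_prompt` is the user's raw input and `instruction` is omitted. On later calls, `original_prompt` is the previous model response and `instruction` is the user's requested change.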

Direct Use

This model is designed for direct use in generating and refining text prompts for text-to-image generation, specifically tailored for creating virtual reality environments and sets.

Load model:

docker run --gpus all --rm --shm-size 1g -p 8080:80 -v ~/huggingface/hub/:/data ghcr.io/huggingface/text-generation-inference:latest --model-id ITG/PlatVR-dpo

Python:

from huggingface_hub import InferenceClient

client = InferenceClient(model="http://localhost:8080")
template = ("""<|im_start|>system
As an AI assistant dedicated to refining and adjusting prompts for image generation, your primary task involves interpreting and applying user-specific modifications to enhance the original prompt. Your modifications may include:

Additions: Introducing new elements or features to enrich the context, such as weather conditions or additional objects, aiming to enable the AI to interpret and generate more complex and detailed prompts.
Condensations: Summarizing longer descriptions into more concise forms without losing essential meaning, aiming at generating relevant images from shorter prompts.
Modifications: Altering specific details within the descriptions to change the scene.
Rearrangement: Changing the order of sentences or phrases to test the AI's context understanding and narrative flow.
Removal: Eliminating redundant or non-essential information to clarify the prompt.
Rephrase: Rewriting sentences or phrases to convey the same meaning using different words or structures.
Scene Change: Altering the setting or background to create a completely new context.
Your goal is to skillfully adapt the new prompt in line with the user's precise directives, ensuring the essence of their vision is captured—all while maintaining responses exclusively in English, regardless of the original prompt's language.

It is crucial that the revised prompt strictly adheres to the user's intent, incorporating their specified changes with precision. Additionally, ensure the new prompt does not suggest alterations that imply dynamics or qualities unsuitable for visual representation, such as smell, scent, or sound, which cannot be captured in an image.

Your role is to ensure the prompt is optimized for image generation, clearly reflecting the user's adjustments while respecting these guidelines, with a consistent use of English for all responses. The focus should be on creating a vivid, static depiction that stays true to the conceptual and aesthetic requirements set forth by the user, communicated effectively in English.

Remember, the new prompt must not contain references to smell, scent, or sound, which cannot be captured in an image.

Below is the original prompt that you will meticulously refine:
{original_prompt}<|im_end|>
<|im_start|>user
{instruction}<|im_end|>
<|im_start|>assistant
""")

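# The user's requested change to the current prompt.
instruction = "Add details to the original prompt in a single sentence."
# Input may be in another language (Spanish for "A mountain"); the output is English.
original_prompt = "Una montaña"
# Fill both placeholders in the ChatML template and query the local TGI server.
input_prompt = template.format(original_prompt=original_prompt, instruction=instruction)
print(client.text_generation(prompt=input_prompt, max_new_tokens=512))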
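
The iterative refinement described in the prompt-format notes can be sketched as a loop that feeds each response back in as the next {original_prompt}. The `refine` function and its `generate` callable are hypothetical. `generate` stands in for a wrapper around `client.text_generation` and the template above.

```python
def refine(user_input, instructions, generate):
    """Run one enhancement pass, then apply each user instruction in turn.

    `generate(original_prompt, instruction)` is a caller-supplied function
    that returns the model's rewritten prompt as a string.
    """
    # First iteration: the user's raw input plus the generic instruction.
    current = generate(user_input, "Enhance the original prompt.")
    history = [current]
    for instruction in instructions:
        # Later iterations: the previous response becomes the original prompt.
        current = generate(current, instruction)
        history.append(current)
    return history
```

Passing the generation step in as a callable keeps the loop independent of the serving backend, so it can be exercised with a stub before pointing it at a running endpoint.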

Downstream Use

The model can be fine-tuned or integrated into larger ecosystems or applications that require dynamic, user-driven creation of visual content.

Out-of-Scope Use

The model is not intended for uses beyond text prompt generation for visual content.

Bias, Risks, and Limitations

The model may inherit biases from its training data or exhibit limitations in understanding complex user instructions. Potential risks include generating inappropriate or unintended content based on ambiguous prompts.

Evaluation metrics

See the KTO version of the model for the full evaluation report.

Recommendations

Users should be aware of the model's limitations and biases. It is recommended to monitor the outputs for unintended content and refine prompts accordingly.
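
One lightweight way to monitor outputs is to flag the words that the system prompt rules out (smell, scent, sound). The helper below is a hypothetical sketch. The name `find_non_visual_terms` and the word list are assumptions, and a keyword match will miss paraphrases and may flag harmless uses.

```python
import re

# Illustrative word list based on the system prompt's restrictions; extend as needed.
NON_VISUAL_TERMS = ("smell", "scent", "sound")


def find_non_visual_terms(prompt):
    """Return the restricted terms that occur at the start of a word in `prompt`."""
    found = []
    for term in NON_VISUAL_TERMS:
        # \b + term + \w* also matches simple inflections ("sounds", "scented").
        if re.search(rf"\b{term}\w*", prompt, flags=re.IGNORECASE):
            found.append(term)
    return found
```

A non-empty result could trigger a follow-up Removal instruction rather than discarding the generated prompt outright.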

Demo example

[Image: demo example]

Request Demo

Model Card Contact

Model size: 7.24B params (Safetensors, tensor type BF16)
