---
license: wtfpl
datasets:
- MedIR/roco
language:
- en
pipeline_tag: text-to-image
---

# Prompt2MedImage - Diffusion for Medical Images

Prompt2MedImage is a latent text-to-image diffusion model fine-tuned on medical images from the ROCO dataset.

The weights here are intended to be used with the 🧨 Diffusers library.

This model was trained using Amazon SageMaker and the Hugging Face Deep Learning container.

## Model Details

- **Developed by:** Nihir Chadderwala
- **Model type:** Diffusion-based text-to-medical-image generation model
- **Language:** English
- **License:** wtfpl
- **Model Description:** This latent text-to-image diffusion model can be used to generate high-quality medical images from text prompts. It uses a fixed, pretrained text encoder ([CLIP ViT-L/14](https://arxiv.org/abs/2103.00020)) as suggested in the [Imagen paper](https://arxiv.org/abs/2205.11487).
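
To make "latent" concrete: in Stable Diffusion-style pipelines, denoising runs on a compressed tensor rather than on pixels. The sketch below computes that tensor's shape for a requested image size. The defaults (a VAE that downsamples by 8 and 4 latent channels) are the usual Stable Diffusion v1 values and are an assumption here, since this card does not name the base checkpoint. On a loaded pipeline the actual values can be read from `pipe.vae_scale_factor` and `pipe.vae.config.latent_channels`. `latent_shape` is a hypothetical helper, not part of Diffusers.

```python
def latent_shape(height, width, vae_scale_factor=8, latent_channels=4):
    """Return (channels, latent_height, latent_width) for one image.

    The defaults are assumed Stable Diffusion v1 values; they are not
    confirmed for this model.
    """
    if height % vae_scale_factor or width % vae_scale_factor:
        raise ValueError("height and width must be multiples of the VAE scale factor")
    return (latent_channels, height // vae_scale_factor, width // vae_scale_factor)


# Under these assumptions a 512x512 request is denoised as a 4x64x64 tensor.
print(latent_shape(512, 512))
```
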

## License

This model is open access and available to all, with the Do What the F*ck You Want To Public License (WTFPL) further specifying rights and usage.

- You can't use the model to deliberately produce or share illegal or harmful outputs or content.
- The author claims no rights to the outputs you generate; you are free to use them and are accountable for their use.
- You may redistribute the weights and use the model commercially and/or as a service.

## Run using PyTorch

```bash
pip install diffusers transformers torch
```

Run the pipeline with the default PNDM scheduler:

```python
import torch
from diffusers import StableDiffusionPipeline

model_id = "Nihirc/Prompt2MedImage"
device = "cuda"

# Load the fine-tuned weights in half precision and move them to the GPU
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to(device)

# Generate one image from a radiology-style caption
prompt = "Showing the subtrochanteric fracture in the porotic bone."
image = pipe(prompt).images[0]

image.save("porotic_bone_fracture.png")
```
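
For repeatable results across several prompts, one possible approach is to pass a seeded `torch.Generator` along with explicit `num_inference_steps` and `guidance_scale` values, which are standard call arguments of `StableDiffusionPipeline`. The sketch below is illustrative only: `slugify` and `generate` are hypothetical helper names, and the step count, guidance scale, and seed are Diffusers defaults or arbitrary choices, not settings recommended by the model author.

```python
import re


def slugify(prompt, max_len=60):
    """Turn a prompt into a safe file stem (hypothetical helper)."""
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return slug[:max_len].rstrip("_") or "image"


def generate(prompts, seed=0, steps=50, guidance_scale=7.5, device="cuda"):
    """Generate one PNG per prompt and return the saved file paths."""
    # Heavy imports stay local so slugify() works without a GPU stack installed.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "Nihirc/Prompt2MedImage", torch_dtype=torch.float16
    ).to(device)

    paths = []
    for i, prompt in enumerate(prompts):
        # A fresh generator per prompt makes each image reproducible on its own.
        generator = torch.Generator(device=device).manual_seed(seed + i)
        image = pipe(
            prompt,
            num_inference_steps=steps,
            guidance_scale=guidance_scale,
            generator=generator,
        ).images[0]
        path = f"{slugify(prompt)}.png"
        image.save(path)
        paths.append(path)
    return paths
```

Calling `generate(["Showing the subtrochanteric fracture in the porotic bone."])` on a CUDA machine would write a single PNG named after the prompt. Rerunning with the same seed on the same hardware and library versions should give a closely matching image.
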

## Citation

```
O. Pelka, S. Koitka, J. Rückert, F. Nensa, C.M. Friedrich,
"Radiology Objects in COntext (ROCO): A Multimodal Image Dataset".
MICCAI Workshop on Large-scale Annotation of Biomedical Data and Expert Label Synthesis (LABELS) 2018, September 16, 2018, Granada, Spain. Lecture Notes in Computer Science (LNCS), vol. 11043, pp. 180-189, Springer Cham, 2018.
doi: 10.1007/978-3-030-01364-6_20
```