---
license: creativeml-openrail-m
library_name: diffusers
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
- diffusers
- diffusers-training
- lora
base_model: runwayml/stable-diffusion-v1-5
inference: true
---


# LoRA text2image fine-tuning - animanatwork/illustrations-lora
These are LoRA adaptation weights for runwayml/stable-diffusion-v1-5, fine-tuned on the animanatwork/text_to_image_dataset dataset.

Below are some sample images from the dataset:

<div style="display: flex; justify-content: space-between;">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/66297c313291276a14318d23/fHCi3t9AlK5AasMt_K0nh.png" width="30%" />
  <img src="https://cdn-uploads.huggingface.co/production/uploads/66297c313291276a14318d23/fYdTOG8QKtUHKvDOBw40r.png" width="30%" /> 
  <img src="https://cdn-uploads.huggingface.co/production/uploads/66297c313291276a14318d23/IXx2U6cM0SH4CFGw1qmjE.png" width="30%" />
</div>


The images below were generated by the model using the prompt: "a stylized illustration of a woman sitting in a comfortable chair, reading a book. She is wearing a hat, and her expression appears focused and calm. A black cat is also depicted, sitting beside her and looking at the book, suggesting a shared moment of quiet and companionship. The woman is dressed in a casual outfit with yellow shoes, and the overall color scheme is simple, using black, white, and yellow. The setting seems cozy and peaceful, ideal for reading."

<div style="display: flex; justify-content: space-between;">
  <img src="./image_0.png" width="25%" />
  <img src="./image_1.png" width="25%" />
  <img src="./image_2.png" width="25%" />
  <img src="./image_3.png" width="25%" />
</div>

## Intended uses & limitations
Do NOT use in production. This model was created purely for research purposes.

#### How to use

A minimal sketch with the diffusers library (assumes a CUDA device is available; the LoRA weights are loaded on top of the base model):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the base model, then apply the LoRA adapter on top of it.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe.load_lora_weights("animanatwork/illustrations-lora")
pipe.to("cuda")

prompt = "a stylized illustration of a woman reading a book, with a black cat beside her"
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("illustration.png")
```

#### Limitations and bias

[TODO: provide examples of latent issues and potential remediations]

## Training details

- The model was trained on the "animanatwork/text_to_image_dataset" dataset for 10,000 training steps (the default is 15,000), which took several hours. For more details, see the [Colab notebook](https://colab.research.google.com/drive/1CePJWR2sfYW-w0oPuiIdJzuc82Z6yYHt#scrollTo=QzKEQJYkUv2Q).
- The dataset's captions were generated with ChatGPT's vision capabilities. During training, I noticed that CLIP's text encoder can only use 77 tokens per image. Since most of our image descriptions exceeded that limit, we will have to create a new dataset whose captions stay within the maximum.
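
Captions for a future dataset could be screened against that 77-token budget before training. A minimal sketch (the whitespace-based `count_tokens` below is a hypothetical stand-in; the real count comes from the pipeline's CLIPTokenizer via `len(tokenizer(caption).input_ids)`):

```python
# Sketch: check which captions fit CLIP's 77-token context window.

MAX_CLIP_TOKENS = 77

def count_tokens(caption: str) -> int:
    # Placeholder: CLIP actually uses BPE plus BOS/EOS special tokens,
    # so real counts are usually higher than a whitespace split.
    return len(caption.split()) + 2

def fits_clip_window(caption: str) -> bool:
    return count_tokens(caption) <= MAX_CLIP_TOKENS

captions = [
    "a stylized illustration of a woman reading a book",
    " ".join(["word"] * 120),  # clearly over the limit
]
kept = [c for c in captions if fits_clip_window(c)]
print(len(kept))  # 1: the over-long caption would be truncated during training
```

Captions that fail the check can then be shortened or regenerated rather than silently truncated by the tokenizer.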


[TODO: describe the data used to train the model]