---
license: openrail
metrics:
- accuracy
- bertscore
- bleu
- bleurt
- brier_score
- cer
- character
- charcut_mt
- chrf
- code_eval
tags:
- text-to-image
- sygil-devs
- Muse
- Sygil-Muse
pipeline_tag: text-to-image
---

# Model Card for Sygil-Muse


This model is based on [Muse](https://muse-model.github.io/) and was trained using the code hosted at [Sygil-Dev/muse-maskgit-pytorch](https://github.com/Sygil-Dev/muse-maskgit-pytorch), a fork of [`lucidrains/muse-maskgit-pytorch`](https://github.com/lucidrains/muse-maskgit-pytorch) developed as a collaboration between the [Sygil-Dev](https://github.com/Sygil-Dev) and [ShoukanLabs](https://github.com/ShoukanLabs) teams.

# Model Details
This is a new model trained from scratch, following the [Muse](https://muse-model.github.io/) architecture, on a subset of the [Imaginary Network Expanded Dataset](https://github.com/Sygil-Dev/INE-dataset). Its main advantage is that it allows the use of multiple namespaces (labeled tags) to control different parts of the final generation.
Namespaces (e.g. “species:seal” or “studio:dc”) stop the model from misinterpreting a seal as the singer Seal, or DC Comics as Washington, D.C.
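As an illustration of this tagging convention, a namespaced tag can be split on its first colon to separate the namespace from the value. The function below is a hypothetical sketch of that convention, not the model's actual tag parser:

```python
def parse_tag(tag: str) -> tuple[str, str]:
    """Split a 'namespace:value' tag; a bare tag gets an empty namespace."""
    namespace, sep, value = tag.partition(":")
    if not sep:  # no colon present, e.g. just "seal"
        return "", tag
    return namespace, value

# "species:seal" unambiguously means the animal; "artist:seal" would be the singer.
print(parse_tag("species:seal"))  # ('species', 'seal')
print(parse_tag("studio:dc"))     # ('studio', 'dc')
```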

Note: As of right now, only the first VAE and the MaskGit have been trained, using several different configurations; we are still looking for the best balance between quality, performance, and VRAM usage so that Muse can be used on all kinds of devices. The Super Resolution VAE still needs to be trained before the model is fully usable, although we might be able to reuse the first VAE depending on its quality once training progresses further.

If you find my work useful, please consider supporting me on [GitHub Sponsors](https://github.com/sponsors/ZeroCool940711)! 

This model is still in its infancy and is meant to be continuously updated and trained with more data over time, so feel free to give us feedback on our [Discord Server](https://discord.gg/ttM8Tm6wge) or in the discussions section on Hugging Face. We plan to improve it with more and better tags in the future, so any help is always welcome.
[![Join the Discord Server](https://badgen.net/discord/members/fTtcufxyHQ?icon=discord)](https://discord.gg/ttM8Tm6wge)


## Available Checkpoints:
  - #### Stable:
    - None
  - #### Beta:
    - [vae.12145000.pt](https://huggingface.co/Sygil/Sygil-Muse/blob/main/vae.12145000.pt): Trained from scratch for 12.14M steps with **dim: 32**, **vq_codebook_dim: 8192**, and **vq_codebook_size: 8192**.
    - [maskgit.5125000.pt](https://huggingface.co/Sygil/Sygil-Muse/blob/main/maskgit.5125000.pt): MaskGit trained on top of the beta VAE for 5.12M steps.

  Note: Checkpoints under the Beta section are updated daily, or at least 3-4 times a week. The beta checkpoints can be used as they are, but only the latest version is kept in the repo; older checkpoints are removed whenever a new one is uploaded, to keep the repo clean.
  
## Training

**Training Data**:
The model was trained on the following dataset:
- [Imaginary Network Expanded Dataset](https://github.com/Sygil-Dev/INE-dataset)

**Hardware and other settings**
- **Hardware:** 1 x Nvidia RTX 3050 GPU
- **Hours Trained:** NaN.
- **Gradient Accumulations**: 10
- **Batch Size:** 1
- **Learning Rate:** 1e-5
- **Learning Rate Scheduler:** `constant_with_warmup`
- **Scheduler Power:** 1.0
- **Optimizer:** Adam
- **Warmup Steps:** 10,000
- **Number of Cycles:** 200
- **Resolution/Image Size**: First trained at a resolution of 64x64, then increased to 256x256, and finally to 512x512. See the note below for more details.
- **Dimension:** 32
- **vq_codebook_dim:** 8192
- **vq_codebook_size:** 8192
- **num_tokens:** 8192
- **seq_len:** 1024
- **heads:** 8
- **depth:** 4
- **Random Crop:** True
- **Total MaskGit Training Steps:** 5,125,000
- **Total VAE Training Steps:** 12,145,000
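  Two of the settings above combine in ways worth making explicit: the effective batch size seen by the optimizer is the batch size times the gradient accumulation steps, and a `constant_with_warmup` scheduler ramps the learning rate linearly over the warmup steps and then holds it constant. The following is a minimal sketch of those two relationships, not the actual training code from the repo:

```python
BATCH_SIZE = 1
GRAD_ACCUM = 10
BASE_LR = 1e-5
WARMUP_STEPS = 10_000

# Effective batch size per optimizer update.
effective_batch = BATCH_SIZE * GRAD_ACCUM  # 10

def constant_with_warmup(step: int) -> float:
    """Learning rate: linear ramp from 0 to BASE_LR, then held constant."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    return BASE_LR

print(constant_with_warmup(5_000))   # 5e-06 (halfway through warmup)
print(constant_with_warmup(50_000))  # 1e-05 (constant after warmup)
```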

  Note: On Muse, the `image_size` (resolution) can be changed at any time without retraining the model from scratch. This lets us first train the model at a low resolution, with the same `dim` and `vq_codebook_size`, so training is faster, and then increase `image_size` to a higher resolution once the model has trained enough.
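  This works because the transformer operates on the VAE's token grid rather than on pixels, so the token count scales with the square of the resolution. The sketch below is a hypothetical illustration of that relationship; the downsampling factor of 16 is inferred from **seq_len: 1024** at 512x512 (since (512/16)² = 1024), and the actual VAE stride may differ:

```python
def num_tokens(image_size: int, downsample_factor: int = 16) -> int:
    """Tokens per image = (image_size / downsample_factor) squared.

    downsample_factor=16 is an assumption inferred from seq_len 1024
    at a 512x512 resolution; it is not taken from the training code.
    """
    side = image_size // downsample_factor
    return side * side

for size in (64, 256, 512):  # the three training resolutions listed above
    print(size, num_tokens(size))  # 64 -> 16, 256 -> 256, 512 -> 1024
```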

Developed by: [ZeroCool](https://github.com/ZeroCool940711) at [Sygil-Dev](https://github.com/Sygil-Dev/).

# License
This model is open access and available to all, with a CreativeML Open RAIL++-M License further specifying rights and usage.