---
library_name: transformers
license: apache-2.0
datasets:
- isek-ai/danbooru-tags-2023
inference: false
---

# Dart (Danbooru Tags Transformer) v1

This model is a pretrained Dart (**Da**nboo**r**u **T**ags Transformer) model that generates danbooru tags.

Demo: [🤗 Space](https://huggingface.co/spaces/p1atdev/danbooru-tags-transformer)

If you are an end user, the fine-tuned version, [p1atdev/dart-v1-sft](https://huggingface.co/p1atdev/dart-v1-sft), is recommended instead.

## Usage

#### Note

Since this model was trained only on tags in alphabetical order, **placing tags that come later in alphabetical order at the beginning can prevent it from generating tags appropriately**.
Using the [fine-tuned version](https://huggingface.co/p1atdev/dart-v1-sft) can eliminate this concern.

### Using AutoModel

The 🤗 Transformers library is required.

```bash
pip install -U transformers
```

```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

MODEL_NAME = "p1atdev/dart-v1-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True) # trust_remote_code is required for tokenizer
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)

prompt = "<|bos|><rating>rating:sfw, rating:general</rating><copyright>original</copyright><character></character><general>1girl"
inputs = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
  outputs = model.generate(inputs, generation_config=model.generation_config)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# rating:sfw, rating:general, original, 1girl, ahoge, black hair, blue eyes, blush, closed mouth, ear piercing, earrings, jewelry, looking at viewer, mole, mole under eye, piercing, portrait, shirt, short hair, solo, white shirt
```
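
The decoded output is a single comma-separated string. Below is a minimal sketch for turning it into a list of tags, assuming the `, ` separator seen in the sample output above. `split_tags` is a hypothetical helper, not part of the model or its tokenizer.

```python
def split_tags(decoded: str) -> list[str]:
    """Split a decoded, comma-separated tag string into a list of tags."""
    # Strip whitespace around each tag and drop empty entries,
    # which can appear when a block such as <character> is empty.
    return [tag.strip() for tag in decoded.split(",") if tag.strip()]


decoded = "rating:sfw, rating:general, original, 1girl, ahoge, black hair"
tags = split_tags(decoded)

# Separate the rating tags from the remaining tags.
ratings = [t for t in tags if t.startswith("rating:")]
others = [t for t in tags if not t.startswith("rating:")]
print(ratings)
print(others)
```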

You can use `tokenizer.apply_chat_template` to simplify the construction of prompts:

```py
inputs = tokenizer.apply_chat_template({
  "rating": "rating:sfw, rating:general",
  "copyright": "original",
  "character": "",
  "general": "1girl"
}, return_tensors="pt", tokenize=True) # tokenize=False to preview prompt
# same as input_ids of "<|bos|><rating>rating:sfw, rating:general</rating><copyright>original</copyright><character></character><general>1girl"

with torch.no_grad():
  outputs = model.generate(inputs, generation_config=model.generation_config)
```

See the [chat templating documentation](https://huggingface.co/docs/transformers/main/en/chat_templating) for more details about `apply_chat_template`.

#### Flash attention (optional)

Using flash attention can optimize computations, but it is currently only compatible with Linux.

```bash
pip install flash_attn
```

### Accelerate with ORTModel

The 🤗 Optimum library is also supported for high-performance inference with ONNX.

```bash
pip install "optimum[onnxruntime]"
```

Two ONNX models are provided:

- [Normal](./model.onnx)
- [Quantized](./model_quantized.onnx)

Both can be used with the following code:

```py
import torch
from transformers import AutoTokenizer, GenerationConfig
from optimum.onnxruntime import ORTModelForCausalLM

MODEL_NAME = "p1atdev/dart-v1-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)

# normal version
ort_model = ORTModelForCausalLM.from_pretrained(MODEL_NAME)

# quantized version
# ort_model = ORTModelForCausalLM.from_pretrained(MODEL_NAME, file_name="model_quantized.onnx")

inputs = tokenizer.apply_chat_template({
  "rating": "rating:sfw, rating:general",
  "copyright": "original",
  "character": "",
  "general": "1girl"
}, return_tensors="pt", tokenize=True)

with torch.no_grad():
  # use the generation config held by the ONNX model
  outputs = ort_model.generate(inputs, generation_config=ort_model.generation_config)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Prompt guide

Because the model was trained with a specialized prompt format, **natural language is not supported**.

The training sentences are composed of the following elements, arranged strictly in the order shown below:

- `<|bos|>`: The bos (begin of sentence) token
- `<rating>[RATING_PARENT], [RATING_CHILD]</rating>`: The block of rating tags
  - [RATING_PARENT]: `rating:sfw`, `rating:nsfw`
  - [RATING_CHILD]:
    - if `[RATING_PARENT]` is `rating:sfw`: `rating:general`, `rating:sensitive`
    - else: `rating:questionable`, `rating:explicit`
- `<copyright>[COPYRIGHT, ...]</copyright>`: The block of copyright tags.
  - [COPYRIGHT, ...]: All supported copyright tags are listed [here](https://huggingface.co/p1atdev/dart-v1-base/tree/main/tags)
- `<character>[CHARACTER, ...]</character>`: The block of character tags.
  - [CHARACTER, ...]: All supported character tags are listed [here](https://huggingface.co/p1atdev/dart-v1-base/tree/main/tags)
- `<general>[GENERAL, ...]</general>`: The block of general tags.
  - [GENERAL, ...]: All supported general tags are listed [here](https://huggingface.co/p1atdev/dart-v1-base/tree/main/tags)
- `<|eos|>`: The eos (end of sentence) token

- Tags other than special tokens are separated by commas.
- All tags are arranged in alphabetical order.
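
The format rules above can be checked mechanically. The following is a rough sketch, not the author's tooling: `parse_sentence`, `SENTENCE_RE`, and `VALID_RATINGS` are hypothetical names, and the alphabetical check assumes plain Python string ordering, which may differ from the ordering used during training.

```python
import re

# Block order described above. <|bos|>, </general>, and <|eos|> are optional
# so that partial prompts can be parsed as well as complete sentences.
SENTENCE_RE = re.compile(
    r"^(?:<\|bos\|>)?"
    r"<rating>(?P<rating>.*?)</rating>"
    r"<copyright>(?P<copyright>.*?)</copyright>"
    r"<character>(?P<character>.*?)</character>"
    r"<general>(?P<general>.*?)(?:</general>)?(?:<\|eos\|>)?$"
)

# Allowed child ratings for each parent rating.
VALID_RATINGS = {
    "rating:sfw": {"rating:general", "rating:sensitive"},
    "rating:nsfw": {"rating:questionable", "rating:explicit"},
}


def parse_sentence(sentence: str) -> dict[str, list[str]]:
    """Split a sentence into per-category tag lists and validate it."""
    match = SENTENCE_RE.match(sentence.strip())
    if match is None:
        raise ValueError("sentence does not follow the expected block order")

    # Tags inside a block are separated by commas.
    blocks = {
        name: [tag.strip() for tag in text.split(",") if tag.strip()]
        for name, text in match.groupdict().items()
    }

    # The rating block holds exactly one parent and one matching child.
    rating = blocks["rating"]
    if len(rating) != 2 or rating[1] not in VALID_RATINGS.get(rating[0], set()):
        raise ValueError(f"invalid rating pair: {rating}")

    # Tags in the other blocks must be in alphabetical order
    # (assumption: plain Python string ordering).
    for name in ("copyright", "character", "general"):
        if blocks[name] != sorted(blocks[name]):
            raise ValueError(f"{name} tags are not in alphabetical order")
    return blocks


partial = (
    "<|bos|><rating>rating:sfw, rating:general</rating>"
    "<copyright></copyright><character></character><general>1girl"
)
print(parse_sentence(partial))
```

A check like this can catch prompts whose tags are out of order before they are passed to the model.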

Example sentence:

```
<|bos|><rating>rating:sfw, rating:general</rating><copyright>vocaloid</copyright><character>hatsune miku</character><general>1girl, blue hair, cowboy shot, ...</general><|eos|>
```

Therefore, to complete the tags, the input prompt should be as follows:

1. without any copyright and character tags

```
<|bos|><rating>rating:sfw, rating:general</rating><copyright></copyright><character></character><general>1girl
```

2. specifying copyright and character tags

```
<|bos|><rating>rating:sfw, rating:general</rating><copyright>sousou no frieren</copyright><character>frieren</character><general>1girl
```
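
Prompts like the two above can also be assembled as plain strings, which is handy for previewing them without loading the tokenizer. This is a minimal sketch: `build_prompt` is a hypothetical helper, and it sorts tags with plain Python string ordering on the assumption that this matches the alphabetical order used in training.

```python
def build_prompt(
    rating_parent: str,
    rating_child: str,
    copyright_tags=(),
    character_tags=(),
    general_tags=(),
) -> str:
    """Build an input prompt that ends inside the open <general> block."""

    def join(tags) -> str:
        # Tags are comma-separated and arranged alphabetically.
        return ", ".join(sorted(tags))

    return (
        "<|bos|>"
        f"<rating>{rating_parent}, {rating_child}</rating>"
        f"<copyright>{join(copyright_tags)}</copyright>"
        f"<character>{join(character_tags)}</character>"
        # The <general> block is left open so the model can continue it.
        f"<general>{join(general_tags)}"
    )


prompt = build_prompt(
    "rating:sfw",
    "rating:general",
    copyright_tags=["sousou no frieren"],
    character_tags=["frieren"],
    general_tags=["1girl"],
)
print(prompt)
```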

## Model Details

### Model Description

- **Developed by:** Plat
- **Model type:** Causal language model
- **Language(s) (NLP):** Danbooru tags
- **License:** Apache-2.0

- **Demo:** Available on [🤗Space](https://huggingface.co/spaces/p1atdev/danbooru-tags-transformer)

## Bias, Risks, and Limitations

Because this is a pretrained base model that has not been fine-tuned, it cannot accommodate flexible input. For example, it expects the given tags to be in alphabetical order.

## Training Details

### Training Data

This model was trained with:

- [isek-ai/danbooru-tags-2023](https://huggingface.co/datasets/isek-ai/danbooru-tags-2023): a 6M-sized dataset of Danbooru tags from 2005 to 2023


### Training Procedure 

Trained using the 🤗 Transformers `Trainer`.

#### Preprocessing

Preprocessing was conducted through the following process:

1. Remove data where the `general` tags are null.
2. Remove `general` tags that appear fewer than 100 times.
3. Remove undesirable tags such as `watermark` and `bad anatomy`.
4. Remove posts based on the number of tags attached to a single post, using the following rules:
   - Remove if there are more than 100 `general` tags.
   - Remove if there are more than 5 `copyright` tags.
   - Remove if there are more than 10 `character` tags.
5. Concatenate while splitting with special tokens according to the category of the tags.
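
The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual preprocessing script: the post schema (a dict with a `rating` pair and `copyright`, `character`, and `general` tag lists), the names `preprocess` and `join_tags`, and applying the steps in the listed order are all assumptions.

```python
from collections import Counter

# Step 3: only the two tags named above; the full list is not given.
UNDESIRABLE = {"watermark", "bad anatomy"}
# Step 4: per-category limits on the number of tags in one post.
MAX_TAGS = {"general": 100, "copyright": 5, "character": 10}


def join_tags(tags) -> str:
    """Join tags in alphabetical order, separated by commas."""
    return ", ".join(sorted(tags))


def preprocess(posts, min_tag_count=100):
    # Step 1: drop posts whose general tags are null.
    posts = [p for p in posts if p["general"] is not None]

    # Step 2: count how often each general tag appears across all posts.
    counts = Counter(tag for p in posts for tag in p["general"])

    sentences = []
    for p in posts:
        tags = {
            "copyright": p["copyright"],
            "character": p["character"],
            # Steps 2 and 3: drop rare and undesirable general tags.
            "general": [
                t for t in p["general"]
                if counts[t] >= min_tag_count and t not in UNDESIRABLE
            ],
        }
        # Step 4: skip posts with too many tags in any category.
        if any(len(tags[name]) > limit for name, limit in MAX_TAGS.items()):
            continue

        # Step 5: concatenate the categories, split by special tokens.
        rating = ", ".join(p["rating"])  # parent first, then child
        sentences.append(
            "<|bos|>"
            f"<rating>{rating}</rating>"
            f"<copyright>{join_tags(tags['copyright'])}</copyright>"
            f"<character>{join_tags(tags['character'])}</character>"
            f"<general>{join_tags(tags['general'])}</general>"
            "<|eos|>"
        )
    return sentences
```

For small experiments, `min_tag_count` can be lowered so that a toy dataset is not filtered out entirely.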


#### Training Hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 1
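
As a worked illustration of how these values relate, the sketch below computes the effective batch size and a linear schedule with warmup in plain Python. `linear_lr` is a hypothetical helper written to mirror the usual linear warmup and decay rule, and `total_steps=10_000` is an arbitrary placeholder, since the real number of training steps is not stated here.

```python
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 32
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 500
NUM_GPUS = 1  # training used a single GPU

# Samples consumed per optimizer step.
total_train_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_GPUS  # 64


def linear_lr(step: int, total_steps: int,
              base_lr: float = LEARNING_RATE,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup: rise linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# total_steps is a placeholder, not the value used in training.
for step in (0, 250, 500, 5_000, 10_000):
    print(step, linear_lr(step, total_steps=10_000))
```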


## Evaluation

This model has not been evaluated yet.

## Technical Specifications

### Model Architecture and Objective

The architecture of this model is [OPT (Open Pretrained Transformer)](https://huggingface.co/docs/transformers/model_doc/opt), but the position embeddings were not trained.

### Compute Infrastructure

In-house

#### Hardware

1x RTX 3070 Ti

#### Software

- Dataset processing: [🤗 Datasets](https://github.com/huggingface/datasets)
- Training: [🤗 Transformers](https://github.com/huggingface/transformers)
- Optimizing: [🤗 Optimum](https://github.com/huggingface/optimum)
