---
language: en
library_name: transformers
tags:
  - vision
  - image-segmentation
  - nvidia/mit-b5
  - transformers.js
  - onnx
datasets:
  - celebamaskhq
---

# Face Parsing

![example image and output](demo.png)

[Semantic segmentation](https://huggingface.co/docs/transformers/tasks/semantic_segmentation) model fine-tuned from [nvidia/mit-b5](https://huggingface.co/nvidia/mit-b5) with [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) for face parsing. For additional options, see the Transformers [Segformer docs](https://huggingface.co/docs/transformers/model_doc/segformer).

> ONNX model for web inference contributed by [Xenova](https://huggingface.co/Xenova).

## Usage in Python

The exhaustive list of labels can be found in [config.json](https://huggingface.co/jonathandinu/face-parsing/blob/65972ac96180b397f86fda0980bbe68e6ee01b8f/config.json#L30) and is summarized below.

| id  | label      | note              |
| :-: | :--------- | :---------------- |
|  0  | background |                   |
|  1  | skin       |                   |
|  2  | nose       |                   |
|  3  | eye_g      | eyeglasses        |
|  4  | l_eye      | left eye          |
|  5  | r_eye      | right eye         |
|  6  | l_brow     | left eyebrow      |
|  7  | r_brow     | right eyebrow     |
|  8  | l_ear      | left ear          |
|  9  | r_ear      | right ear         |
| 10  | mouth      | area between lips |
| 11  | u_lip      | upper lip         |
| 12  | l_lip      | lower lip         |
| 13  | hair       |                   |
| 14  | hat        |                   |
| 15  | ear_r      | earring           |
| 16  | neck_l     | necklace          |
| 17  | neck       |                   |
| 18  | cloth      | clothing          |
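
As a quick check, the same `id2label` mapping can be read straight from the loaded model's config (a minimal sketch; the printed output shown in the comment is abbreviated):

```python
from transformers import SegformerForSemanticSegmentation

model = SegformerForSemanticSegmentation.from_pretrained("jonathandinu/face-parsing")
print(model.config.id2label)  # {0: 'background', 1: 'skin', 2: 'nose', ...}
```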

```python
import torch
from torch import nn
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation

from PIL import Image
import matplotlib.pyplot as plt
import requests

# convenience expression for automatically determining the device
device = (
    "cuda"  # NVIDIA (or ROCm-enabled AMD) GPUs
    if torch.cuda.is_available()
    else "mps"  # Apple Silicon (Metal Performance Shaders)
    if torch.backends.mps.is_available()
    else "cpu"
)

# load models
image_processor = SegformerImageProcessor.from_pretrained("jonathandinu/face-parsing")
model = SegformerForSemanticSegmentation.from_pretrained("jonathandinu/face-parsing")
model.to(device)

# expects a PIL.Image or torch.Tensor
url = "https://images.unsplash.com/photo-1539571696357-5a69c17a67c6"
image = Image.open(requests.get(url, stream=True).raw)

# run inference on the image (no gradients needed)
inputs = image_processor(images=image, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, ~height/4, ~width/4)

# resize output to match the input image dimensions
upsampled_logits = nn.functional.interpolate(
    logits,
    size=image.size[::-1],  # PIL size is (W, H); interpolate expects (H, W)
    mode="bilinear",
    align_corners=False,
)

# get label masks
labels = upsampled_logits.argmax(dim=1)[0]

# move to CPU to visualize in matplotlib
labels_viz = labels.cpu().numpy()
plt.imshow(labels_viz)
plt.show()
```
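
Building on the `labels` tensor from the example above, a minimal sketch of isolating a single class as a binary mask (here `skin`, id 1; any id from the label table works):

```python
# continue from the example above: select one class as a boolean mask
skin_id = 1  # "skin" in the label table
skin_mask = (labels == skin_id).cpu().numpy()

plt.imshow(skin_mask, cmap="gray")
plt.show()
```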

## Usage in the browser (Transformers.js)

```js
import {
  pipeline,
  env,
} from "https://cdn.jsdelivr.net/npm/@xenova/transformers@2.14.0";

// important to prevent errors since the model files are remote on the HF Hub
env.allowLocalModels = false;

// instantiate the image segmentation pipeline with the pretrained face-parsing model
const model = await pipeline("image-segmentation", "jonathandinu/face-parsing");

// async inference since it could take a few seconds
const url = "https://images.unsplash.com/photo-1539571696357-5a69c17a67c6";
const output = await model(url);

// each label is a separate mask object
// [
//   { score: null, label: 'background', mask: transformers.js RawImage { ... } },
//   { score: null, label: 'hair', mask: transformers.js RawImage { ... } },
//    ...
// ]
for (const m of output) {
  console.log(`Found ${m.label}`);
  m.mask.save(`${m.label}.png`);
}
```

### p5.js

Since [p5.js](https://p5js.org/) uses an animation loop abstraction, we need to take care when loading the model and making predictions.

```js
// ...

// declare the model globally so it can be used in draw()
let model;

// asynchronously load transformers.js and instantiate the model
async function preload() {
  // load transformers.js library with a dynamic import
  const { pipeline, env } = await import(
    "https://cdn.jsdelivr.net/npm/@xenova/transformers@2.14.0"
  );

  // important to prevent errors since the model files are remote on HF hub
  env.allowLocalModels = false;

  // instantiate image segmentation pipeline with pretrained face parsing model
  model = await pipeline("image-segmentation", "jonathandinu/face-parsing");

  print("face-parsing model loaded");
}

// ...
```

See the [full p5.js example](https://editor.p5js.org/jonathan.ai/sketches/wZn15Dvgh).

## Model Description

- **Developed by:** [Jonathan Dinu](https://twitter.com/jonathandinu)
- **Model type:** Transformer-based semantic segmentation image model
- **License:** non-commercial research and educational purposes
- **Resources for more information:** Transformers docs on [Segformer](https://huggingface.co/docs/transformers/model_doc/segformer) and/or the [original research paper](https://arxiv.org/abs/2105.15203).

## Limitations and Bias

### Bias

While the capabilities of computer vision models are impressive, they can also reinforce or exacerbate social biases. The [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) dataset used for fine-tuning is large but not necessarily diverse or representative, and its images are exclusively of celebrities, so the model may not perform as well on faces outside that distribution.