File size: 1,998 Bytes
5dbce35 51595e6 83b09ef 5dbce35 83b09ef c654f89 83b09ef 4ca917a 83b09ef e6e61ad 83b09ef |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 |
---
license: mit
language:
- en
library_name: diffusers
---
# Arc2Face Model Card
<div align="center">
[**Project Page**](https://arc2face.github.io/) **|** [**Paper (ArXiv)**]() **|** [**Code**](https://github.com/foivospar/Arc2Face)
</div>
## Introduction
Arc2Face is an ID-conditioned face model, that can generate diverse, ID-consistent photos of a person given only its ArcFace ID-embedding.
It is trained on a restored version of the WebFace42M face recognition database, and is further fine-tuned on FFHQ and CelebA-HQ.
<div align="center">
<img src='assets/samples_short.jpg'>
</div>
## Model Details
It consists of 2 components:
- encoder, a finetuned CLIP ViT-L/14 model
- arc2face, a finetuned UNet model
both of which are fine-tuned from [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5).
The encoder is tailored for projecting ID-embeddings to the CLIP latent space.
Arc2Face adapts the pre-trained backbone to the task of ID-to-face generation, conditioned solely on ID vectors.
## Usage
The models can be downloaded directly from this repository or using python:
```python
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="arc2face/config.json", local_dir="./models")
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="arc2face/diffusion_pytorch_model.safetensors", local_dir="./models")
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="encoder/config.json", local_dir="./models")
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="encoder/pytorch_model.bin", local_dir="./models")
```
Please check our [GitHub repository](https://github.com/foivospar/Arc2Face) for complete inference instructions.
## Limitations and Bias
- Only one person per image can be generated.
- Poses are constrained to the frontal hemisphere, similar to FFHQ images.
- The model may reflect the biases of the training data or the ID encoder.
## Citation
**BibTeX:**
```bibtex
``` |