Model Card for FaceTransformerOctupletLoss

This is a face recognition model, which extracts a facial feature vector from an aligned facial image.


Model Details

Model Description

Developed by: Martin Knoche
Funded by: Technical University of Munich
Shared by: Martin Knoche
Model type: Convolutional Neural Network
License: MIT

Original work:

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Changes to the code, fine-tuning, etc. are also released under the MIT License:

MIT License

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

Finetuned from model: FaceTransformer by zhongyy

Model Sources

Repository: GitHub
Paper: IEEE Xplore

Uses

Use the model to extract a facial feature vector (embedding) from an arbitrary aligned facial image. You can then compare that vector with the feature vectors of other facial images to decide whether two images show the same person.
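Deciding "same person or not" comes down to comparing two embeddings with a distance and a threshold. The getting-started code below suggests the cosine distance with an example threshold of 0.5. A minimal NumPy sketch of that decision rule follows; the helper names `cosine_distance` and `is_same_person` are hypothetical, and 0.5 is only the example value, not a calibrated operating point.

```python
import numpy as np


def cosine_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Hypothetical helper: 1 - cosine similarity of two embeddings, in [0, 2]."""
    a = np.asarray(emb_a, dtype=np.float64).ravel()
    b = np.asarray(emb_b, dtype=np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return float(1.0 - np.dot(a, b) / denom)


def is_same_person(emb_a, emb_b, threshold: float = 0.5) -> bool:
    """Hypothetical helper: 'same person' when the cosine distance is below the threshold.

    0.5 is only the example value; tune it on labelled pairs for your application.
    """
    return cosine_distance(emb_a, emb_b) < threshold


# Toy 4-D vectors for illustration (real embeddings from this model have 512 dimensions)
a = np.array([1.0, 0.0, 0.0, 0.0])
b = np.array([0.9, 0.1, 0.0, 0.0])   # nearly the same direction as a
c = np.array([-1.0, 0.0, 0.0, 0.0])  # opposite direction to a
print(is_same_person(a, b))  # True
print(is_same_person(a, c))  # False
```

Cosine distance ignores the vector length, so embeddings do not need to be normalized before comparison.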

Direct Use

The model can be used within an ONNX Runtime environment:

import onnxruntime as rt

model = rt.InferenceSession("FaceTransformerOctupletLoss.onnx", providers=rt.get_available_providers())
embedding = model.run(None, {"input_image": input_image})[0][0]

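input_image variable

Shape: 1x3x112x112 (batch, channels, height, width), i.e. one aligned 112x112x3 image transposed to channels-first layout, as in the getting-started code
Channels: RGB order
Type: float32
Values: between 0 and 255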

embedding variable

Dimension: 512 (one feature vector per input image)
Type: float
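The getting-started code below turns the aligned image into a batched, channels-first float32 array before calling the session. The same conversion as a standalone NumPy helper is sketched here; `to_model_input` is a hypothetical name, and the sketch assumes the aligned image is already in RGB order with shape (112, 112, 3).

```python
import numpy as np


def to_model_input(img_aligned_rgb: np.ndarray) -> np.ndarray:
    """Hypothetical helper: aligned 112x112x3 RGB image -> (1, 3, 112, 112) float32 batch.

    Pixel values are kept in [0, 255], as the input specification above requires.
    """
    img = np.asarray(img_aligned_rgb)
    if img.shape != (112, 112, 3):
        raise ValueError(f"expected shape (112, 112, 3), got {img.shape}")
    batch = img[np.newaxis].astype(np.float32)   # add batch axis: (1, 112, 112, 3)
    batch = np.clip(batch, 0.0, 255.0)           # keep values in the expected range
    # NHWC -> NCHW, stored contiguously: (1, 3, 112, 112)
    return np.ascontiguousarray(batch.transpose(0, 3, 1, 2))


# Usage with a dummy image; with the real model you would then call
# model.run(None, {"input_image": to_model_input(img)})[0][0]
dummy = np.random.randint(0, 256, size=(112, 112, 3), dtype=np.uint8)
x = to_model_input(dummy)
print(x.shape, x.dtype)  # (1, 3, 112, 112) float32
```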

Bias, Risks, and Limitations

The model was originally trained, and subsequently fine-tuned, on the MS1M dataset. Please therefore consult the MS1M dataset documentation regarding its biases and risks.

How to Get Started with the Model

Use the code below to get started with the model:

import numpy as np
import onnxruntime as rt
import mediapipe as mp
import cv2
import os
import time
from skimage.transform import SimilarityTransform


# ---------------------------------------------------------------------------------------------------------------------
# INITIALIZATIONS

# Target landmark coordinates for alignment (used in training)
LANDMARKS_TARGET = np.array(
    [
        [38.2946, 51.6963],
        [73.5318, 51.5014],
        [56.0252, 71.7366],
        [41.5493, 92.3655],
        [70.7299, 92.2041],
    ],
    dtype=np.float32,
)

# Initialize Face Detector (For Example Mediapipe)
FACE_DETECTOR = mp.solutions.face_mesh.FaceMesh(
    refine_landmarks=True, min_detection_confidence=0.5, min_tracking_confidence=0.5, max_num_faces=1
)

# Initialize the Face Recognition Model (FaceTransformerOctupletLoss)
FACE_RECOGNIZER = rt.InferenceSession("FaceTransformerOctupletLoss.onnx", providers=rt.get_available_providers())


# ---------------------------------------------------------------------------------------------------------------------
# FACE CAPTURE

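# Capture a frame with your webcam and store it on disk (or reuse a previously stored frame)
if not os.path.exists("img.jpg"):
    cap = cv2.VideoCapture(0)  # open the default webcam (change the index to select another camera)
    if not cap.isOpened():
        raise IOError("Cannot open webcam")
    time.sleep(2)  # wait for the camera to warm up

    ret, img = cap.read()  # capture a frame (OpenCV returns BGR)
    cap.release()
    if not ret:
        raise IOError("Cannot read a frame from the webcam")
    cv2.imwrite("img.jpg", img)  # save the frame
else:
    img = cv2.imread("img.jpg")  # read the frame from disk (BGR)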


# ---------------------------------------------------------------------------------------------------------------------
# FACE DETECTION

# Process the image with the face detector (MediaPipe expects RGB, whereas OpenCV images are BGR)
result = FACE_DETECTOR.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

if result.multi_face_landmarks:
    # Select 5 Landmarks (Eye Centers, Nose Tip, Left Mouth Corner, Right Mouth Corner)
    five_landmarks = np.asarray(result.multi_face_landmarks[0].landmark)[[470, 475, 1, 57, 287]]

    # Extract the x and y coordinates of the landmarks of interest
    landmarks = np.asarray(
        [[landmark.x * img.shape[1], landmark.y * img.shape[0]] for landmark in five_landmarks]
    )

    # Extract the x and y coordinates of all landmarks
    all_x_coords = [landmark.x * img.shape[1] for landmark in result.multi_face_landmarks[0].landmark]
    all_y_coords = [landmark.y * img.shape[0] for landmark in result.multi_face_landmarks[0].landmark]

    # Compute the bounding box of the face
    x_min, x_max = int(min(all_x_coords)), int(max(all_x_coords))
    y_min, y_max = int(min(all_y_coords)), int(max(all_y_coords))
    bbox = [[x_min, y_min], [x_max, y_max]]

else:
    raise SystemExit("No faces detected")


# ---------------------------------------------------------------------------------------------------------------------
# FACE ALIGNMENT

# Align Image with the 5 Landmarks
tform = SimilarityTransform()
tform.estimate(landmarks, LANDMARKS_TARGET)
tmatrix = tform.params[0:2, :]
img_aligned = cv2.warpAffine(img, tmatrix, (112, 112), borderValue=0.0)

# Save the aligned image to disk
cv2.imwrite("img2_aligned.jpg", img_aligned)


# ---------------------------------------------------------------------------------------------------------------------
# FACE RECOGNITION

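# Infer the face embedding with onnxruntime: convert BGR to RGB (the model expects RGB),
# add a batch axis, cast to float32, and move channels first (1x3x112x112)
input_image = np.asarray([cv2.cvtColor(img_aligned, cv2.COLOR_BGR2RGB)]).astype(np.float32).clip(0.0, 255.0).transpose(0, 3, 1, 2)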
embedding = FACE_RECOGNIZER.run(None, {"input_image": input_image})[0][0]

print("Embedding:", embedding)

# If you have embeddings for several facial images, you can compute the cosine distance between them and decide
# whether they show the same person based on a threshold. For example, if the cosine distance is below 0.5, the two
# images are considered to show the same person; otherwise they are treated as different people. This threshold is an
# example value and should be tuned for your application. The lower the cosine distance, the more similar the two
# faces. The cosine distance lies between 0 and 2: 0 means the two embeddings point in the same direction and 2 means
# they point in opposite directions.

# ---------------------------------------------------------------------------------------------------------------------
# VISUALIZATION

# Draw the bounding box on a copy of the image
img_draw = img.copy()
cv2.rectangle(img_draw, (bbox[0][0], bbox[0][1]), (bbox[1][0], bbox[1][1]), (255, 0, 0), 2)

# Show the detected face on the image
cv2.imshow("img", img_draw)
cv2.waitKey(0)

# Show the aligned image
cv2.imshow("img", img_aligned)
cv2.waitKey(0)

See also main.py for a starting point with the model.
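The FACE ALIGNMENT step above relies on scikit-image's `SimilarityTransform.estimate` to find the rotation, uniform scale, and translation that map the five detected landmarks onto `LANDMARKS_TARGET` in a least-squares sense. To make that step less of a black box, here is a NumPy-only sketch of the Umeyama least-squares solution for the 2D case. To our understanding scikit-image solves the same problem, but `estimate_similarity` is a hypothetical helper, not part of that library. The returned 2x3 matrix has the same layout as `tform.params[0:2, :]` in the script, so it could be passed to `cv2.warpAffine`.

```python
import numpy as np


def estimate_similarity(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Hypothetical helper: least-squares 2D similarity (Umeyama) mapping src onto dst.

    Returns a 2x3 matrix [s*R | t], the layout expected by cv2.warpAffine.
    """
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - src_mean, dst - dst_mean

    # Cross-covariance between the centred point sets and its SVD
    cov = dst_c.T @ src_c / len(src)
    u, sing, vt = np.linalg.svd(cov)

    # Guard against a reflection so that R is a proper rotation (det(R) = +1)
    d = np.ones(2)
    if np.linalg.det(cov) < 0:
        d[-1] = -1.0
    rot = u @ np.diag(d) @ vt

    # Optimal uniform scale, then the translation that aligns the centroids
    scale = (sing @ d) / src_c.var(axis=0).sum()
    trans = dst_mean - scale * rot @ src_mean
    return np.hstack([scale * rot, trans[:, None]])


# Self-check: move the reference landmarks by a known similarity, then recover the inverse
LANDMARKS_TARGET = np.array(
    [[38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
     [41.5493, 92.3655], [70.7299, 92.2041]]
)
theta, s, t = 0.3, 1.7, np.array([5.0, -3.0])
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
detected = s * LANDMARKS_TARGET @ R.T + t  # stand-in for landmarks from the detector

M = estimate_similarity(detected, LANDMARKS_TARGET)  # detected -> target, as in the script
aligned = detected @ M[:, :2].T + M[:, 2]
print(np.abs(aligned - LANDMARKS_TARGET).max())  # close to 0
```

With noise-free input the estimate is exact; with real detector output the five points do not fit a similarity perfectly, and the transform minimizes the squared residual instead.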

Evaluation

Testing Data, Factors & Metrics

Testing Data

Metrics

Accuracy [%]

Results

Benchmark	LFW	CALFW	CPLFW	MLFW	XQLFW
Accuracy [%]	99.73	94.93	91.58	85.63	95.12
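The results are reported as accuracy in percent. On pair-based verification benchmarks, this is the share of image pairs whose same/different decision is correct at a given distance threshold. The sketch below illustrates that idea on toy data. The helpers `verification_accuracy` and `best_threshold` are hypothetical. The official protocols typically choose the threshold on held-out folds (for example, 10-fold cross-validation on LFW), so this is not the script that produced the numbers above.

```python
import numpy as np


def verification_accuracy(distances, same_labels, threshold):
    """Hypothetical helper: % of pairs classified correctly, where 'same' means distance < threshold."""
    distances = np.asarray(distances, dtype=np.float64)
    same_labels = np.asarray(same_labels, dtype=bool)
    predictions = distances < threshold
    return float(100.0 * np.mean(predictions == same_labels))


def best_threshold(distances, same_labels):
    """Hypothetical helper: try midpoints between sorted distances, keep the most accurate one."""
    d = np.unique(np.asarray(distances, dtype=np.float64))  # sorted, unique
    candidates = np.concatenate([[d[0] - 1e-6], (d[:-1] + d[1:]) / 2, [d[-1] + 1e-6]])
    accs = [verification_accuracy(distances, same_labels, c) for c in candidates]
    i = int(np.argmax(accs))
    return float(candidates[i]), accs[i]


# Toy data: cosine distances for 6 pairs and whether each pair shows the same person
dist = [0.20, 0.35, 0.60, 0.45, 0.90, 1.10]
same = [True, True, True, False, False, False]
print(verification_accuracy(dist, same, 0.5))  # 4 of 6 pairs correct
print(best_threshold(dist, same))              # threshold near 0.4, 5 of 6 pairs correct
```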

Citation

BibTeX:

@inproceedings{knoche2023octuplet,
  title={Octuplet loss: Make face recognition robust to image resolution},
  author={Knoche, Martin and Elkadeem, Mohamed and H{\"o}rmann, Stefan and Rigoll, Gerhard},
  booktitle={2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)},
  pages={1--8},
  year={2023},
  organization={IEEE}
}

Model Card Author

Martin Knoche

Model Card Contact

Martin.Knoche@tum.de