Enderfga committed on
Commit
7652882
1 Parent(s): ff76503

Add application file

README.md CHANGED
@@ -1,13 +1,125 @@
- ---
- title: MtCNN Sysu
- emoji: 📈
- colorFrom: gray
- colorTo: pink
- sdk: gradio
- sdk_version: 3.12.0
- app_file: app.py
- pinned: false
- license: openrail
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks
+
+ This repo contains the code, data and trained models for the paper [Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks](https://arxiv.org/ftp/arxiv/papers/1604/1604.02878.pdf).
+
+ ## Overview
+
+ MTCNN is a popular face detection algorithm that cascades three neural networks (PNet, RNet and ONet) to detect faces in images. It handles a wide range of lighting and pose conditions and can detect multiple faces and their landmarks in a single image.
+
+ We implement MTCNN with PyTorch, a popular deep learning framework that provides the tools for building and training neural networks.
+
+ ![](https://img.enderfga.cn/img/image-20221208152130975.png)
+
+ ![](https://img.enderfga.cn/img/image-20221208152231511.png)
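+
+ For a quick start, the snippet below is a minimal detection sketch assembled from the helpers this repo ships in `utils/detect.py` and `utils/vision.py`; it mirrors the image branch of `test.py`. The checkpoint paths and thresholds are just the defaults used elsewhere in this repo, not the only valid settings.
+
+ ```python
+ import cv2
+ from utils.detect import create_mtcnn_net, MtcnnDetector
+ from utils.vision import vis_face
+
+ # load the three cascaded networks from the bundled checkpoints
+ pnet, rnet, onet = create_mtcnn_net(p_model_path='./model_store/pnet_epoch_20.pt',
+                                     r_model_path='./model_store/rnet_epoch_20.pt',
+                                     o_model_path='./model_store/onet_epoch_20.pt',
+                                     use_cuda=False)
+ detector = MtcnnDetector(pnet=pnet, rnet=rnet, onet=onet,
+                          min_face_size=3, threshold=[0.1, 0.1, 0.1])
+
+ img = cv2.imread('./img/mid.png')           # BGR image used for detection
+ rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # RGB copy used for visualization
+ # detect_face returns the PNet, RNet and ONet boxes plus the facial landmarks
+ p_boxes, r_boxes, boxes, landmarks = detector.detect_face(img)
+ vis_face(rgb, boxes, landmarks, 3, 'result.jpg')  # draw and save the final ONet result
+ ```
+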
+ ## Description of files
+ ```shell
+ ├── README.md             # this document
+ ├── get_data.py           # generate the training data for the network given by "--net"
+ ├── img                   # mid.png is used for testing the visualization; the other images are the corresponding results
+ │   ├── mid.png
+ │   ├── onet.png
+ │   ├── pnet.png
+ │   ├── rnet.png
+ │   ├── result.png
+ │   └── result.jpg
+ ├── model_store           # our pre-trained models
+ │   ├── onet_epoch_20.pt
+ │   ├── pnet_epoch_20.pt
+ │   └── rnet_epoch_20.pt
+ ├── requirements.txt      # pinned environment versions
+ ├── test.py               # visualize the result of the network given by "--net"
+ ├── test.sh               # test mid.png and save the visualization of all three networks
+ ├── train.out             # our complete training log for this experiment
+ ├── train.py              # train the network given by "--net"
+ ├── train.sh              # generate data and train all three networks from start to finish
+ └── utils                 # common tool functions and modules
+     ├── config.py
+     ├── dataloader.py
+     ├── detect.py
+     ├── models.py
+     ├── tool.py
+     └── vision.py
+ ```
+
+ ## Requirements
+
+ * numpy==1.21.4
+ * matplotlib==3.5.0
+ * opencv-python==4.4.0.42
+ * torch==1.13.0+cu116
+
+ ## How to Install
+
+ ```shell
+ conda create -n env python=3.8 -y
+ conda activate env
+ ```
+ ```shell
+ pip install -r requirements.txt
+ ```
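+
+ Note: the pinned `+cu116` wheels are not hosted on PyPI. If the plain `pip install -r requirements.txt` cannot resolve them, installing the PyTorch packages from the official cu116 wheel index first usually works; the command below is a suggested workaround, not part of the original instructions:
+
+ ```shell
+ pip install torch==1.13.0+cu116 torchvision==0.14.0+cu116 torchaudio==0.13.0+cu116 \
+     --extra-index-url https://download.pytorch.org/whl/cu116
+ ```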
+
+ ## Preprocessing
+
+ - Download the [WIDER_FACE](http://shuoyang1213.me/WIDERFACE/) face detection data and extract it into `./data_set/face_detection`.
+ - Download the [CNN_FacePoint](http://mmlab.ie.cuhk.edu.hk/archive/CNN_FacePoint.htm) face detection and landmark data and extract it into `./data_set/face_landmark` (the expected layout is sketched below).
+
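+ The scripts read the datasets from hard-coded paths, so `get_data.py` expects roughly the layout below (taken from the constants at the top of `get_data.py`; the subfolders inside each archive follow the official releases):
+
+ ```shell
+ data_set/
+ ├── wider_face_train_bbx_gt.txt      # WIDER FACE training annotations
+ ├── face_detection/
+ │   └── WIDER_train/
+ │       └── images/                  # WIDER FACE training images
+ └── face_landmark/
+     └── CNN_FacePoint/
+         └── train/
+             └── trainImageList.txt   # landmark annotations (image folders alongside it)
+ ```
+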
+ ### Preprocessed Data
+
+ ```shell
+ # Before training PNet
+ python get_data.py --net=pnet
+ # Before training RNet, please use your trained model path
+ python get_data.py --net=rnet --pnet_path=./model_store/pnet_epoch_20.pt
+ # Before training ONet, please use your trained model path
+ python get_data.py --net=onet --pnet_path=./model_store/pnet_epoch_20.pt --rnet_path=./model_store/rnet_epoch_20.pt
+ ```
+
+ ## How to Run
+
+ ### Train
+
+ ```shell
+ python train.py --net=pnet/rnet/onet  # specify the network to train
+ bash train.sh                         # alternatively, use the script to train all three networks in order
+ ```
+
+ The checkpoints are saved under `./model_store/`.
+
+ #### Finetuning from an existing checkpoint
+
+ ```shell
+ python train.py --net=pnet/rnet/onet --load=[model path]
+ ```
+
+ The model path should be a checkpoint file inside `./model_store/`, e.g. `--load=./model_store/pnet_epoch_20.pt`.
+
+ ### Evaluate
+
+ #### Use the script to test all three networks in order
+
+ ```shell
+ bash test.sh
+ ```
+
+ #### To detect a single image
+
+ ```shell
+ python test.py --net=pnet/rnet/onet --path=test.jpg
+ ```
+
+ #### To detect a video stream from a camera
+
+ ```shell
+ python test.py --input_mode=0
+ ```
+
+ The annotated stream is also written to `out.mp4`; press `q` to quit.
+
+ #### The result of "--net=pnet"
+
+ ![](https://img.enderfga.cn/img/20221208160900.png)
+
+ #### The result of "--net=rnet"
+
+ ![](https://img.enderfga.cn/img/image-20221208155022083.png)
+
+ #### The result of "--net=onet"
+
+ ![](https://img.enderfga.cn/img/image-20221208155044451.png)
app.py ADDED
@@ -0,0 +1,39 @@
+ import gradio as gr
+ import cv2
+ from utils.detect import create_mtcnn_net, MtcnnDetector
+ from utils.vision import vis_face
+ import argparse
+
+
+ MIN_FACE_SIZE = 3
+
+ def parse_args():
+     # the CLI defaults double as the configuration of the Space
+     parser = argparse.ArgumentParser(description='Test MTCNN',
+                                      formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+
+     parser.add_argument('--net', default='onet', help='which net to show', type=str)
+     parser.add_argument('--pnet_path', default="./model_store/pnet_epoch_20.pt", help='path to pnet model', type=str)
+     parser.add_argument('--rnet_path', default="./model_store/rnet_epoch_20.pt", help='path to rnet model', type=str)
+     parser.add_argument('--onet_path', default="./model_store/onet_epoch_20.pt", help='path to onet model', type=str)
+     parser.add_argument('--path', default="./img/mid.png", help='path to image', type=str)
+     parser.add_argument('--min_face_size', default=MIN_FACE_SIZE, help='min face size', type=int)
+     parser.add_argument('--use_cuda', default=False, help='use cuda', type=bool)
+     parser.add_argument('--thresh', default='[0.1, 0.1, 0.1]', help='detection thresholds for pnet/rnet/onet', type=str)
+     parser.add_argument('--save_name', default="result.jpg", help='save name', type=str)
+     parser.add_argument('--input_mode', default=1, help='image or video', type=int)
+     args = parser.parse_args()
+     return args
+
+ def greet(name):
+     # `name` is the path of the uploaded image (gr.Image(type="filepath"))
+     args = parse_args()
+     # parse the '[0.1, 0.1, 0.1]' string into a list of per-stage thresholds
+     thresh = [float(i) for i in (args.thresh).split('[')[1].split(']')[0].split(',')]
+     # build the cascaded PNet/RNet/ONet detector from the pre-trained checkpoints
+     pnet, rnet, onet = create_mtcnn_net(p_model_path=args.pnet_path, r_model_path=args.rnet_path, o_model_path=args.onet_path, use_cuda=args.use_cuda)
+     mtcnn_detector = MtcnnDetector(pnet=pnet, rnet=rnet, onet=onet, min_face_size=args.min_face_size, threshold=thresh)
+     img = cv2.imread(name)
+     img_bg = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # RGB copy for visualization
+     p_bboxs, r_bboxs, bboxs, landmarks = mtcnn_detector.detect_face(img)
+     save_name = args.save_name
+     # draw the final ONet boxes and landmarks and return the visualization
+     return vis_face(img_bg, bboxs, landmarks, MIN_FACE_SIZE, save_name)
+
+ iface = gr.Interface(fn=greet,
+                      inputs=gr.Image(type="filepath"),
+                      outputs="image")
+ iface.launch()
get_data.py ADDED
@@ -0,0 +1,852 @@
1
+ import sys
2
+ import numpy as np
3
+ import cv2
4
+ import os
5
+ from utils.tool import IoU,convert_to_square
6
+ import numpy.random as npr
7
+ import argparse
8
+ from utils.detect import MtcnnDetector, create_mtcnn_net
9
+ from utils.dataloader import ImageDB,TestImageLoader
10
+ import time
11
+ from six.moves import cPickle
12
+ import utils.config as config
13
+ import utils.vision as vision
14
+ sys.path.append(os.getcwd())
15
+
16
+
17
+ txt_from_path = './data_set/wider_face_train_bbx_gt.txt'
18
+ anno_file = os.path.join(config.ANNO_STORE_DIR, 'anno_train.txt')
19
+ # anno_file = './anno_store/anno_train.txt'
20
+
21
+ prefix = ''
22
+ use_cuda = True
23
+ im_dir = "./data_set/face_detection/WIDER_train/images/"
24
+ traindata_store = './data_set/train/'
25
+ prefix_path = "./data_set/face_detection/WIDER_train/images/"
26
+ annotation_file = './anno_store/anno_train.txt'
27
+ prefix_path_lm = ''
28
+ annotation_file_lm = "./data_set/face_landmark/CNN_FacePoint/train/trainImageList.txt"
29
+ # ----------------------------------------------------other----------------------------------------------
30
+ pos_save_dir = "./data_set/train/12/positive"
31
+ part_save_dir = "./data_set/train/12/part"
32
+ neg_save_dir = './data_set/train/12/negative'
33
+ pnet_postive_file = os.path.join(config.ANNO_STORE_DIR, 'pos_12.txt')
34
+ pnet_part_file = os.path.join(config.ANNO_STORE_DIR, 'part_12.txt')
35
+ pnet_neg_file = os.path.join(config.ANNO_STORE_DIR, 'neg_12.txt')
36
+ imglist_filename_pnet = os.path.join(config.ANNO_STORE_DIR, 'imglist_anno_12.txt')
37
+ # ----------------------------------------------------PNet----------------------------------------------
38
+ rnet_postive_file = os.path.join(config.ANNO_STORE_DIR, 'pos_24.txt')
39
+ rnet_part_file = os.path.join(config.ANNO_STORE_DIR, 'part_24.txt')
40
+ rnet_neg_file = os.path.join(config.ANNO_STORE_DIR, 'neg_24.txt')
41
+ rnet_landmark_file = os.path.join(config.ANNO_STORE_DIR, 'landmark_24.txt')
42
+ imglist_filename_rnet = os.path.join(config.ANNO_STORE_DIR, 'imglist_anno_24.txt')
43
+ # ----------------------------------------------------RNet----------------------------------------------
44
+ onet_postive_file = os.path.join(config.ANNO_STORE_DIR, 'pos_48.txt')
45
+ onet_part_file = os.path.join(config.ANNO_STORE_DIR, 'part_48.txt')
46
+ onet_neg_file = os.path.join(config.ANNO_STORE_DIR, 'neg_48.txt')
47
+ onet_landmark_file = os.path.join(config.ANNO_STORE_DIR, 'landmark_48.txt')
48
+ imglist_filename_onet = os.path.join(config.ANNO_STORE_DIR, 'imglist_anno_48.txt')
49
+ # ----------------------------------------------------ONet----------------------------------------------
50
+
51
+
52
+
53
+ def assemble_data(output_file, anno_file_list=[]):
54
+
55
+ # assemble the pos, neg and part annotations into one file
56
+ size = 12
57
+
58
+ if len(anno_file_list)==0:
59
+ return 0
60
+
61
+ if os.path.exists(output_file):
62
+ os.remove(output_file)
63
+
64
+ for anno_file in anno_file_list:
65
+ with open(anno_file, 'r') as f:
66
+ print(anno_file)
67
+ anno_lines = f.readlines()
68
+
69
+ base_num = 250000
70
+
71
+ if len(anno_lines) > base_num * 3:
72
+ idx_keep = npr.choice(len(anno_lines), size=base_num * 3, replace=True)
73
+ elif len(anno_lines) > 100000:
74
+ idx_keep = npr.choice(len(anno_lines), size=len(anno_lines), replace=True)
75
+ else:
76
+ idx_keep = np.arange(len(anno_lines))
77
+ np.random.shuffle(idx_keep)
78
+ chose_count = 0
79
+ with open(output_file, 'a+') as f:
80
+ for idx in idx_keep:
81
+ # write labels of pos, neg and part images
82
+ f.write(anno_lines[idx])
83
+ chose_count+=1
84
+
85
+ return chose_count
86
+ def wider_face(txt_from_path, txt_to_path):
87
+ line_from_count = 0
88
+ with open(txt_from_path, 'r') as f:
89
+ annotations = f.readlines()
90
+ with open(txt_to_path, 'w+') as f:
91
+ while line_from_count < len(annotations):
92
+ if annotations[line_from_count][2]=='-':
93
+ img_name = annotations[line_from_count][:-1]
94
+ line_from_count += 1 # change line to read the number
95
+ bbox_count = int(annotations[line_from_count]) # num of bboxes
96
+ line_from_count += 1 # move to the next line to read the bbox coordinates
97
+ for _ in range(bbox_count):
98
+ bbox = list(map(int,annotations[line_from_count].split()[:4])) # give a loop to append all the boxes
99
+ bbox = [bbox[0], bbox[1], bbox[0]+bbox[2], bbox[1]+bbox[3]] # make x1, y1, w, h --> x1, y1, x2, y2
100
+ bbox = list(map(str,bbox))
101
+ img_name += (' '+' '.join(bbox))
102
+ line_from_count+=1
103
+ f.write(img_name +'\n')
104
+ else: # not an image file-name line; skip it
105
+ line_from_count+=1
106
+
107
+ # ----------------------------------------------------origin----------------------------------------------
108
+ def get_Pnet_data():
109
+ if not os.path.exists(pos_save_dir):
110
+ os.makedirs(pos_save_dir)
111
+ if not os.path.exists(part_save_dir):
112
+ os.makedirs(part_save_dir)
113
+ if not os.path.exists(neg_save_dir):
114
+ os.makedirs(neg_save_dir)
115
+ f1 = open(os.path.join('./anno_store', 'pos_12.txt'), 'w')
116
+ f2 = open(os.path.join('./anno_store', 'neg_12.txt'), 'w')
117
+ f3 = open(os.path.join('./anno_store', 'part_12.txt'), 'w')
118
+ with open(anno_file, 'r') as f:
119
+ annotations = f.readlines()
120
+ num = len(annotations)
121
+ print("%d pics in total" % num)
122
+ p_idx = 0 # positive
123
+ n_idx = 0 # negative
124
+ d_idx = 0 # dont care
125
+ idx = 0
126
+ box_idx = 0
127
+ for annotation in annotations:
128
+ annotation = annotation.strip().split(' ')
129
+ # annotation[0] is the image file name
130
+ im_path = os.path.join(im_dir, annotation[0])
131
+ # print(im_path)
132
+ # print(os.path.exists(im_path))
133
+ bbox = list(map(float, annotation[1:]))
134
+ # annotation[1:] holds the face boxes: 4 values per face (the two corner points)
135
+ boxes = np.array(bbox, dtype=np.int32).reshape(-1, 4)
136
+ # the -1 dimension of the reshape equals the number of faces
137
+ if boxes.shape[0]==0:
138
+ continue
139
+ # skip this image if it contains no faces
140
+ img = cv2.imread(im_path)
141
+ # print(img.shape)
142
+ # exit()
143
+ # image counter
144
+ idx += 1
145
+ if idx % 100 == 0:
146
+ print("%s images done, pos: %s part: %s neg: %s" % (idx, p_idx, d_idx, n_idx))
147
+
148
+ # images have three channels (H, W, C)
149
+ height, width, channel = img.shape
150
+
151
+ neg_num = 0
152
+
153
+ # sample 50 random crops per image as negative candidates
154
+ while neg_num < 50:
155
+ size = np.random.randint(12, min(width, height) / 2)
156
+ nx = np.random.randint(0, width - size)
157
+ ny = np.random.randint(0, height - size)
158
+ crop_box = np.array([nx, ny, nx + size, ny + size])
159
+
160
+ Iou = IoU(crop_box, boxes) # IoU = intersection area / union area; larger means more overlap
161
+
162
+ cropped_im = img[ny: ny + size, nx: nx + size, :] # crop the patch, then resize it to 12x12
163
+ resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR)
164
+
165
+ if np.max(Iou) < 0.3:
166
+ # Iou with all gts must below 0.3
167
+ save_file = os.path.join(neg_save_dir, "%s.jpg" % n_idx)
168
+ f2.write(save_file + ' 0\n')
169
+ cv2.imwrite(save_file, resized_im)
170
+ n_idx += 1
171
+ neg_num += 1
172
+
173
+ for box in boxes:
174
+ # box (x_left, y_top, x_right, y_bottom)
175
+ x1, y1, x2, y2 = box
176
+ # w = x2 - x1 + 1
177
+ # h = y2 - y1 + 1
178
+ w = x2 - x1 + 1
179
+ h = y2 - y1 + 1
180
+
181
+ # ignore small faces
182
+ # in case the ground truth boxes of small faces are not accurate
183
+ if max(w, h) < 40 or x1 < 0 or y1 < 0:
184
+ continue
185
+ if w < 12 or h < 12:
186
+ continue
187
+
188
+ # generate negative examples that have overlap with gt
189
+ for i in range(5):
190
+ size = np.random.randint(12, min(width, height) / 2)
191
+
192
+ # delta_x and delta_y are offsets of (x1, y1)
193
+ delta_x = np.random.randint(max(-size, -x1), w)
194
+ delta_y = np.random.randint(max(-size, -y1), h)
195
+ nx1 = max(0, x1 + delta_x)
196
+ ny1 = max(0, y1 + delta_y)
197
+
198
+ if nx1 + size > width or ny1 + size > height:
199
+ continue
200
+ crop_box = np.array([nx1, ny1, nx1 + size, ny1 + size])
201
+ Iou = IoU(crop_box, boxes)
202
+
203
+ cropped_im = img[ny1: ny1 + size, nx1: nx1 + size, :]
204
+ resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR)
205
+
206
+ if np.max(Iou) < 0.3:
207
+ # Iou with all gts must below 0.3
208
+ save_file = os.path.join(neg_save_dir, "%s.jpg" % n_idx)
209
+ f2.write(save_file + ' 0\n')
210
+ cv2.imwrite(save_file, resized_im)
211
+ n_idx += 1
212
+
213
+ # generate positive examples and part faces
214
+ for i in range(20):
215
+ size = np.random.randint(int(min(w, h) * 0.8), np.ceil(1.25 * max(w, h)))
216
+
217
+ # delta here is the offset of box center
218
+ delta_x = np.random.randint(-w * 0.2, w * 0.2)
219
+ delta_y = np.random.randint(-h * 0.2, h * 0.2)
220
+
221
+ nx1 = max(x1 + w / 2 + delta_x - size / 2, 0)
222
+ ny1 = max(y1 + h / 2 + delta_y - size / 2, 0)
223
+ nx2 = nx1 + size
224
+ ny2 = ny1 + size
225
+
226
+ if nx2 > width or ny2 > height:
227
+ continue
228
+ crop_box = np.array([nx1, ny1, nx2, ny2])
229
+
230
+ offset_x1 = (x1 - nx1) / float(size)
231
+ offset_y1 = (y1 - ny1) / float(size)
232
+ offset_x2 = (x2 - nx2) / float(size)
233
+ offset_y2 = (y2 - ny2) / float(size)
234
+
235
+ cropped_im = img[int(ny1): int(ny2), int(nx1): int(nx2), :]
236
+ resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR)
237
+
238
+ box_ = box.reshape(1, -1)
239
+ if IoU(crop_box, box_) >= 0.65:
240
+ save_file = os.path.join(pos_save_dir, "%s.jpg" % p_idx)
241
+ f1.write(save_file + ' 1 %.2f %.2f %.2f %.2f\n' % (offset_x1, offset_y1, offset_x2, offset_y2))
242
+ cv2.imwrite(save_file, resized_im)
243
+ p_idx += 1
244
+ elif IoU(crop_box, box_) >= 0.4:
245
+ save_file = os.path.join(part_save_dir, "%s.jpg" % d_idx)
246
+ f3.write(save_file + ' -1 %.2f %.2f %.2f %.2f\n' % (offset_x1, offset_y1, offset_x2, offset_y2))
247
+ cv2.imwrite(save_file, resized_im)
248
+ d_idx += 1
249
+ box_idx += 1
250
+ #print("%s images done, pos: %s part: %s neg: %s" % (idx, p_idx, d_idx, n_idx))
251
+
252
+ f1.close()
253
+ f2.close()
254
+ f3.close()
255
+
256
+
257
+ def assembel_Pnet_data():
258
+ anno_list = []
259
+
260
+ anno_list.append(pnet_postive_file)
261
+ anno_list.append(pnet_part_file)
262
+ anno_list.append(pnet_neg_file)
263
+ # anno_list.append(pnet_landmark_file)
264
+ chose_count = assemble_data(imglist_filename_pnet ,anno_list)
265
+ print("PNet train annotation result file path:%s" % imglist_filename_pnet)
266
+
267
+ # -----------------------------------------------------------------------------------------------------------------------------------------------#
268
+
269
+ def gen_rnet_data(data_dir, anno_file, pnet_model_file, prefix_path='', use_cuda=True, vis=False):
270
+
271
+ """
272
+ :param data_dir: train data
273
+ :param anno_file:
274
+ :param pnet_model_file:
275
+ :param prefix_path:
276
+ :param use_cuda:
277
+ :param vis:
278
+ :return:
279
+ """
280
+
281
+ # load trained pnet model
282
+
283
+ pnet, _, _ = create_mtcnn_net(p_model_path = pnet_model_file, use_cuda = use_cuda)
284
+ mtcnn_detector = MtcnnDetector(pnet = pnet, min_face_size = 12)
285
+
286
+ # load original_anno_file, length = 12880
287
+ imagedb = ImageDB(anno_file, mode = "test", prefix_path = prefix_path)
288
+ imdb = imagedb.load_imdb()
289
+ image_reader = TestImageLoader(imdb, 1, False)
290
+
291
+ all_boxes = list()
292
+ batch_idx = 0
293
+
294
+ print('size:%d' %image_reader.size)
295
+ for databatch in image_reader:
296
+ if batch_idx % 100 == 0:
297
+ print ("%d images done" % batch_idx)
298
+ im = databatch
299
+ t = time.time()
300
+
301
+ # obtain boxes and aligned boxes
302
+ boxes, boxes_align = mtcnn_detector.detect_pnet(im=im)
303
+ if boxes_align is None:
304
+ all_boxes.append(np.array([]))
305
+ batch_idx += 1
306
+ continue
307
+ if vis:
308
+ rgb_im = cv2.cvtColor(np.asarray(im), cv2.COLOR_BGR2RGB)
309
+ vision.vis_two(rgb_im, boxes, boxes_align)
310
+
311
+ t1 = time.time() - t
312
+ print('cost time ',t1)
313
+ t = time.time()
314
+ all_boxes.append(boxes_align)
315
+ batch_idx += 1
316
+ # if batch_idx == 100:
317
+ # break
318
+ # print("shape of all boxes {0}".format(all_boxes))
319
+ # time.sleep(5)
320
+
321
+ # save_path = model_store_path()
322
+ # './model_store'
323
+ save_path = './model_store'
324
+
325
+ if not os.path.exists(save_path):
326
+ os.mkdir(save_path)
327
+
328
+ save_file = os.path.join(save_path, "detections_%d.pkl" % int(time.time()))
329
+ with open(save_file, 'wb') as f:
330
+ cPickle.dump(all_boxes, f, cPickle.HIGHEST_PROTOCOL)
331
+
332
+ # save_file = './model_store/detections_1588751332.pkl'
333
+ gen_rnet_sample_data(data_dir, anno_file, save_file, prefix_path)
334
+
335
+
336
+
337
+ def gen_rnet_sample_data(data_dir, anno_file, det_boxs_file, prefix_path):
338
+
339
+ """
340
+ :param data_dir:
341
+ :param anno_file: original annotations file of wider face data
342
+ :param det_boxs_file: detection boxes file
343
+ :param prefix_path:
344
+ :return:
345
+ """
346
+
347
+ neg_save_dir = os.path.join(data_dir, "24/negative")
348
+ pos_save_dir = os.path.join(data_dir, "24/positive")
349
+ part_save_dir = os.path.join(data_dir, "24/part")
350
+
351
+
352
+ for dir_path in [neg_save_dir, pos_save_dir, part_save_dir]:
353
+ # print(dir_path)
354
+ if not os.path.exists(dir_path):
355
+ os.makedirs(dir_path)
356
+
357
+
358
+ # load ground truth from annotation file
359
+ # format of each line: image/path [x1,y1,x2,y2] for each gt_box in this image
360
+
361
+ with open(anno_file, 'r') as f:
362
+ annotations = f.readlines()
363
+
364
+ image_size = 24
365
+ net = "rnet"
366
+
367
+ im_idx_list = list()
368
+ gt_boxes_list = list()
369
+ num_of_images = len(annotations)
370
+ print ("processing %d images in total" % num_of_images)
371
+
372
+ for annotation in annotations:
373
+ annotation = annotation.strip().split(' ')
374
+ im_idx = os.path.join(prefix_path, annotation[0])
375
+ # im_idx = annotation[0]
376
+
377
+ boxes = list(map(float, annotation[1:]))
378
+ boxes = np.array(boxes, dtype=np.float32).reshape(-1, 4)
379
+ im_idx_list.append(im_idx)
380
+ gt_boxes_list.append(boxes)
381
+
382
+
383
+ # './anno_store'
384
+ save_path = './anno_store'
385
+ if not os.path.exists(save_path):
386
+ os.makedirs(save_path)
387
+
388
+ f1 = open(os.path.join(save_path, 'pos_%d.txt' % image_size), 'w')
389
+ f2 = open(os.path.join(save_path, 'neg_%d.txt' % image_size), 'w')
390
+ f3 = open(os.path.join(save_path, 'part_%d.txt' % image_size), 'w')
391
+
392
+ # print(det_boxs_file)
393
+ det_handle = open(det_boxs_file, 'rb')
394
+
395
+ det_boxes = cPickle.load(det_handle)
396
+
397
+ # an image contain many boxes stored in an array
398
+ print(len(det_boxes), num_of_images)
399
+ # assert len(det_boxes) == num_of_images, "incorrect detections or ground truths"
400
+
401
+ # index of neg, pos and part face, used as their image names
402
+ n_idx = 0
403
+ p_idx = 0
404
+ d_idx = 0
405
+ image_done = 0
406
+ for im_idx, dets, gts in zip(im_idx_list, det_boxes, gt_boxes_list):
407
+
408
+ # if (im_idx+1) == 100:
409
+ # break
410
+
411
+ gts = np.array(gts, dtype=np.float32).reshape(-1, 4)
412
+ if gts.shape[0]==0:
413
+ continue
414
+ if image_done % 100 == 0:
415
+ print("%d images done" % image_done)
416
+ image_done += 1
417
+
418
+ if dets.shape[0] == 0:
419
+ continue
420
+ img = cv2.imread(im_idx)
421
+ # change to square
422
+ dets = convert_to_square(dets)
423
+ dets[:, 0:4] = np.round(dets[:, 0:4])
424
+ neg_num = 0
425
+ for box in dets:
426
+ x_left, y_top, x_right, y_bottom, _ = box.astype(int)
427
+ width = x_right - x_left + 1
428
+ height = y_bottom - y_top + 1
429
+
430
+ # ignore box that is too small or beyond image border
431
+ if width < 20 or x_left < 0 or y_top < 0 or x_right > img.shape[1] - 1 or y_bottom > img.shape[0] - 1:
432
+ continue
433
+
434
+ # compute intersection over union(IoU) between current box and all gt boxes
435
+ Iou = IoU(box, gts)
436
+ cropped_im = img[y_top:y_bottom + 1, x_left:x_right + 1, :]
437
+ resized_im = cv2.resize(cropped_im, (image_size, image_size),
438
+ interpolation=cv2.INTER_LINEAR)
439
+
440
+ # save negative images and write label
441
+ # Iou with all gts must below 0.3
442
+ if np.max(Iou) < 0.3 and neg_num < 60:
443
+ # save the examples
444
+ save_file = os.path.join(neg_save_dir, "%s.jpg" % n_idx)
445
+ # print(save_file)
446
+ f2.write(save_file + ' 0\n')
447
+ cv2.imwrite(save_file, resized_im)
448
+ n_idx += 1
449
+ neg_num += 1
450
+ else:
451
+ # find gt_box with the highest iou
452
+ idx = np.argmax(Iou)
453
+ assigned_gt = gts[idx]
454
+ x1, y1, x2, y2 = assigned_gt
455
+
456
+ # compute bbox reg label
457
+ offset_x1 = (x1 - x_left) / float(width)
458
+ offset_y1 = (y1 - y_top) / float(height)
459
+ offset_x2 = (x2 - x_right) / float(width)
460
+ offset_y2 = (y2 - y_bottom) / float(height)
461
+
462
+ # save positive and part-face images and write labels
463
+ if np.max(Iou) >= 0.65:
464
+ save_file = os.path.join(pos_save_dir, "%s.jpg" % p_idx)
465
+ f1.write(save_file + ' 1 %.2f %.2f %.2f %.2f\n' % (
466
+ offset_x1, offset_y1, offset_x2, offset_y2))
467
+ cv2.imwrite(save_file, resized_im)
468
+ p_idx += 1
469
+
470
+ elif np.max(Iou) >= 0.4:
471
+ save_file = os.path.join(part_save_dir, "%s.jpg" % d_idx)
472
+ f3.write(save_file + ' -1 %.2f %.2f %.2f %.2f\n' % (
473
+ offset_x1, offset_y1, offset_x2, offset_y2))
474
+ cv2.imwrite(save_file, resized_im)
475
+ d_idx += 1
476
+ f1.close()
477
+ f2.close()
478
+ f3.close()
479
+
480
+ def model_store_path():
481
+ return os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))+"/model_store"
482
+
483
+ def get_Rnet_data(pnet_model):
484
+ gen_rnet_data(traindata_store, annotation_file, pnet_model_file = pnet_model, prefix_path = prefix_path, use_cuda = True)
485
+
486
+
487
+ def assembel_Rnet_data():
488
+ anno_list = []
489
+
490
+ anno_list.append(rnet_postive_file)
491
+ anno_list.append(rnet_part_file)
492
+ anno_list.append(rnet_neg_file)
493
+ # anno_list.append(pnet_landmark_file)
494
+
495
+ chose_count = assemble_data(imglist_filename_rnet ,anno_list)
496
+ print("RNet train annotation result file path:%s" % imglist_filename_rnet)
497
+ #-----------------------------------------------------------------------------------------------------------------------------------------------#
498
+ def gen_onet_data(data_dir, anno_file, pnet_model_file, rnet_model_file, prefix_path='', use_cuda=True, vis=False):
499
+
500
+
501
+ pnet, rnet, _ = create_mtcnn_net(p_model_path=pnet_model_file, r_model_path=rnet_model_file, use_cuda=use_cuda)
502
+ mtcnn_detector = MtcnnDetector(pnet=pnet, rnet=rnet, min_face_size=12)
503
+
504
+ imagedb = ImageDB(anno_file,mode="test",prefix_path=prefix_path)
505
+ imdb = imagedb.load_imdb()
506
+ image_reader = TestImageLoader(imdb,1,False)
507
+
508
+ all_boxes = list()
509
+ batch_idx = 0
510
+
511
+ print('size:%d' % image_reader.size)
512
+ for databatch in image_reader:
513
+ if batch_idx % 50 == 0:
514
+ print("%d images done" % batch_idx)
515
+
516
+ im = databatch
517
+
518
+ t = time.time()
519
+
520
+ # pnet detection = [x1, y1, x2, y2, score, reg]
521
+ p_boxes, p_boxes_align = mtcnn_detector.detect_pnet(im=im)
522
+
523
+ t0 = time.time() - t
524
+ t = time.time()
525
+ # rnet detection
526
+ boxes, boxes_align = mtcnn_detector.detect_rnet(im=im, dets=p_boxes_align)
527
+
528
+ t1 = time.time() - t
529
+ print('cost time pnet--',t0,' rnet--',t1)
530
+ t = time.time()
531
+
532
+ if boxes_align is None:
533
+ all_boxes.append(np.array([]))
534
+ batch_idx += 1
535
+ continue
536
+ if vis:
537
+ rgb_im = cv2.cvtColor(np.asarray(im), cv2.COLOR_BGR2RGB)
538
+ vision.vis_two(rgb_im, boxes, boxes_align)
539
+
540
+
541
+ all_boxes.append(boxes_align)
542
+ batch_idx += 1
543
+
544
+ save_path = './model_store'
545
+
546
+ if not os.path.exists(save_path):
547
+ os.mkdir(save_path)
548
+
549
+ save_file = os.path.join(save_path, "detections_%d.pkl" % int(time.time()))
550
+ with open(save_file, 'wb') as f:
551
+ cPickle.dump(all_boxes, f, cPickle.HIGHEST_PROTOCOL)
552
+
553
+
554
+ gen_onet_sample_data(data_dir,anno_file,save_file,prefix_path)
555
+
556
+
557
+
558
+ def gen_onet_sample_data(data_dir,anno_file,det_boxs_file,prefix):
559
+
560
+ neg_save_dir = os.path.join(data_dir, "48/negative")
561
+ pos_save_dir = os.path.join(data_dir, "48/positive")
562
+ part_save_dir = os.path.join(data_dir, "48/part")
563
+
564
+ for dir_path in [neg_save_dir, pos_save_dir, part_save_dir]:
565
+ if not os.path.exists(dir_path):
566
+ os.makedirs(dir_path)
567
+
568
+
569
+ # load ground truth from annotation file
570
+ # format of each line: image/path [x1,y1,x2,y2] for each gt_box in this image
571
+
572
+ with open(anno_file, 'r') as f:
573
+ annotations = f.readlines()
574
+
575
+ image_size = 48
576
+ net = "onet"
577
+
578
+ im_idx_list = list()
579
+ gt_boxes_list = list()
580
+ num_of_images = len(annotations)
581
+ print("processing %d images in total" % num_of_images)
582
+
583
+ for annotation in annotations:
584
+ annotation = annotation.strip().split(' ')
585
+ im_idx = os.path.join(prefix,annotation[0])
586
+
587
+ boxes = list(map(float, annotation[1:]))
588
+ boxes = np.array(boxes, dtype=np.float32).reshape(-1, 4)
589
+ im_idx_list.append(im_idx)
590
+ gt_boxes_list.append(boxes)
591
+
592
+ save_path = './anno_store'
593
+ if not os.path.exists(save_path):
594
+ os.makedirs(save_path)
595
+
596
+ f1 = open(os.path.join(save_path, 'pos_%d.txt' % image_size), 'w')
597
+ f2 = open(os.path.join(save_path, 'neg_%d.txt' % image_size), 'w')
598
+ f3 = open(os.path.join(save_path, 'part_%d.txt' % image_size), 'w')
599
+
600
+ det_handle = open(det_boxs_file, 'rb')
601
+
602
+ det_boxes = cPickle.load(det_handle)
603
+ print(len(det_boxes), num_of_images)
604
+ # assert len(det_boxes) == num_of_images, "incorrect detections or ground truths"
605
+
606
+ # index of neg, pos and part face, used as their image names
607
+ n_idx = 0
608
+ p_idx = 0
609
+ d_idx = 0
610
+ image_done = 0
611
+ for im_idx, dets, gts in zip(im_idx_list, det_boxes, gt_boxes_list):
612
+ if image_done % 100 == 0:
613
+ print("%d images done" % image_done)
614
+ image_done += 1
615
+ if gts.shape[0]==0:
616
+ continue
617
+ if dets.shape[0] == 0:
618
+ continue
619
+ img = cv2.imread(im_idx)
620
+ dets = convert_to_square(dets)
621
+ dets[:, 0:4] = np.round(dets[:, 0:4])
622
+
623
+ for box in dets:
624
+ x_left, y_top, x_right, y_bottom = box[0:4].astype(int)
625
+ width = x_right - x_left + 1
626
+ height = y_bottom - y_top + 1
627
+
628
+ # ignore box that is too small or beyond image border
629
+ if width < 20 or x_left < 0 or y_top < 0 or x_right > img.shape[1] - 1 or y_bottom > img.shape[0] - 1:
630
+ continue
631
+
632
+ # compute intersection over union(IoU) between current box and all gt boxes
633
+ Iou = IoU(box, gts)
634
+ cropped_im = img[y_top:y_bottom + 1, x_left:x_right + 1, :]
635
+ resized_im = cv2.resize(cropped_im, (image_size, image_size),
636
+ interpolation=cv2.INTER_LINEAR)
637
+
638
+ # save negative images and write label
639
+ if np.max(Iou) < 0.3:
640
+ # Iou with all gts must below 0.3
641
+ save_file = os.path.join(neg_save_dir, "%s.jpg" % n_idx)
642
+ f2.write(save_file + ' 0\n')
643
+ cv2.imwrite(save_file, resized_im)
644
+ n_idx += 1
645
+ else:
646
+ # find gt_box with the highest iou
647
+ idx = np.argmax(Iou)
648
+ assigned_gt = gts[idx]
649
+ x1, y1, x2, y2 = assigned_gt
650
+
651
+ # compute bbox reg label
652
+ offset_x1 = (x1 - x_left) / float(width)
653
+ offset_y1 = (y1 - y_top) / float(height)
654
+ offset_x2 = (x2 - x_right) / float(width)
655
+ offset_y2 = (y2 - y_bottom) / float(height)
656
+
657
+ # save positive and part-face images and write labels
658
+ if np.max(Iou) >= 0.65:
659
+ save_file = os.path.join(pos_save_dir, "%s.jpg" % p_idx)
660
+ f1.write(save_file + ' 1 %.2f %.2f %.2f %.2f\n' % (
661
+ offset_x1, offset_y1, offset_x2, offset_y2))
662
+ cv2.imwrite(save_file, resized_im)
663
+ p_idx += 1
664
+
665
+ elif np.max(Iou) >= 0.4:
666
+ save_file = os.path.join(part_save_dir, "%s.jpg" % d_idx)
667
+ f3.write(save_file + ' -1 %.2f %.2f %.2f %.2f\n' % (
668
+ offset_x1, offset_y1, offset_x2, offset_y2))
669
+ cv2.imwrite(save_file, resized_im)
670
+ d_idx += 1
671
+ f1.close()
672
+ f2.close()
673
+ f3.close()
674
+
675
+
676
+
677
+ def model_store_path():
678
+ return os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))+"/model_store"
679
+
680
+
681
+ def get_Onet_data(pnet_model, rnet_model):
682
+ gen_onet_data(traindata_store, annotation_file, pnet_model_file = pnet_model, rnet_model_file = rnet_model,prefix_path=prefix_path,use_cuda = True, vis = False)
683
+
684
+
685
+ def assembel_Onet_data():
686
+ anno_list = []
687
+
688
+ anno_list.append(onet_postive_file)
689
+ anno_list.append(onet_part_file)
690
+ anno_list.append(onet_neg_file)
691
+ anno_list.append(onet_landmark_file)
692
+
693
+ chose_count = assemble_data(imglist_filename_onet ,anno_list)
694
+ print("ONet train annotation result file path:%s" % imglist_filename_onet)
695
+
696
+
697
+ def gen_landmark_48(anno_file, data_dir, prefix = ''):
698
+
699
+
700
+ size = 48
701
+ image_id = 0
702
+
703
+ landmark_imgs_save_dir = os.path.join(data_dir,"48/landmark")
704
+ if not os.path.exists(landmark_imgs_save_dir):
705
+ os.makedirs(landmark_imgs_save_dir)
706
+
707
+ anno_dir = './anno_store'
708
+ if not os.path.exists(anno_dir):
709
+ os.makedirs(anno_dir)
710
+
711
+ landmark_anno_filename = "landmark_48.txt"
712
+ save_landmark_anno = os.path.join(anno_dir,landmark_anno_filename)
713
+
714
+ # print(save_landmark_anno)
715
+ # time.sleep(5)
716
+ f = open(save_landmark_anno, 'w')
717
+ # dstdir = "train_landmark_few"
718
+
719
+ with open(anno_file, 'r') as f2:
720
+ annotations = f2.readlines()
721
+
722
+ num = len(annotations)
723
+ print("%d total images" % num)
724
+
725
+ l_idx =0
726
+ idx = 0
727
+ # image_path bbox landmark(5*2)
728
+ for annotation in annotations:
729
+ # print imgPath
730
+
731
+ annotation = annotation.strip().split(' ')
732
+
733
+ assert len(annotation)==15,"each line should have 15 element"
734
+
735
+ im_path = os.path.join('./data_set/face_landmark/CNN_FacePoint/train/',annotation[0].replace("\\", "/"))
736
+
737
+ gt_box = list(map(float, annotation[1:5]))
738
+ # gt_box = [gt_box[0], gt_box[2], gt_box[1], gt_box[3]]
739
+
740
+
741
+ gt_box = np.array(gt_box, dtype=np.int32)
742
+
743
+ landmark = list(map(float, annotation[5:]))
744
+ landmark = np.array(landmark, dtype=float) # builtin float; np.float is deprecated
745
+
746
+ img = cv2.imread(im_path)
747
+ # print(im_path)
748
+ assert (img is not None)
749
+
750
+ height, width, channel = img.shape
751
+ # crop_face = img[gt_box[1]:gt_box[3]+1, gt_box[0]:gt_box[2]+1]
752
+ # crop_face = cv2.resize(crop_face,(size,size))
753
+
754
+ idx = idx + 1
755
+ if idx % 100 == 0:
756
+ print("%d images done, landmark images: %d"%(idx,l_idx))
757
+ # print(im_path)
758
+ # print(gt_box)
759
+ x1, x2, y1, y2 = gt_box # the landmark annotation stores the box as (x1, x2, y1, y2)
760
+ gt_box[1] = y1
761
+ gt_box[2] = x2 # gt_box is now reordered in place to (x1, y1, x2, y2)
762
+ # time.sleep(5)
763
+
764
+ # gt's width
765
+ w = x2 - x1 + 1
766
+ # gt's height
767
+ h = y2 - y1 + 1
768
+ if max(w, h) < 40 or x1 < 0 or y1 < 0:
769
+ continue
770
+ # random shift
771
+ for i in range(10):
772
+ bbox_size = np.random.randint(int(min(w, h) * 0.8), np.ceil(1.25 * max(w, h)))
773
+ delta_x = np.random.randint(-w * 0.2, w * 0.2)
774
+ delta_y = np.random.randint(-h * 0.2, h * 0.2)
775
+ nx1 = max(x1 + w / 2 - bbox_size / 2 + delta_x, 0)
776
+ ny1 = max(y1 + h / 2 - bbox_size / 2 + delta_y, 0)
777
+
778
+ nx2 = nx1 + bbox_size
779
+ ny2 = ny1 + bbox_size
780
+ if nx2 > width or ny2 > height:
781
+ continue
782
+ crop_box = np.array([nx1, ny1, nx2, ny2])
783
+ cropped_im = img[int(ny1):int(ny2) + 1, int(nx1):int(nx2) + 1, :]
784
+ resized_im = cv2.resize(cropped_im, (size, size),interpolation=cv2.INTER_LINEAR)
785
+
786
+ offset_x1 = (x1 - nx1) / float(bbox_size)
787
+ offset_y1 = (y1 - ny1) / float(bbox_size)
788
+ offset_x2 = (x2 - nx2) / float(bbox_size)
789
+ offset_y2 = (y2 - ny2) / float(bbox_size)
790
+
791
+ offset_left_eye_x = (landmark[0] - nx1) / float(bbox_size)
792
+ offset_left_eye_y = (landmark[1] - ny1) / float(bbox_size)
793
+
794
+ offset_right_eye_x = (landmark[2] - nx1) / float(bbox_size)
795
+ offset_right_eye_y = (landmark[3] - ny1) / float(bbox_size)
796
+
797
+ offset_nose_x = (landmark[4] - nx1) / float(bbox_size)
798
+ offset_nose_y = (landmark[5] - ny1) / float(bbox_size)
799
+
800
+ offset_left_mouth_x = (landmark[6] - nx1) / float(bbox_size)
801
+ offset_left_mouth_y = (landmark[7] - ny1) / float(bbox_size)
802
+
803
+ offset_right_mouth_x = (landmark[8] - nx1) / float(bbox_size)
804
+ offset_right_mouth_y = (landmark[9] - ny1) / float(bbox_size)
805
+
806
+
807
+ # cal iou
808
+ iou = IoU(crop_box.astype(float), np.expand_dims(gt_box.astype(float), 0)) # builtin float; np.float is deprecated
809
+ # print(iou)
810
+ if iou > 0.65:
811
+ save_file = os.path.join(landmark_imgs_save_dir, "%s.jpg" % l_idx)
812
+ cv2.imwrite(save_file, resized_im)
813
+
814
+ f.write(save_file + ' -2 %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f \n' % \
815
+ (offset_x1, offset_y1, offset_x2, offset_y2, \
816
+ offset_left_eye_x,offset_left_eye_y,offset_right_eye_x,offset_right_eye_y,offset_nose_x,offset_nose_y,offset_left_mouth_x,offset_left_mouth_y,offset_right_mouth_x,offset_right_mouth_y))
817
+ # print(save_file)
818
+ # print(save_landmark_anno)
819
+ l_idx += 1
820
+
821
+ f.close()
822
+
823
+
824
+ def parse_args():
825
+ parser = argparse.ArgumentParser(description='Get data',
826
+ formatter_class=argparse.ArgumentDefaultsHelpFormatter)
827
+
828
+ parser.add_argument('--net', dest='net', help='which net to show', type=str)
829
+ parser.add_argument('--pnet_path', default="./model_store/pnet_epoch_20.pt",help='path to pnet model', type=str)
830
+ parser.add_argument('--rnet_path', default="./model_store/rnet_epoch_20.pt",help='path to rnet model', type=str)
831
+ parser.add_argument('--use_cuda', default=True,help='use cuda', type=bool)
832
+
833
+ args = parser.parse_args()
834
+ return args
835
+
836
+ #-----------------------------------------------------------------------------------------------------------------------------------------------#
837
+ if __name__ == '__main__':
838
+ args = parse_args()
839
+ dir = 'anno_store'
840
+ if not os.path.exists(dir):
841
+ os.makedirs(dir)
842
+ if args.net == "pnet":
843
+ wider_face(txt_from_path, anno_file)
844
+ get_Pnet_data()
845
+ assembel_Pnet_data()
846
+ elif args.net == "rnet":
847
+ get_Rnet_data(args.pnet_path)
848
+ assembel_Rnet_data()
849
+ elif args.net == "onet":
850
+ get_Onet_data(args.pnet_path, args.rnet_path)
851
+ gen_landmark_48(annotation_file_lm, traindata_store, prefix_path_lm)
852
+ assembel_Onet_data()
img/mid.png ADDED
img/onet.png ADDED
img/pnet.png ADDED
img/result.png ADDED
img/rnet.png ADDED
model_store/onet_epoch_20.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:53e8fe6d59c0b3cd75ae24f37756e056e05b9fa555cd9e442543aef54cc5f887
+ size 903910
model_store/pnet_epoch_20.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e818bafbe694390fba4cf59cad9d67a04ed8fb9297e5b4032c3d2af3832e5365
+ size 32056
model_store/rnet_epoch_20.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cfe5d5abf979cb3d7eda838d9d6c8e1b582e4a53a1d20e9b6ff54953ed3ba042
+ size 245871
requirements.txt ADDED
@@ -0,0 +1,10 @@
+ matplotlib==3.5.0
+ matplotlib-inline==0.1.3
+ numpy==1.21.4
+ opencv-python==4.4.0.42
+ opencv-python-headless==4.6.0.66
+ Pillow==9.1.1
+ scikit-image==0.19.3
+ torch==1.13.0+cu116
+ torchaudio==0.13.0+cu116
+ torchvision==0.14.0+cu116
test.py ADDED
@@ -0,0 +1,84 @@
1
+ import cv2
2
+ from utils.detect import create_mtcnn_net, MtcnnDetector
3
+ from utils.vision import vis_face
4
+ import argparse
5
+
6
+
7
+ MIN_FACE_SIZE = 3
8
+
9
+ def parse_args():
10
+ parser = argparse.ArgumentParser(description='Test MTCNN',
11
+ formatter_class=argparse.ArgumentDefaultsHelpFormatter)
12
+
13
+ parser.add_argument('--net', default='onet', help='which net to show', type=str)
14
+ parser.add_argument('--pnet_path', default="./model_store/pnet_epoch_20.pt",help='path to pnet model', type=str)
15
+ parser.add_argument('--rnet_path', default="./model_store/rnet_epoch_20.pt",help='path to rnet model', type=str)
16
+ parser.add_argument('--onet_path', default="./model_store/onet_epoch_20.pt",help='path to onet model', type=str)
17
+ parser.add_argument('--path', default="./img/mid.png",help='path to image', type=str)
18
+ parser.add_argument('--min_face_size', default=MIN_FACE_SIZE,help='min face size', type=int)
19
+ parser.add_argument('--use_cuda', default=False,help='use cuda', type=bool)
20
+ parser.add_argument('--thresh', default='[0.1, 0.1, 0.1]',help='thresh', type=str)
21
+ parser.add_argument('--save_name', default="result.jpg",help='save name', type=str)
22
+ parser.add_argument('--input_mode', default=1,help='image or video', type=int)
23
+ args = parser.parse_args()
24
+ return args
25
+ if __name__ == '__main__':
26
+ args = parse_args()
27
+ thresh = [float(i) for i in (args.thresh).split('[')[1].split(']')[0].split(',')]
28
+ pnet, rnet, onet = create_mtcnn_net(p_model_path=args.pnet_path, r_model_path=args.rnet_path,o_model_path=args.onet_path, use_cuda=args.use_cuda)
29
+ mtcnn_detector = MtcnnDetector(pnet=pnet, rnet=rnet, onet=onet, min_face_size=args.min_face_size,threshold=thresh)
30
+ if args.input_mode == 1:
31
+ img = cv2.imread(args.path)
32
+ img_bg = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
33
+ p_bboxs, r_bboxs, bboxs, landmarks = mtcnn_detector.detect_face(img)
34
+ # print box_align
35
+ save_name = args.save_name
36
+ if args.net == 'pnet':
37
+ vis_face(img_bg, p_bboxs, landmarks, MIN_FACE_SIZE, save_name)
38
+ elif args.net == 'rnet':
39
+ vis_face(img_bg, r_bboxs, landmarks, MIN_FACE_SIZE, save_name)
40
+ elif args.net == 'onet':
41
+ vis_face(img_bg, bboxs, landmarks, MIN_FACE_SIZE, save_name)
42
+ elif args.input_mode == 0:
43
+ cap=cv2.VideoCapture(0)
44
+ fourcc = cv2.VideoWriter_fourcc(*'XVID')
45
+ out = cv2.VideoWriter('out.mp4' ,fourcc,10,(640,480))
46
+ while True:
47
+ t1=cv2.getTickCount()
48
+ ret,frame = cap.read()
49
+ if ret == True:
50
+ _, _, boxes_c, landmarks = mtcnn_detector.detect_face(frame) # detect_face returns PNet, RNet and ONet boxes plus landmarks; keep the final ONet output
51
+ t2=cv2.getTickCount()
52
+ t=(t2-t1)/cv2.getTickFrequency()
53
+ fps=1.0/t
54
+ for i in range(boxes_c.shape[0]):
55
+ bbox = boxes_c[i, :4]
56
+ score = boxes_c[i, 4]
57
+ corpbbox = [int(bbox[0]), int(bbox[1]), int(bbox[2]), int(bbox[3])]
58
+
59
+ # draw the face bounding box
60
+ cv2.rectangle(frame, (corpbbox[0], corpbbox[1]),
61
+ (corpbbox[2], corpbbox[3]), (255, 0, 0), 1)
62
+ # draw the detection confidence score
63
+ cv2.putText(frame, '{:.2f}'.format(score),
64
+ (corpbbox[0], corpbbox[1] - 2),
65
+ cv2.FONT_HERSHEY_SIMPLEX,
66
+ 0.5,(0, 0, 255), 2)
67
+ # draw the per-frame time and FPS
68
+ cv2.putText(frame, '{:.4f}'.format(t) + " " + '{:.3f}'.format(fps), (10, 20),
69
+ cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 255), 2)
70
+ # draw the facial landmarks
71
+ for i in range(landmarks.shape[0]):
72
+ for j in range(len(landmarks[i])//2):
73
+ cv2.circle(frame, (int(landmarks[i][2*j]),int(int(landmarks[i][2*j+1]))), 2, (0,0,255))
74
+ a = out.write(frame)
75
+ cv2.imshow("result", frame)
76
+ if cv2.waitKey(1) & 0xFF == ord('q'):
77
+ break
78
+ else:
79
+ break
80
+ cap.release()
81
+ out.release()
82
+ cv2.destroyAllWindows()
83
+
84
+
test.sh ADDED
@@ -0,0 +1,4 @@
+ python test.py --net=pnet --min_face_size=1 --pnet_path=./model_store/pnet_epoch_20.pt --rnet_path=./model_store/rnet_epoch_20.pt --onet_path=./model_store/onet_epoch_20.pt --save_name=pnet
+ python test.py --net=rnet --min_face_size=1 --pnet_path=./model_store/pnet_epoch_20.pt --rnet_path=./model_store/rnet_epoch_20.pt --onet_path=./model_store/onet_epoch_20.pt --save_name=rnet
+ python test.py --net=onet --min_face_size=1 --pnet_path=./model_store/pnet_epoch_20.pt --rnet_path=./model_store/rnet_epoch_20.pt --onet_path=./model_store/onet_epoch_20.pt --save_name=onet
+ echo "Testing finished!"
train.out ADDED
The diff for this file is too large to render. See raw diff
 
train.py ADDED
@@ -0,0 +1,351 @@
1
+ from utils.dataloader import TrainImageReader,convert_image_to_tensor,ImageDB
2
+ import datetime
3
+ import os
4
+ from utils.models import PNet,RNet,ONet,LossFn
5
+ import torch
6
+ # from torch.autograd import Variable  (deprecated in recent PyTorch versions)
7
+ import utils.config as config
8
+ import argparse
9
+ import sys
10
+ sys.path.append(os.getcwd())
11
+ import numpy as np
12
+
13
+
14
+
15
+ def compute_accuracy(prob_cls, gt_cls):
16
+
17
+ prob_cls = torch.squeeze(prob_cls)
18
+ gt_cls = torch.squeeze(gt_cls)
19
+
20
+ #we only need the detection which >= 0
21
+ mask = torch.ge(gt_cls,0)
22
+ #get valid element
23
+ valid_gt_cls = torch.masked_select(gt_cls,mask)
24
+ valid_prob_cls = torch.masked_select(prob_cls,mask)
25
+ size = min(valid_gt_cls.size()[0], valid_prob_cls.size()[0])
26
+ prob_ones = torch.ge(valid_prob_cls,0.6).float()
27
+ right_ones = torch.eq(prob_ones,valid_gt_cls).float()
28
+
29
+ ## if size == 0 meaning that your gt_labels are all negative, landmark or part
30
+
31
+ return torch.div(torch.mul(torch.sum(right_ones),float(1.0)),float(size)) ## divided by zero meaning that your gt_labels are all negative, landmark or part
32
+
33
+
34
+ def train_pnet(model_store_path, end_epoch,imdb,
35
+ batch_size,frequent=10,base_lr=0.01,lr_epoch_decay=[9],use_cuda=True,load=''):
36
+
37
+ #create lr_list
38
+ lr_epoch_decay.append(end_epoch+1)
39
+ lr_list = np.zeros(end_epoch)
40
+ lr_t = base_lr
41
+ for i in range(len(lr_epoch_decay)):
42
+ if i==0:
43
+ lr_list[0:lr_epoch_decay[i]-1]=lr_t
44
+ else:
45
+ lr_list[lr_epoch_decay[i-1]-1:lr_epoch_decay[i]-1]=lr_t
46
+ lr_t*=0.1
47
+
48
+
49
+ if not os.path.exists(model_store_path):
50
+ os.makedirs(model_store_path)
51
+
52
+ lossfn = LossFn()
53
+ net = PNet(is_train=True, use_cuda=use_cuda)
54
+ if load!='':
55
+ net.load_state_dict(torch.load(load))
56
+ print('model loaded',load)
57
+ net.train()
58
+
59
+ if use_cuda:
60
+ net.cuda()
61
+
62
+
63
+ optimizer = torch.optim.Adam(net.parameters(), lr=lr_list[0])
64
+ #optimizer = torch.optim.SGD(net.parameters(), lr=lr_list[0])
65
+
66
+ train_data=TrainImageReader(imdb,12,batch_size,shuffle=True)
67
+
68
+ #frequent = 10
69
+ for cur_epoch in range(1,end_epoch+1):
70
+ train_data.reset() # shuffle
71
+ for param in optimizer.param_groups:
72
+ param['lr'] = lr_list[cur_epoch-1]
73
+ for batch_idx,(image,(gt_label,gt_bbox,gt_landmark))in enumerate(train_data):
74
+
75
+ im_tensor = [ convert_image_to_tensor(image[i,:,:,:]) for i in range(image.shape[0]) ]
76
+ im_tensor = torch.stack(im_tensor)
77
+
78
+ im_tensor.requires_grad = True
79
+ gt_label = torch.from_numpy(gt_label).float()
80
+ gt_label.requires_grad = True
81
+
82
+ gt_bbox = torch.from_numpy(gt_bbox).float()
83
+ gt_bbox.requires_grad = True
84
+ # gt_landmark = Variable(torch.from_numpy(gt_landmark).float())
85
+
86
+ if use_cuda:
87
+ im_tensor = im_tensor.cuda()
88
+ gt_label = gt_label.cuda()
89
+ gt_bbox = gt_bbox.cuda()
90
+ # gt_landmark = gt_landmark.cuda()
91
+
92
+ cls_pred, box_offset_pred = net(im_tensor)
93
+ # all_loss, cls_loss, offset_loss = lossfn.loss(gt_label=label_y,gt_offset=bbox_y, pred_label=cls_pred, pred_offset=box_offset_pred)
94
+
95
+ cls_loss = lossfn.cls_loss(gt_label,cls_pred)
96
+ box_offset_loss = lossfn.box_loss(gt_label,gt_bbox,box_offset_pred)
97
+ # landmark_loss = lossfn.landmark_loss(gt_label,gt_landmark,landmark_offset_pred)
98
+
99
+ all_loss = cls_loss*1.0+box_offset_loss*0.5
100
+
101
+ if batch_idx %frequent==0:
102
+ accuracy=compute_accuracy(cls_pred,gt_label)
103
+
104
+ show1 = accuracy.data.cpu().numpy()
105
+ show2 = cls_loss.data.cpu().numpy()
106
+ show3 = box_offset_loss.data.cpu().numpy()
107
+ # show4 = landmark_loss.data.cpu().numpy()
108
+ show5 = all_loss.data.cpu().numpy()
109
+
110
+ print("%s : Epoch: %d, Step: %d, accuracy: %s, det loss: %s, bbox loss: %s, all_loss: %s, lr:%s "%(datetime.datetime.now(),cur_epoch,batch_idx, show1,show2,show3,show5,lr_list[cur_epoch-1]))
111
+
112
+ optimizer.zero_grad()
113
+ all_loss.backward()
114
+ optimizer.step()
115
+
116
+ torch.save(net.state_dict(), os.path.join(model_store_path,"pnet_epoch_%d.pt" % cur_epoch))
117
+ torch.save(net, os.path.join(model_store_path,"pnet_epoch_model_%d.pkl" % cur_epoch))
118
+
119
+
120
+
121
+
122
+ def train_rnet(model_store_path, end_epoch,imdb,
123
+ batch_size,frequent=50,base_lr=0.01,lr_epoch_decay=[9],use_cuda=True,load=''):
124
+
125
+ #create lr_list
126
+ lr_epoch_decay.append(end_epoch+1)
127
+ lr_list = np.zeros(end_epoch)
128
+ lr_t = base_lr
129
+ for i in range(len(lr_epoch_decay)):
130
+ if i==0:
131
+ lr_list[0:lr_epoch_decay[i]-1]=lr_t
132
+ else:
133
+ lr_list[lr_epoch_decay[i-1]-1:lr_epoch_decay[i]-1]=lr_t
134
+ lr_t*=0.1
135
+ #print(lr_list)
136
+ if not os.path.exists(model_store_path):
137
+ os.makedirs(model_store_path)
138
+
139
+ lossfn = LossFn()
140
+ net = RNet(is_train=True, use_cuda=use_cuda)
141
+ net.train()
142
+ if load!='':
143
+ net.load_state_dict(torch.load(load))
144
+ print('model loaded',load)
145
+ if use_cuda:
146
+ net.cuda()
147
+
148
+
149
+ optimizer = torch.optim.Adam(net.parameters(), lr=base_lr)
150
+
151
+ train_data=TrainImageReader(imdb,24,batch_size,shuffle=True)
152
+
153
+
154
+ for cur_epoch in range(1,end_epoch+1):
155
+ train_data.reset()
156
+ for param in optimizer.param_groups:
157
+ param['lr'] = lr_list[cur_epoch-1]
158
+
159
+ for batch_idx,(image,(gt_label,gt_bbox,gt_landmark))in enumerate(train_data):
160
+
161
+ im_tensor = [ convert_image_to_tensor(image[i,:,:,:]) for i in range(image.shape[0]) ]
162
+ im_tensor = torch.stack(im_tensor)
163
+
164
+ im_tensor.requires_grad = True
165
+ gt_label = torch.from_numpy(gt_label).float()
166
+ gt_label.requires_grad = True
167
+
168
+ gt_bbox = torch.from_numpy(gt_bbox).float()
169
+ gt_bbox.requires_grad = True
170
+ gt_landmark = torch.from_numpy(gt_landmark).float()
171
+ gt_landmark.requires_grad = True
172
+
173
+ if use_cuda:
174
+ im_tensor = im_tensor.cuda()
175
+ gt_label = gt_label.cuda()
176
+ gt_bbox = gt_bbox.cuda()
177
+ gt_landmark = gt_landmark.cuda()
178
+
179
+ cls_pred, box_offset_pred = net(im_tensor)
180
+ # all_loss, cls_loss, offset_loss = lossfn.loss(gt_label=label_y,gt_offset=bbox_y, pred_label=cls_pred, pred_offset=box_offset_pred)
181
+
182
+ cls_loss = lossfn.cls_loss(gt_label,cls_pred)
183
+ box_offset_loss = lossfn.box_loss(gt_label,gt_bbox,box_offset_pred)
184
+ # landmark_loss = lossfn.landmark_loss(gt_label,gt_landmark,landmark_offset_pred)
185
+
186
+ all_loss = cls_loss*1.0+box_offset_loss*0.5
187
+
188
+ if batch_idx%frequent==0:
189
+ accuracy=compute_accuracy(cls_pred,gt_label)
190
+
191
+ show1 = accuracy.data.cpu().numpy()
192
+ show2 = cls_loss.data.cpu().numpy()
193
+ show3 = box_offset_loss.data.cpu().numpy()
194
+ # show4 = landmark_loss.data.cpu().numpy()
195
+ show5 = all_loss.data.cpu().numpy()
196
+
197
+ print("%s : Epoch: %d, Step: %d, accuracy: %s, det loss: %s, bbox loss: %s, all_loss: %s, lr:%s "%(datetime.datetime.now(), cur_epoch, batch_idx, show1, show2, show3, show5, lr_list[cur_epoch-1]))
198
+
199
+ optimizer.zero_grad()
200
+ all_loss.backward()
201
+ optimizer.step()
202
+
203
+ torch.save(net.state_dict(), os.path.join(model_store_path,"rnet_epoch_%d.pt" % cur_epoch))
204
+ torch.save(net, os.path.join(model_store_path,"rnet_epoch_model_%d.pkl" % cur_epoch))
205
+
206
+
207
+ def train_onet(model_store_path, end_epoch,imdb,
208
+ batch_size,frequent=50,base_lr=0.01,lr_epoch_decay=[9],use_cuda=True,load=''):
209
+ #create lr_list
210
+ lr_epoch_decay.append(end_epoch+1)
211
+ lr_list = np.zeros(end_epoch)
212
+ lr_t = base_lr
213
+ for i in range(len(lr_epoch_decay)):
214
+ if i==0:
215
+ lr_list[0:lr_epoch_decay[i]-1]=lr_t
216
+ else:
217
+ lr_list[lr_epoch_decay[i-1]-1:lr_epoch_decay[i]-1]=lr_t
218
+ lr_t*=0.1
219
+ #print(lr_list)
220
+
221
+ if not os.path.exists(model_store_path):
222
+ os.makedirs(model_store_path)
223
+
224
+ lossfn = LossFn()
225
+ net = ONet(is_train=True)
226
+ if load!='':
227
+ net.load_state_dict(torch.load(load))
228
+ print('model loaded',load)
229
+ net.train()
230
+ #print(use_cuda)
231
+ if use_cuda:
232
+ net.cuda()
233
+
234
+
235
+ optimizer = torch.optim.Adam(net.parameters(), lr=base_lr)
236
+
237
+ train_data=TrainImageReader(imdb,48,batch_size,shuffle=True)
238
+
239
+
240
+ for cur_epoch in range(1,end_epoch+1):
241
+
242
+ train_data.reset()
243
+ for param in optimizer.param_groups:
244
+ param['lr'] = lr_list[cur_epoch-1]
245
+ for batch_idx,(image,(gt_label,gt_bbox,gt_landmark))in enumerate(train_data):
246
+ # print("batch id {0}".format(batch_idx))
247
+ im_tensor = [ convert_image_to_tensor(image[i,:,:,:]) for i in range(image.shape[0]) ]
248
+ im_tensor = torch.stack(im_tensor)
249
+
250
+ im_tensor.requires_grad = True
251
+ gt_label = torch.from_numpy(gt_label).float()
252
+ gt_label.requires_grad = True
253
+
254
+ gt_bbox = torch.from_numpy(gt_bbox).float()
255
+ gt_bbox.requires_grad = True
256
+ gt_landmark = torch.from_numpy(gt_landmark).float()
257
+ gt_landmark.requires_grad = True
258
+
259
+ if use_cuda:
260
+ im_tensor = im_tensor.cuda()
261
+ gt_label = gt_label.cuda()
262
+ gt_bbox = gt_bbox.cuda()
263
+ gt_landmark = gt_landmark.cuda()
264
+
265
+ cls_pred, box_offset_pred, landmark_offset_pred = net(im_tensor)
266
+
267
+ # all_loss, cls_loss, offset_loss = lossfn.loss(gt_label=label_y,gt_offset=bbox_y, pred_label=cls_pred, pred_offset=box_offset_pred)
268
+
269
+ cls_loss = lossfn.cls_loss(gt_label,cls_pred)
270
+ box_offset_loss = lossfn.box_loss(gt_label,gt_bbox,box_offset_pred)
271
+ landmark_loss = lossfn.landmark_loss(gt_label,gt_landmark,landmark_offset_pred)
272
+
273
+ all_loss = cls_loss*0.8+box_offset_loss*0.6+landmark_loss*1.5
274
+
275
+ if batch_idx%frequent==0:
276
+ accuracy=compute_accuracy(cls_pred,gt_label)
277
+
278
+ show1 = accuracy.data.cpu().numpy()
279
+ show2 = cls_loss.data.cpu().numpy()
280
+ show3 = box_offset_loss.data.cpu().numpy()
281
+ show4 = landmark_loss.data.cpu().numpy()
282
+ show5 = all_loss.data.cpu().numpy()
283
+
284
+ print("%s : Epoch: %d, Step: %d, accuracy: %s, det loss: %s, bbox loss: %s, landmark loss: %s, all_loss: %s, lr:%s "%(datetime.datetime.now(),cur_epoch,batch_idx, show1,show2,show3,show4,show5,base_lr))
285
+ #print("%s : Epoch: %d, Step: %d, accuracy: %s, det loss: %s, bbox loss: %s, all_loss: %s, lr:%s "%(datetime.datetime.now(),cur_epoch,batch_idx, show1,show2,show3,show5,lr_list[cur_epoch-1]))
286
+
287
+ optimizer.zero_grad()
288
+ all_loss.backward()
289
+ optimizer.step()
290
+
291
+ torch.save(net.state_dict(), os.path.join(model_store_path,"onet_epoch_%d.pt" % cur_epoch))
292
+ torch.save(net, os.path.join(model_store_path,"onet_epoch_model_%d.pkl" % cur_epoch))
293
+
294
+
295
+
296
+
297
+
298
+
299
+ def parse_args():
300
+ parser = argparse.ArgumentParser(description='Train MTCNN',
301
+ formatter_class=argparse.ArgumentDefaultsHelpFormatter)
302
+
303
+ parser.add_argument('--net', dest='net', help='which net to train', type=str)
304
+
305
+ parser.add_argument('--anno_file', dest='annotation_file', help='training data annotation file', type=str)
306
+ parser.add_argument('--model_path', dest='model_store_path', help='training model store directory',
307
+ default=config.MODEL_STORE_DIR, type=str)
308
+ parser.add_argument('--end_epoch', dest='end_epoch', help='end epoch of training',
309
+ default=config.END_EPOCH, type=int)
310
+ parser.add_argument('--frequent', dest='frequent', help='frequency of logging',
311
+ default=200, type=int)
312
+ parser.add_argument('--lr', dest='lr', help='learning rate',
313
+ default=config.TRAIN_LR, type=float)
314
+ parser.add_argument('--batch_size', dest='batch_size', help='train batch size',
315
+ default=config.TRAIN_BATCH_SIZE, type=int)
316
+ parser.add_argument('--gpu', dest='use_cuda', help='train with gpu',
317
+ default=config.USE_CUDA, type=bool)
318
+ parser.add_argument('--load', dest='load', help='load model', default='', type=str)
319
+
320
+ args = parser.parse_args()
321
+ return args
322
+
323
+ def train_net(annotation_file, model_store_path,
324
+ end_epoch=16, frequent=200, lr=0.01,lr_epoch_decay=[9],
325
+ batch_size=128, use_cuda=False,load='',net='pnet'):
326
+ if net=='pnet':
327
+ annotation_file = os.path.join(config.ANNO_STORE_DIR,config.PNET_TRAIN_IMGLIST_FILENAME)
328
+ elif net=='rnet':
329
+ annotation_file = os.path.join(config.ANNO_STORE_DIR,config.RNET_TRAIN_IMGLIST_FILENAME)
330
+ elif net=='onet':
331
+ annotation_file = os.path.join(config.ANNO_STORE_DIR,config.ONET_TRAIN_IMGLIST_FILENAME)
332
+ imagedb = ImageDB(annotation_file)
333
+ gt_imdb = imagedb.load_imdb()
334
+ print('DATASIZE',len(gt_imdb))
335
+ gt_imdb = imagedb.append_flipped_images(gt_imdb)
336
+ print('FLIP DATASIZE',len(gt_imdb))
337
+ if net=="pnet":
338
+ print("Training Pnet:")
339
+ train_pnet(model_store_path=model_store_path, end_epoch=end_epoch, imdb=gt_imdb, batch_size=batch_size, frequent=frequent, base_lr=lr,lr_epoch_decay=lr_epoch_decay, use_cuda=use_cuda,load=load)
340
+ elif net=="rnet":
341
+ print("Training Rnet:")
342
+ train_rnet(model_store_path=model_store_path, end_epoch=end_epoch, imdb=gt_imdb, batch_size=batch_size, frequent=frequent, base_lr=lr,lr_epoch_decay=lr_epoch_decay, use_cuda=use_cuda,load=load)
343
+ elif net=="onet":
344
+ print("Training Onet:")
345
+ train_onet(model_store_path=model_store_path, end_epoch=end_epoch, imdb=gt_imdb, batch_size=batch_size, frequent=frequent, base_lr=lr,lr_epoch_decay=lr_epoch_decay, use_cuda=use_cuda,load=load)
346
+
347
+ if __name__ == '__main__':
348
+ args = parse_args()
349
+ lr_epoch_decay = [9]
350
+ train_net(annotation_file=args.annotation_file, model_store_path=args.model_store_path,
351
+ end_epoch=args.end_epoch, frequent=args.frequent, lr=args.lr,lr_epoch_decay=lr_epoch_decay,batch_size=args.batch_size, use_cuda=args.use_cuda,load=args.load,net=args.net)
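The stepwise learning-rate schedule used by the `train_*` functions above fills `lr_list` once and then assigns `lr_list[cur_epoch-1]` to the optimizer at the start of every epoch. A minimal standalone sketch of that schedule (the helper name `build_lr_list` is ours, not part of the repo):

```python
import numpy as np

def build_lr_list(base_lr, end_epoch, lr_epoch_decay):
    # Mirror train.py: hold base_lr until the first decay epoch,
    # then shrink the rate by 10x at every further decay epoch.
    decay = list(lr_epoch_decay) + [end_epoch + 1]
    lr_list = np.zeros(end_epoch)
    lr_t = base_lr
    for i in range(len(decay)):
        if i == 0:
            lr_list[0:decay[i] - 1] = lr_t
        else:
            lr_list[decay[i - 1] - 1:decay[i] - 1] = lr_t
        lr_t *= 0.1
    return lr_list

# 20 epochs with decay=[9]: epochs 1-8 train at 0.01, epochs 9-20 at 0.001
print(build_lr_list(0.01, 20, [9]))
```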
train.sh ADDED
@@ -0,0 +1,7 @@
1
+ python get_data.py --net=pnet
2
+ python train.py --net=pnet
3
+ python get_data.py --net=rnet --pnet_path=./model_store/pnet_epoch_20.pt
4
+ python train.py --net=rnet
5
+ python get_data.py --net=onet --pnet_path=./model_store/pnet_epoch_20.pt --rnet_path=./model_store/rnet_epoch_20.pt
6
+ python train.py --net=onet
7
+ echo "Training finished!"
utils/config.py ADDED
@@ -0,0 +1,42 @@
1
+ import os
2
+ '''Original hyper-parameters from the provided sample code'''
3
+
4
+ MODEL_STORE_DIR = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))+"/model_store"
5
+
6
+
7
+ ANNO_STORE_DIR = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))+"/anno_store"
8
+
9
+
10
+ LOG_DIR = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))+"/log"
11
+
12
+
13
+ USE_CUDA = True
14
+
15
+
16
+ TRAIN_BATCH_SIZE = 512
17
+
18
+ TRAIN_LR = 0.01
19
+
20
+ END_EPOCH = 20
21
+
22
+
23
+ PNET_POSTIVE_ANNO_FILENAME = "pos_12.txt"
24
+ PNET_NEGATIVE_ANNO_FILENAME = "neg_12.txt"
25
+ PNET_PART_ANNO_FILENAME = "part_12.txt"
26
+ PNET_LANDMARK_ANNO_FILENAME = "landmark_12.txt"
27
+
28
+
29
+ RNET_POSTIVE_ANNO_FILENAME = "pos_24.txt"
30
+ RNET_NEGATIVE_ANNO_FILENAME = "neg_24.txt"
31
+ RNET_PART_ANNO_FILENAME = "part_24.txt"
32
+ RNET_LANDMARK_ANNO_FILENAME = "landmark_24.txt"
33
+
34
+
35
+ ONET_POSTIVE_ANNO_FILENAME = "pos_48.txt"
36
+ ONET_NEGATIVE_ANNO_FILENAME = "neg_48.txt"
37
+ ONET_PART_ANNO_FILENAME = "part_48.txt"
38
+ ONET_LANDMARK_ANNO_FILENAME = "landmark_48.txt"
39
+
40
+ PNET_TRAIN_IMGLIST_FILENAME = "imglist_anno_12.txt"
41
+ RNET_TRAIN_IMGLIST_FILENAME = "imglist_anno_24.txt"
42
+ ONET_TRAIN_IMGLIST_FILENAME = "imglist_anno_48.txt"
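Every directory in `utils/config.py` is resolved relative to the repository root, so the scripts can be launched from any working directory. A small hedged check (assuming the package is importable as `utils.config`):

```python
# Sketch: print where the store directories and the PNet image list resolve to.
import os
from utils import config

for name in ("MODEL_STORE_DIR", "ANNO_STORE_DIR", "LOG_DIR"):
    print(name, "->", getattr(config, name))

# train.py --net=pnet will look for its training list here:
print(os.path.join(config.ANNO_STORE_DIR, config.PNET_TRAIN_IMGLIST_FILENAME))
```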
utils/dataloader.py ADDED
@@ -0,0 +1,347 @@
1
+ import torchvision.transforms as transforms
2
+ import numpy as np
3
+ import os
4
+ import cv2
5
+ def convert_image_to_tensor(image):
6
+ """convert an image to pytorch tensor
7
+
8
+ Parameters:
9
+ ----------
10
+ image: numpy array , h * w * c
11
+
12
+ Returns:
13
+ -------
14
+ image_tensor: pytorch.FloatTensor, c * h * w
15
+ """
16
+ transform = transforms.ToTensor()
17
+
18
+ return transform(image)
19
+
20
+
21
+ def convert_chwTensor_to_hwcNumpy(tensor):
22
+ """convert a group images pytorch tensor(count * c * h * w) to numpy array images(count * h * w * c)
23
+ Parameters:
24
+ ----------
25
+ tensor: numpy array , count * c * h * w
26
+
27
+ Returns:
28
+ -------
29
+ numpy array images: count * h * w * c
30
+ """
31
+ return np.transpose(tensor.detach().numpy(), (0,2,3,1))
32
+
33
+ class ImageDB(object):
34
+ def __init__(self, image_annotation_file, prefix_path='', mode='train'):
35
+ self.prefix_path = prefix_path
36
+ self.image_annotation_file = image_annotation_file
37
+ self.classes = ['__background__', 'face']
38
+ self.num_classes = 2
39
+ self.image_set_index = self.load_image_set_index()
40
+ self.num_images = len(self.image_set_index)
41
+ self.mode = mode
42
+
43
+
44
+ def load_image_set_index(self):
45
+ """Get image index
46
+
47
+ Parameters:
48
+ ----------
49
+ Returns:
50
+ -------
51
+ image_set_index: str
52
+ relative path of image
53
+ """
54
+ assert os.path.exists(self.image_annotation_file), 'Path does not exist: {}'.format(self.image_annotation_file)
55
+ with open(self.image_annotation_file, 'r') as f:
56
+ image_set_index = [x.strip().split(' ')[0] for x in f.readlines()]
57
+ return image_set_index
58
+
59
+
60
+ def load_imdb(self):
61
+ """Get and save ground truth image database
62
+
63
+ Parameters:
64
+ ----------
65
+ Returns:
66
+ -------
67
+ gt_imdb: dict
68
+ image database with annotations
69
+ """
70
+ gt_imdb = self.load_annotations()
71
+ return gt_imdb
72
+
73
+
74
+ def real_image_path(self, index):
75
+ """Given image index, return full path
76
+
77
+ Parameters:
78
+ ----------
79
+ index: str
80
+ relative path of image
81
+ Returns:
82
+ -------
83
+ image_file: str
84
+ full path of image
85
+ """
86
+
87
+ index = index.replace("\\", "/")
88
+
89
+ if not os.path.exists(index):
90
+ image_file = os.path.join(self.prefix_path, index)
91
+ else:
92
+ image_file=index
93
+ if not image_file.endswith('.jpg'):
94
+ image_file = image_file + '.jpg'
95
+ assert os.path.exists(image_file), 'Path does not exist: {}'.format(image_file)
96
+ return image_file
97
+
98
+
99
+ def load_annotations(self,annotion_type=1):
100
+ """Load annotations
101
+
102
+ Parameters:
103
+ ----------
104
+ annotion_type: int
105
+ annotation type selector (not used by this implementation)
107
+ Returns:
108
+ -------
109
+ imdb: dict
110
+ image database with annotations
111
+ """
112
+
113
+ assert os.path.exists(self.image_annotation_file), 'annotations not found at {}'.format(self.image_annotation_file)
114
+ with open(self.image_annotation_file, 'r') as f:
115
+ annotations = f.readlines()
116
+
117
+ imdb = []
118
+ for i in range(self.num_images):
119
+ annotation = annotations[i].strip().split(' ')
120
+ index = annotation[0]
121
+ im_path = self.real_image_path(index)
122
+ imdb_ = dict()
123
+ imdb_['image'] = im_path
124
+
125
+ if self.mode == 'test':
126
+ pass
127
+ else:
128
+ label = annotation[1]
129
+ imdb_['label'] = int(label)
130
+ imdb_['flipped'] = False
131
+ imdb_['bbox_target'] = np.zeros((4,))
132
+ imdb_['landmark_target'] = np.zeros((10,))
133
+ if len(annotation[2:])==4:
134
+ bbox_target = annotation[2:6]
135
+ imdb_['bbox_target'] = np.array(bbox_target).astype(float)
136
+ if len(annotation[2:])==14:
137
+ bbox_target = annotation[2:6]
138
+ imdb_['bbox_target'] = np.array(bbox_target).astype(float)
139
+ landmark = annotation[6:]
140
+ imdb_['landmark_target'] = np.array(landmark).astype(float)
141
+ imdb.append(imdb_)
142
+
143
+ return imdb
144
+
145
+
146
+ def append_flipped_images(self, imdb):
147
+ """append flipped images to imdb
148
+
149
+ Parameters:
150
+ ----------
151
+ imdb: imdb
152
+ image database
153
+ Returns:
154
+ -------
155
+ imdb: dict
156
+ image database with flipped image annotations added
157
+ """
158
+ print('append flipped images to imdb', len(imdb))
159
+ for i in range(len(imdb)):
160
+ imdb_ = imdb[i]
161
+ m_bbox = imdb_['bbox_target'].copy()
162
+ m_bbox[0], m_bbox[2] = -m_bbox[2], -m_bbox[0]
163
+
164
+ landmark_ = imdb_['landmark_target'].copy()
165
+ landmark_ = landmark_.reshape((5, 2))
166
+ landmark_ = np.asarray([(1 - x, y) for (x, y) in landmark_])
167
+ landmark_[[0, 1]] = landmark_[[1, 0]]
168
+ landmark_[[3, 4]] = landmark_[[4, 3]]
169
+
170
+ item = {'image': imdb_['image'],
171
+ 'label': imdb_['label'],
172
+ 'bbox_target': m_bbox,
173
+ 'landmark_target': landmark_.reshape((10)),
174
+ 'flipped': True}
175
+
176
+ imdb.append(item)
177
+ self.image_set_index *= 2
178
+ return imdb
179
+
180
+
181
+
182
+
183
+
184
+ class TrainImageReader:
185
+ def __init__(self, imdb, im_size, batch_size=128, shuffle=False):
186
+
187
+ self.imdb = imdb
188
+ self.batch_size = batch_size
189
+ self.im_size = im_size
190
+ self.shuffle = shuffle
191
+
192
+ self.cur = 0
193
+ self.size = len(imdb)
194
+ self.index = np.arange(self.size)
195
+ self.num_classes = 2
196
+
197
+ self.batch = None
198
+ self.data = None
199
+ self.label = None
200
+
201
+ self.label_names= ['label', 'bbox_target', 'landmark_target']
202
+ self.reset()
203
+ self.get_batch()
204
+
205
+ def reset(self):
206
+ self.cur = 0
207
+ if self.shuffle:
208
+ np.random.shuffle(self.index)
209
+
210
+ def iter_next(self):
211
+ return self.cur + self.batch_size <= self.size
212
+
213
+ def __iter__(self):
214
+ return self
215
+
216
+ def __next__(self):
217
+ return self.next()
218
+
219
+ def next(self):
220
+ if self.iter_next():
221
+ self.get_batch()
222
+ self.cur += self.batch_size
223
+ return self.data,self.label
224
+ else:
225
+ raise StopIteration
226
+
227
+ def getindex(self):
228
+ return self.cur / self.batch_size
229
+
230
+ def getpad(self):
231
+ if self.cur + self.batch_size > self.size:
232
+ return self.cur + self.batch_size - self.size
233
+ else:
234
+ return 0
235
+
236
+ def get_batch(self):
237
+ cur_from = self.cur
238
+ cur_to = min(cur_from + self.batch_size, self.size)
239
+ imdb = [self.imdb[self.index[i]] for i in range(cur_from, cur_to)]
240
+ data, label = get_minibatch(imdb)
241
+ self.data = data['data']
242
+ self.label = [label[name] for name in self.label_names]
243
+
244
+
245
+
246
+ class TestImageLoader:
247
+ def __init__(self, imdb, batch_size=1, shuffle=False):
248
+ self.imdb = imdb
249
+ self.batch_size = batch_size
250
+ self.shuffle = shuffle
251
+ self.size = len(imdb)
252
+ self.index = np.arange(self.size)
253
+
254
+ self.cur = 0
255
+ self.data = None
256
+ self.label = None
257
+
258
+ self.reset()
259
+ self.get_batch()
260
+
261
+ def reset(self):
262
+ self.cur = 0
263
+ if self.shuffle:
264
+ np.random.shuffle(self.index)
265
+
266
+ def iter_next(self):
267
+ return self.cur + self.batch_size <= self.size
268
+
269
+ def __iter__(self):
270
+ return self
271
+
272
+ def __next__(self):
273
+ return self.next()
274
+
275
+ def next(self):
276
+ if self.iter_next():
277
+ self.get_batch()
278
+ self.cur += self.batch_size
279
+ return self.data
280
+ else:
281
+ raise StopIteration
282
+
283
+ def getindex(self):
284
+ return self.cur / self.batch_size
285
+
286
+ def getpad(self):
287
+ if self.cur + self.batch_size > self.size:
288
+ return self.cur + self.batch_size - self.size
289
+ else:
290
+ return 0
291
+
292
+ def get_batch(self):
293
+ cur_from = self.cur
294
+ cur_to = min(cur_from + self.batch_size, self.size)
295
+ imdb = [self.imdb[self.index[i]] for i in range(cur_from, cur_to)]
296
+ data= get_testbatch(imdb)
297
+ self.data=data['data']
298
+
299
+
300
+
301
+
302
+ def get_minibatch(imdb):
303
+
304
+ # im_size: 12, 24 or 48
305
+ num_images = len(imdb)
306
+ processed_ims = list()
307
+ cls_label = list()
308
+ bbox_reg_target = list()
309
+ landmark_reg_target = list()
310
+
311
+ for i in range(num_images):
312
+ im = cv2.imread(imdb[i]['image'])
313
+
314
+ if imdb[i]['flipped']:
315
+ im = im[:, ::-1, :]
316
+
317
+ cls = imdb[i]['label']
318
+ bbox_target = imdb[i]['bbox_target']
319
+ landmark = imdb[i]['landmark_target']
320
+
321
+ processed_ims.append(im)
322
+ cls_label.append(cls)
323
+ bbox_reg_target.append(bbox_target)
324
+ landmark_reg_target.append(landmark)
325
+
326
+ im_array = np.asarray(processed_ims)
327
+
328
+ label_array = np.array(cls_label)
329
+
330
+ bbox_target_array = np.vstack(bbox_reg_target)
331
+
332
+ landmark_target_array = np.vstack(landmark_reg_target)
333
+
334
+ data = {'data': im_array}
335
+ label = {'label': label_array,
336
+ 'bbox_target': bbox_target_array,
337
+ 'landmark_target': landmark_target_array
338
+ }
339
+
340
+ return data, label
341
+
342
+
343
+ def get_testbatch(imdb):
344
+ assert len(imdb) == 1, "Single batch only"
345
+ im = cv2.imread(imdb[0]['image'])
346
+ data = {'data': im}
347
+ return data
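A hedged usage sketch of `ImageDB` and `TrainImageReader`, mirroring how `train.py` builds its input pipeline (the annotation path below is an assumption; it is what `get_data.py --net=pnet` is expected to produce):

```python
from utils.dataloader import ImageDB, TrainImageReader

imagedb = ImageDB("./anno_store/imglist_anno_12.txt")   # PNet training list (assumed path)
imdb = imagedb.load_imdb()
imdb = imagedb.append_flipped_images(imdb)              # doubles the dataset

reader = TrainImageReader(imdb, 12, batch_size=128, shuffle=True)
reader.reset()
for images, (labels, bboxes, landmarks) in reader:
    # images: (batch, 12, 12, 3); labels: (batch,); bboxes: (batch, 4); landmarks: (batch, 10)
    print(images.shape, labels.shape, bboxes.shape, landmarks.shape)
    break
```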
utils/detect.py ADDED
@@ -0,0 +1,758 @@
1
+ import cv2
2
+ import time
3
+ import numpy as np
4
+ import torch
5
+ from utils.models import PNet,RNet,ONet
6
+ import utils.tool as utils
7
+ import utils.dataloader as image_tools
8
+
9
+
10
+ def create_mtcnn_net(p_model_path=None, r_model_path=None, o_model_path=None, use_cuda=True):
11
+
12
+ pnet, rnet, onet = None, None, None
13
+
14
+ if p_model_path is not None:
15
+ pnet = PNet(use_cuda=use_cuda)
16
+ if(use_cuda):
17
+ print('p_model_path:{0}'.format(p_model_path))
18
+ pnet.load_state_dict(torch.load(p_model_path))
19
+ pnet.cuda()
20
+ else:
21
+ # forcing all GPU tensors to be in CPU while loading
22
+ #pnet.load_state_dict(torch.load(p_model_path, map_location=lambda storage, loc: storage))
23
+ pnet.load_state_dict(torch.load(p_model_path, map_location='cpu'))
24
+ pnet.eval()
25
+
26
+ if r_model_path is not None:
27
+ rnet = RNet(use_cuda=use_cuda)
28
+ if (use_cuda):
29
+ print('r_model_path:{0}'.format(r_model_path))
30
+ rnet.load_state_dict(torch.load(r_model_path))
31
+ rnet.cuda()
32
+ else:
33
+ rnet.load_state_dict(torch.load(r_model_path, map_location=lambda storage, loc: storage))
34
+ rnet.eval()
35
+
36
+ if o_model_path is not None:
37
+ onet = ONet(use_cuda=use_cuda)
38
+ if (use_cuda):
39
+ print('o_model_path:{0}'.format(o_model_path))
40
+ onet.load_state_dict(torch.load(o_model_path))
41
+ onet.cuda()
42
+ else:
43
+ onet.load_state_dict(torch.load(o_model_path, map_location=lambda storage, loc: storage))
44
+ onet.eval()
45
+
46
+ return pnet,rnet,onet
47
+
48
+
49
+
50
+
51
+ class MtcnnDetector(object):
52
+ """
53
+ P,R,O net face detection and landmarks align
54
+ """
55
+ def __init__(self,
56
+ pnet = None,
57
+ rnet = None,
58
+ onet = None,
59
+ min_face_size=12,
60
+ stride=2,
61
+ threshold=[0.6, 0.7, 0.7],
62
+ #threshold=[0.1, 0.1, 0.1],
63
+ scale_factor=0.709,
64
+ ):
65
+
66
+ self.pnet_detector = pnet
67
+ self.rnet_detector = rnet
68
+ self.onet_detector = onet
69
+ self.min_face_size = min_face_size
70
+ self.stride=stride
71
+ self.thresh = threshold
72
+ self.scale_factor = scale_factor
73
+
74
+
75
+ def unique_image_format(self,im):
76
+ if not isinstance(im,np.ndarray):
77
+ if im.mode == 'I':
78
+ im = np.array(im, np.int32, copy=False)
79
+ elif im.mode == 'I;16':
80
+ im = np.array(im, np.int16, copy=False)
81
+ else:
82
+ im = np.asarray(im)
83
+ return im
84
+
85
+ def square_bbox(self, bbox):
86
+ """
87
+ convert bbox to square
88
+ Parameters:
89
+ ----------
90
+ bbox: numpy array , shape n x m
91
+ input bbox
92
+ Returns:
93
+ -------
94
+ a square bbox
95
+ """
96
+ square_bbox = bbox.copy()
97
+
98
+ # x2 - x1
99
+ # y2 - y1
100
+ h = bbox[:, 3] - bbox[:, 1] + 1
101
+ w = bbox[:, 2] - bbox[:, 0] + 1
102
+ l = np.maximum(h,w)
103
+ # x1 = x1 + w*0.5 - l*0.5
104
+ # y1 = y1 + h*0.5 - l*0.5
105
+ square_bbox[:, 0] = bbox[:, 0] + w*0.5 - l*0.5
106
+ square_bbox[:, 1] = bbox[:, 1] + h*0.5 - l*0.5
107
+
108
+ # x2 = x1 + l - 1
109
+ # y2 = y1 + l - 1
110
+ square_bbox[:, 2] = square_bbox[:, 0] + l - 1
111
+ square_bbox[:, 3] = square_bbox[:, 1] + l - 1
112
+ return square_bbox
113
+
114
+
115
+ def generate_bounding_box(self, map, reg, scale, threshold):
116
+ """
117
+ generate bbox from feature map
118
+ Parameters:
119
+ ----------
120
+ map: numpy array , n x m x 1
121
+ detect score for each position
122
+ reg: numpy array , n x m x 4
123
+ bbox
124
+ scale: float number
125
+ scale of this detection
126
+ threshold: float number
127
+ detect threshold
128
+ Returns:
129
+ -------
130
+ bbox array
131
+ """
132
+ stride = 2
133
+ cellsize = 12 # receptive field
134
+
135
+ t_index = np.where(map[:,:,0] > threshold)
136
+ # print('shape of t_index:{0}'.format(len(t_index)))
137
+ # print('t_index{0}'.format(t_index))
138
+ # time.sleep(5)
139
+
140
+ # find nothing
141
+ if t_index[0].size == 0:
142
+ return np.array([])
143
+
144
+ # reg = (1, n, m, 4)
145
+ # choose bounding boxes whose scores are larger than the threshold
146
+ dx1, dy1, dx2, dy2 = [reg[0, t_index[0], t_index[1], i] for i in range(4)]
147
+ #print(dx1.shape)
148
+ #exit()
149
+ # time.sleep(5)
150
+ reg = np.array([dx1, dy1, dx2, dy2])
151
+ #print('shape of reg{0}'.format(reg.shape))
152
+ #exit()
153
+
154
+ # lefteye_dx, lefteye_dy, righteye_dx, righteye_dy, nose_dx, nose_dy, \
155
+ # leftmouth_dx, leftmouth_dy, rightmouth_dx, rightmouth_dy = [landmarks[0, t_index[0], t_index[1], i] for i in range(10)]
156
+ #
157
+ # landmarks = np.array([lefteye_dx, lefteye_dy, righteye_dx, righteye_dy, nose_dx, nose_dy, leftmouth_dx, leftmouth_dy, rightmouth_dx, rightmouth_dy])
158
+
159
+ # obtain the classification scores that are larger than the threshold
160
+ # t_index[0]: choose the first column of t_index
161
+ # t_index[1]: choose the second column of t_index
162
+ score = map[t_index[0], t_index[1], 0]
163
+ # hence t_index[1] means column, t_index[1] is the value of x
164
+ # hence t_index[0] means row, t_index[0] is the value of y
165
+ boundingbox = np.vstack([np.round((stride * t_index[1]) / scale), # x1 of prediction box in original image
166
+ np.round((stride * t_index[0]) / scale), # y1 of prediction box in original image
167
+ np.round((stride * t_index[1] + cellsize) / scale), # x2 of prediction box in original image
168
+ np.round((stride * t_index[0] + cellsize) / scale), # y2 of prediction box in original image
169
+ # reconstruct the box in original image
170
+ score,
171
+ reg,
172
+ # landmarks
173
+ ])
174
+
175
+ return boundingbox.T
176
+
177
+
178
+ def resize_image(self, img, scale):
179
+ """
180
+ resize image by the given scale factor
181
+ Parameters:
182
+ ----------
183
+ img: numpy array , height x width x channel
184
+ input image, channels in BGR order here
185
+ scale: float number
186
+ scale factor of resize operation
187
+ Returns:
188
+ -------
189
+ resized image, numpy array, new_height x new_width x channel
190
+ """
191
+ height, width, channels = img.shape
192
+ new_height = int(height * scale) # resized new height
193
+ new_width = int(width * scale) # resized new width
194
+ new_dim = (new_width, new_height)
195
+ img_resized = cv2.resize(img, new_dim, interpolation=cv2.INTER_LINEAR) # resized image
196
+ return img_resized
197
+
198
+
199
+ def pad(self, bboxes, w, h):
200
+ """
201
+ pad the boxes
202
+ Parameters:
203
+ ----------
204
+ bboxes: numpy array, n x 5
205
+ input bboxes
206
+ w: float number
207
+ width of the input image
208
+ h: float number
209
+ height of the input image
210
+ Returns :
211
+ ------
212
+ dy, dx : numpy array, n x 1
213
+ start point of the bbox in target image
214
+ edy, edx : numpy array, n x 1
215
+ end point of the bbox in target image
216
+ y, x : numpy array, n x 1
217
+ start point of the bbox in original image
218
+ ex, ey : numpy array, n x 1
219
+ end point of the bbox in original image
220
+ tmph, tmpw: numpy array, n x 1
221
+ height and width of the bbox
222
+ """
223
+ # width and height
224
+ tmpw = (bboxes[:, 2] - bboxes[:, 0] + 1).astype(np.int32)
225
+ tmph = (bboxes[:, 3] - bboxes[:, 1] + 1).astype(np.int32)
226
+ numbox = bboxes.shape[0]
227
+
228
+ dx = np.zeros((numbox, ))
229
+ dy = np.zeros((numbox, ))
230
+ edx, edy = tmpw.copy()-1, tmph.copy()-1
231
+ # x, y: start point of the bbox in original image
232
+ # ex, ey: end point of the bbox in original image
233
+ x, y, ex, ey = bboxes[:, 0], bboxes[:, 1], bboxes[:, 2], bboxes[:, 3]
234
+
235
+ tmp_index = np.where(ex > w-1)
236
+ edx[tmp_index] = tmpw[tmp_index] + w - 2 - ex[tmp_index]
237
+ ex[tmp_index] = w - 1
238
+
239
+ tmp_index = np.where(ey > h-1)
240
+ edy[tmp_index] = tmph[tmp_index] + h - 2 - ey[tmp_index]
241
+ ey[tmp_index] = h - 1
242
+
243
+ tmp_index = np.where(x < 0)
244
+ dx[tmp_index] = 0 - x[tmp_index]
245
+ x[tmp_index] = 0
246
+
247
+ tmp_index = np.where(y < 0)
248
+ dy[tmp_index] = 0 - y[tmp_index]
249
+ y[tmp_index] = 0
250
+
251
+ return_list = [dy, edy, dx, edx, y, ey, x, ex, tmpw, tmph]
252
+ return_list = [item.astype(np.int32) for item in return_list]
253
+
254
+ return return_list
255
+
256
+
257
+ def detect_pnet(self, im):
258
+ """Get face candidates through pnet
259
+
260
+ Parameters:
261
+ ----------
262
+ im: numpy array
263
+ input image array
264
+ one batch
265
+
266
+ Returns:
267
+ -------
268
+ boxes: numpy array
269
+ detected boxes before calibration
270
+ boxes_align: numpy array
271
+ boxes after calibration
272
+ """
273
+
274
+ # im = self.unique_image_format(im)
275
+
276
+ # original wider face data
277
+ h, w, c = im.shape
278
+ net_size = 12
279
+
280
+ current_scale = float(net_size) / self.min_face_size # find initial scale
281
+ #print('imgshape:{0}, current_scale:{1}'.format(im.shape, current_scale))
282
+ im_resized = self.resize_image(im, current_scale) # scale = 1.0
283
+ current_height, current_width, _ = im_resized.shape
284
+ # fcn
285
+ all_boxes = list()
286
+ while min(current_height, current_width) > net_size:
287
+ #print('current:',current_height, current_width)
288
+ feed_imgs = []
289
+ image_tensor = image_tools.convert_image_to_tensor(im_resized)
290
+ feed_imgs.append(image_tensor)
291
+ feed_imgs = torch.stack(feed_imgs)
292
+
293
+ feed_imgs.requires_grad = True
294
+
295
+ if self.pnet_detector.use_cuda:
296
+ feed_imgs = feed_imgs.cuda()
297
+
298
+ # self.pnet_detector is a trained pnet torch model
299
+
300
+ # receptive field is 12×12
301
+ # 12×12 --> score
302
+ # 12×12 --> bounding box
303
+ cls_map, reg = self.pnet_detector(feed_imgs)
304
+
305
+ cls_map_np = image_tools.convert_chwTensor_to_hwcNumpy(cls_map.cpu())
306
+ reg_np = image_tools.convert_chwTensor_to_hwcNumpy(reg.cpu())
307
+ # print(cls_map_np.shape, reg_np.shape) # cls_map_np = (1, n, m, 1) reg_np.shape = (1, n, m 4)
308
+ # time.sleep(5)
309
+ # landmark_np = image_tools.convert_chwTensor_to_hwcNumpy(landmark.cpu())
310
+
311
+ # self.threshold[0] = 0.6
312
+ # print(cls_map_np[0,:,:].shape)
313
+ # time.sleep(4)
314
+
315
+ # boxes = [x1, y1, x2, y2, score, reg]
316
+ boxes = self.generate_bounding_box(cls_map_np[ 0, :, :], reg_np, current_scale, self.thresh[0])
317
+ #cv2.rectangle(im,(300,100),(400,200),color=(0,0,0))
318
+ #cv2.rectangle(im,(400,200),(500,300),color=(0,0,0))
319
+
320
+ # generate pyramid images
321
+ current_scale *= self.scale_factor # self.scale_factor = 0.709
322
+ im_resized = self.resize_image(im, current_scale)
323
+ current_height, current_width, _ = im_resized.shape
324
+
325
+ if boxes.size == 0:
326
+ continue
327
+
328
+ # non-maximum suppression
329
+ keep = utils.nms(boxes[:, :5], 0.5, 'Union')
330
+ boxes = boxes[keep]
331
+ all_boxes.append(boxes)
332
+
333
+ """ img = im.copy()
334
+ bw = boxes[:,2]-boxes[:,0]
335
+ bh = boxes[:,3]-boxes[:,1]
336
+ for i in range(boxes.shape[0]):
337
+ p1=(int(boxes[i][0]+boxes[i][5]*bw[i]),int(boxes[i][1]+boxes[i][6]*bh[i]))
338
+ p2=(int(boxes[i][2]+boxes[i][7]*bw[i]),int(boxes[i][3]+boxes[i][8]*bh[i]))
339
+ cv2.rectangle(img,p1,p2,color=(0,0,0))
340
+ cv2.imshow('ss',img)
341
+ cv2.waitKey(0)
342
+ #ii+=1
343
+ exit() """
344
+
345
+ if len(all_boxes) == 0:
346
+ return None, None
347
+ all_boxes = np.vstack(all_boxes)
348
+ # print("shape of all boxes {0}".format(all_boxes.shape))
349
+ # time.sleep(5)
350
+
351
+ # merge the detection from first stage
352
+ keep = utils.nms(all_boxes[:, 0:5], 0.7, 'Union')
353
+ all_boxes = all_boxes[keep]
354
+ # boxes = all_boxes[:, :5]
355
+
356
+ # x2 - x1
357
+ # y2 - y1
358
+ bw = all_boxes[:, 2] - all_boxes[:, 0] + 1
359
+ bh = all_boxes[:, 3] - all_boxes[:, 1] + 1
360
+
361
+ # landmark_keep = all_boxes[:, 9:].reshape((5,2))
362
+
363
+
364
+ boxes = np.vstack([all_boxes[:,0],
365
+ all_boxes[:,1],
366
+ all_boxes[:,2],
367
+ all_boxes[:,3],
368
+ all_boxes[:,4],
369
+ # all_boxes[:, 0] + all_boxes[:, 9] * bw,
370
+ # all_boxes[:, 1] + all_boxes[:,10] * bh,
371
+ # all_boxes[:, 0] + all_boxes[:, 11] * bw,
372
+ # all_boxes[:, 1] + all_boxes[:, 12] * bh,
373
+ # all_boxes[:, 0] + all_boxes[:, 13] * bw,
374
+ # all_boxes[:, 1] + all_boxes[:, 14] * bh,
375
+ # all_boxes[:, 0] + all_boxes[:, 15] * bw,
376
+ # all_boxes[:, 1] + all_boxes[:, 16] * bh,
377
+ # all_boxes[:, 0] + all_boxes[:, 17] * bw,
378
+ # all_boxes[:, 1] + all_boxes[:, 18] * bh
379
+ ])
380
+
381
+ boxes = boxes.T
382
+
383
+ # boxes = boxes = [x1, y1, x2, y2, score, reg] reg= [px1, py1, px2, py2] (in prediction)
384
+ align_topx = all_boxes[:, 0] + all_boxes[:, 5] * bw
385
+ align_topy = all_boxes[:, 1] + all_boxes[:, 6] * bh
386
+ align_bottomx = all_boxes[:, 2] + all_boxes[:, 7] * bw
387
+ align_bottomy = all_boxes[:, 3] + all_boxes[:, 8] * bh
388
+
389
+ # refine the boxes
390
+ boxes_align = np.vstack([ align_topx,
391
+ align_topy,
392
+ align_bottomx,
393
+ align_bottomy,
394
+ all_boxes[:, 4],
395
+ # align_topx + all_boxes[:,9] * bw,
396
+ # align_topy + all_boxes[:,10] * bh,
397
+ # align_topx + all_boxes[:,11] * bw,
398
+ # align_topy + all_boxes[:,12] * bh,
399
+ # align_topx + all_boxes[:,13] * bw,
400
+ # align_topy + all_boxes[:,14] * bh,
401
+ # align_topx + all_boxes[:,15] * bw,
402
+ # align_topy + all_boxes[:,16] * bh,
403
+ # align_topx + all_boxes[:,17] * bw,
404
+ # align_topy + all_boxes[:,18] * bh,
405
+ ])
406
+ boxes_align = boxes_align.T
407
+
408
+ #remove invalid box
409
+ valindex = [True for _ in range(boxes_align.shape[0])]
410
+ for i in range(boxes_align.shape[0]):
411
+ if boxes_align[i][2]-boxes_align[i][0]<=3 or boxes_align[i][3]-boxes_align[i][1]<=3:
412
+ valindex[i]=False
413
+ #print('pnet has one smaller than 3')
414
+ else:
415
+ if boxes_align[i][2]<1 or boxes_align[i][0]>w-2 or boxes_align[i][3]<1 or boxes_align[i][1]>h-2:
416
+ valindex[i]=False
417
+ #print('pnet has one out')
418
+ boxes_align=boxes_align[valindex,:]
419
+ boxes = boxes[valindex,:]
420
+ return boxes, boxes_align
421
+
422
+ def detect_rnet(self, im, dets):
423
+ """Get face candidates using rnet
424
+
425
+ Parameters:
426
+ ----------
427
+ im: numpy array
428
+ input image array
429
+ dets: numpy array
430
+ detection results of pnet
431
+
432
+ Returns:
433
+ -------
434
+ boxes: numpy array
435
+ detected boxes before calibration
436
+ boxes_align: numpy array
437
+ boxes after calibration
438
+ """
439
+ # im: an input image
440
+ h, w, c = im.shape
441
+
442
+ if dets is None:
443
+ return None,None
444
+ if dets.shape[0]==0:
445
+ return None, None
446
+
447
+ # (705, 5) = [x1, y1, x2, y2, score, reg]
448
+ # print("pnet detection {0}".format(dets.shape))
449
+ # time.sleep(5)
450
+ detss = dets
451
+ # return square boxes
452
+ dets = self.square_bbox(dets)
453
+ detsss = dets
454
+ # rounds
455
+ dets[:, 0:4] = np.round(dets[:, 0:4])
456
+ [dy, edy, dx, edx, y, ey, x, ex, tmpw, tmph] = self.pad(dets, w, h)
457
+ num_boxes = dets.shape[0]
458
+
459
+ '''
460
+ # helper for setting RNet batch size
461
+ batch_size = self.rnet_detector.batch_size
462
+ ratio = float(num_boxes) / batch_size
463
+ if ratio > 3 or ratio < 0.3:
464
+ print "You may need to reset RNet batch size if this info appears frequently, \
465
+ face candidates:%d, current batch_size:%d"%(num_boxes, batch_size)
466
+ '''
467
+
468
+ # cropped_ims_tensors = np.zeros((num_boxes, 3, 24, 24), dtype=np.float32)
469
+ cropped_ims_tensors = []
470
+ for i in range(num_boxes):
471
+ try:
472
+ tmp = np.zeros((tmph[i], tmpw[i], 3), dtype=np.uint8)
473
+ tmp[dy[i]:edy[i]+1, dx[i]:edx[i]+1, :] = im[y[i]:ey[i]+1, x[i]:ex[i]+1, :]
474
+ except:
475
+ print(dy[i],edy[i],dx[i],edx[i],y[i],ey[i],x[i],ex[i],tmpw[i],tmph[i])
476
+ print(dets[i])
477
+ print(detss[i])
478
+ print(detsss[i])
479
+ print(h,w)
480
+ exit()
481
+ crop_im = cv2.resize(tmp, (24, 24))
482
+ crop_im_tensor = image_tools.convert_image_to_tensor(crop_im)
483
+ # cropped_ims_tensors[i, :, :, :] = crop_im_tensor
484
+ cropped_ims_tensors.append(crop_im_tensor)
485
+ feed_imgs = torch.stack(cropped_ims_tensors)
486
+ feed_imgs.requires_grad = True
487
+
488
+
489
+ if self.rnet_detector.use_cuda:
490
+ feed_imgs = feed_imgs.cuda()
491
+
492
+ cls_map, reg = self.rnet_detector(feed_imgs)
493
+
494
+ cls_map = cls_map.cpu().data.numpy()
495
+ reg = reg.cpu().data.numpy()
496
+ # landmark = landmark.cpu().data.numpy()
497
+
498
+
499
+ keep_inds = np.where(cls_map > self.thresh[1])[0]
500
+
501
+ if len(keep_inds) > 0:
502
+ boxes = dets[keep_inds]
503
+ cls = cls_map[keep_inds]
504
+ reg = reg[keep_inds]
505
+ # landmark = landmark[keep_inds]
506
+ else:
507
+ return None, None
508
+ keep = utils.nms(boxes, 0.7)
509
+
510
+ if len(keep) == 0:
511
+ return None, None
512
+
513
+ keep_cls = cls[keep]
514
+ keep_boxes = boxes[keep]
515
+ keep_reg = reg[keep]
516
+ # keep_landmark = landmark[keep]
517
+
518
+
519
+ bw = keep_boxes[:, 2] - keep_boxes[:, 0] + 1
520
+ bh = keep_boxes[:, 3] - keep_boxes[:, 1] + 1
521
+
522
+
523
+ boxes = np.vstack([ keep_boxes[:,0],
524
+ keep_boxes[:,1],
525
+ keep_boxes[:,2],
526
+ keep_boxes[:,3],
527
+ keep_cls[:,0],
528
+ # keep_boxes[:,0] + keep_landmark[:, 0] * bw,
529
+ # keep_boxes[:,1] + keep_landmark[:, 1] * bh,
530
+ # keep_boxes[:,0] + keep_landmark[:, 2] * bw,
531
+ # keep_boxes[:,1] + keep_landmark[:, 3] * bh,
532
+ # keep_boxes[:,0] + keep_landmark[:, 4] * bw,
533
+ # keep_boxes[:,1] + keep_landmark[:, 5] * bh,
534
+ # keep_boxes[:,0] + keep_landmark[:, 6] * bw,
535
+ # keep_boxes[:,1] + keep_landmark[:, 7] * bh,
536
+ # keep_boxes[:,0] + keep_landmark[:, 8] * bw,
537
+ # keep_boxes[:,1] + keep_landmark[:, 9] * bh,
538
+ ])
539
+
540
+ align_topx = keep_boxes[:,0] + keep_reg[:,0] * bw
541
+ align_topy = keep_boxes[:,1] + keep_reg[:,1] * bh
542
+ align_bottomx = keep_boxes[:,2] + keep_reg[:,2] * bw
543
+ align_bottomy = keep_boxes[:,3] + keep_reg[:,3] * bh
544
+
545
+ boxes_align = np.vstack([align_topx,
546
+ align_topy,
547
+ align_bottomx,
548
+ align_bottomy,
549
+ keep_cls[:, 0],
550
+ # align_topx + keep_landmark[:, 0] * bw,
551
+ # align_topy + keep_landmark[:, 1] * bh,
552
+ # align_topx + keep_landmark[:, 2] * bw,
553
+ # align_topy + keep_landmark[:, 3] * bh,
554
+ # align_topx + keep_landmark[:, 4] * bw,
555
+ # align_topy + keep_landmark[:, 5] * bh,
556
+ # align_topx + keep_landmark[:, 6] * bw,
557
+ # align_topy + keep_landmark[:, 7] * bh,
558
+ # align_topx + keep_landmark[:, 8] * bw,
559
+ # align_topy + keep_landmark[:, 9] * bh,
560
+ ])
561
+
562
+ boxes = boxes.T
563
+ boxes_align = boxes_align.T
564
+
565
+ #remove invalid box
566
+ valindex = [True for _ in range(boxes_align.shape[0])]
567
+ for i in range(boxes_align.shape[0]):
568
+ if boxes_align[i][2]-boxes_align[i][0]<=3 or boxes_align[i][3]-boxes_align[i][1]<=3:
569
+ valindex[i]=False
570
+ print('rnet has one smaller than 3')
571
+ else:
572
+ if boxes_align[i][2]<1 or boxes_align[i][0]>w-2 or boxes_align[i][3]<1 or boxes_align[i][1]>h-2:
573
+ valindex[i]=False
574
+ print('rnet has one out')
575
+ boxes_align=boxes_align[valindex,:]
576
+ boxes = boxes[valindex,:]
577
+ """ img = im.copy()
578
+ for i in range(boxes_align.shape[0]):
579
+ p1=(int(boxes_align[i,0]),int(boxes_align[i,1]))
580
+ p2=(int(boxes_align[i,2]),int(boxes_align[i,3]))
581
+ cv2.rectangle(img,p1,p2,color=(0,0,0))
582
+ cv2.imshow('ss',img)
583
+ cv2.waitKey(0)
584
+ exit() """
585
+ return boxes, boxes_align
586
+
587
+ def detect_onet(self, im, dets):
588
+ """Get face candidates using onet
589
+
590
+ Parameters:
591
+ ----------
592
+ im: numpy array
593
+ input image array
594
+ dets: numpy array
595
+ detection results of rnet
596
+
597
+ Returns:
598
+ -------
599
+ boxes_align: numpy array
600
+ boxes after calibration
601
+ landmarks_align: numpy array
602
+ landmarks after calibration
603
+
604
+ """
605
+ h, w, c = im.shape
606
+
607
+ if dets is None:
608
+ return None, None
609
+ if dets.shape[0]==0:
610
+ return None, None
611
+
612
+ detss = dets
613
+ dets = self.square_bbox(dets)
614
+
615
+
616
+ dets[:, 0:4] = np.round(dets[:, 0:4])
617
+
618
+ [dy, edy, dx, edx, y, ey, x, ex, tmpw, tmph] = self.pad(dets, w, h)
619
+ num_boxes = dets.shape[0]
620
+
621
+
622
+ # cropped_ims_tensors = np.zeros((num_boxes, 3, 24, 24), dtype=np.float32)
623
+ cropped_ims_tensors = []
624
+ for i in range(num_boxes):
625
+ try:
626
+ tmp = np.zeros((tmph[i], tmpw[i], 3), dtype=np.uint8)
627
+ # crop input image
628
+ tmp[dy[i]:edy[i] + 1, dx[i]:edx[i] + 1, :] = im[y[i]:ey[i] + 1, x[i]:ex[i] + 1, :]
629
+ except:
630
+ print(dy[i],edy[i],dx[i],edx[i],y[i],ey[i],x[i],ex[i],tmpw[i],tmph[i])
631
+ print(dets[i])
632
+ print(detss[i])
633
+ print(h,w)
634
+ crop_im = cv2.resize(tmp, (48, 48))
635
+ crop_im_tensor = image_tools.convert_image_to_tensor(crop_im)
636
+ # cropped_ims_tensors[i, :, :, :] = crop_im_tensor
637
+ cropped_ims_tensors.append(crop_im_tensor)
638
+ feed_imgs = torch.stack(cropped_ims_tensors)
639
+ feed_imgs.requires_grad = True
640
+
641
+ if self.rnet_detector.use_cuda:
642
+ feed_imgs = feed_imgs.cuda()
643
+
644
+ cls_map, reg, landmark = self.onet_detector(feed_imgs)
645
+
646
+ cls_map = cls_map.cpu().data.numpy()
647
+ reg = reg.cpu().data.numpy()
648
+ landmark = landmark.cpu().data.numpy()
649
+
650
+ keep_inds = np.where(cls_map > self.thresh[2])[0]
651
+
652
+ if len(keep_inds) > 0:
653
+ boxes = dets[keep_inds]
654
+ cls = cls_map[keep_inds]
655
+ reg = reg[keep_inds]
656
+ landmark = landmark[keep_inds]
657
+ else:
658
+ return None, None
659
+
660
+ keep = utils.nms(boxes, 0.7, mode="Minimum")
661
+
662
+ if len(keep) == 0:
663
+ return None, None
664
+
665
+ keep_cls = cls[keep]
666
+ keep_boxes = boxes[keep]
667
+ keep_reg = reg[keep]
668
+ keep_landmark = landmark[keep]
669
+
670
+ bw = keep_boxes[:, 2] - keep_boxes[:, 0] + 1
671
+ bh = keep_boxes[:, 3] - keep_boxes[:, 1] + 1
672
+
673
+
674
+ align_topx = keep_boxes[:, 0] + keep_reg[:, 0] * bw
675
+ align_topy = keep_boxes[:, 1] + keep_reg[:, 1] * bh
676
+ align_bottomx = keep_boxes[:, 2] + keep_reg[:, 2] * bw
677
+ align_bottomy = keep_boxes[:, 3] + keep_reg[:, 3] * bh
678
+
679
+ align_landmark_topx = keep_boxes[:, 0]
680
+ align_landmark_topy = keep_boxes[:, 1]
681
+
682
+
683
+
684
+
685
+ boxes_align = np.vstack([align_topx,
686
+ align_topy,
687
+ align_bottomx,
688
+ align_bottomy,
689
+ keep_cls[:, 0],
690
+ # align_topx + keep_landmark[:, 0] * bw,
691
+ # align_topy + keep_landmark[:, 1] * bh,
692
+ # align_topx + keep_landmark[:, 2] * bw,
693
+ # align_topy + keep_landmark[:, 3] * bh,
694
+ # align_topx + keep_landmark[:, 4] * bw,
695
+ # align_topy + keep_landmark[:, 5] * bh,
696
+ # align_topx + keep_landmark[:, 6] * bw,
697
+ # align_topy + keep_landmark[:, 7] * bh,
698
+ # align_topx + keep_landmark[:, 8] * bw,
699
+ # align_topy + keep_landmark[:, 9] * bh,
700
+ ])
701
+
702
+ boxes_align = boxes_align.T
703
+
704
+ landmark = np.vstack([
705
+ align_landmark_topx + keep_landmark[:, 0] * bw,
706
+ align_landmark_topy + keep_landmark[:, 1] * bh,
707
+ align_landmark_topx + keep_landmark[:, 2] * bw,
708
+ align_landmark_topy + keep_landmark[:, 3] * bh,
709
+ align_landmark_topx + keep_landmark[:, 4] * bw,
710
+ align_landmark_topy + keep_landmark[:, 5] * bh,
711
+ align_landmark_topx + keep_landmark[:, 6] * bw,
712
+ align_landmark_topy + keep_landmark[:, 7] * bh,
713
+ align_landmark_topx + keep_landmark[:, 8] * bw,
714
+ align_landmark_topy + keep_landmark[:, 9] * bh,
715
+ ])
716
+
717
+ landmark_align = landmark.T
718
+
719
+ return boxes_align, landmark_align
720
+
721
+
722
+ def detect_face(self,img):
723
+ """Detect face over image
724
+ """
725
+ boxes_align = np.array([])
726
+ landmark_align =np.array([])
727
+
728
+ t = time.time()
729
+
730
+ # pnet
731
+ if self.pnet_detector:
732
+ p_boxes, boxes_align = self.detect_pnet(img)
733
+ if boxes_align is None:
734
+ return np.array([]), np.array([])
735
+
736
+ t1 = time.time() - t
737
+ t = time.time()
738
+
739
+ # rnet
740
+ if self.rnet_detector:
741
+ r_boxes, boxes_align = self.detect_rnet(img, boxes_align)
742
+ if boxes_align is None:
743
+ return np.array([]), np.array([])
744
+
745
+ t2 = time.time() - t
746
+ t = time.time()
747
+
748
+ # onet
749
+ if self.onet_detector:
750
+ boxes_align, landmark_align = self.detect_onet(img, boxes_align)
751
+ if boxes_align is None:
752
+ return np.array([]), np.array([])
753
+
754
+ t3 = time.time() - t
755
+ t = time.time()
756
+ print("time cost " + '{:.3f}'.format(t1+t2+t3) + ' pnet {:.3f} rnet {:.3f} onet {:.3f}'.format(t1, t2, t3))
757
+
758
+ return p_boxes,r_boxes,boxes_align, landmark_align
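A hedged end-to-end sketch of the cascade, using the pre-trained weights shipped in `./model_store` and the bundled test image; it assumes at least one face is found, since `detect_face` returns only two empty arrays otherwise:

```python
import cv2
from utils.detect import create_mtcnn_net, MtcnnDetector

pnet, rnet, onet = create_mtcnn_net(
    p_model_path="./model_store/pnet_epoch_20.pt",
    r_model_path="./model_store/rnet_epoch_20.pt",
    o_model_path="./model_store/onet_epoch_20.pt",
    use_cuda=False)                                     # CPU for portability

detector = MtcnnDetector(pnet=pnet, rnet=rnet, onet=onet)
img = cv2.imread("./img/mid.png")                       # BGR, as read by OpenCV
p_boxes, r_boxes, boxes_align, landmarks = detector.detect_face(img)
print(boxes_align.shape, landmarks.shape)               # (n, 5) and (n, 10)
```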
utils/models.py ADDED
@@ -0,0 +1,207 @@
1
+ import torch
2
+ import torch.nn as nn
3
+ import torch.nn.functional as F
4
+ '''The models follow the instructor-provided sample code; only three version-compatibility changes were made'''
5
+
6
+ def weights_init(m):
7
+ if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear):
8
+ nn.init.xavier_uniform_(m.weight.data)
9
+ nn.init.constant_(m.bias, 0.1)
10
+
11
+
12
+
13
+ class LossFn:
14
+ def __init__(self, cls_factor=1, box_factor=1, landmark_factor=1):
15
+ # loss function
16
+ self.cls_factor = cls_factor
17
+ self.box_factor = box_factor
18
+ self.land_factor = landmark_factor
19
+ self.loss_cls = nn.BCELoss() # binary cross entropy
20
+ self.loss_box = nn.MSELoss() # mean square error
21
+ self.loss_landmark = nn.MSELoss()
22
+
23
+
24
+ def cls_loss(self,gt_label,pred_label):
25
+ pred_label = torch.squeeze(pred_label)
26
+ gt_label = torch.squeeze(gt_label)
27
+ # keep mask elements >= 0; only labels 0 and 1 affect the detection loss
28
+ mask = torch.ge(gt_label,0)
29
+ valid_gt_label = torch.masked_select(gt_label,mask)
30
+ valid_pred_label = torch.masked_select(pred_label,mask)
31
+ return self.loss_cls(valid_pred_label,valid_gt_label)*self.cls_factor
32
+
33
+
34
+ def box_loss(self,gt_label,gt_offset,pred_offset):
35
+ pred_offset = torch.squeeze(pred_offset)
36
+ gt_offset = torch.squeeze(gt_offset)
37
+ gt_label = torch.squeeze(gt_label)
38
+
39
+ #get the mask element which != 0
40
+ unmask = torch.eq(gt_label,0)
41
+ mask = torch.eq(unmask,0)
42
+ #convert mask to dim index
43
+ chose_index = torch.nonzero(mask.data)
44
+ chose_index = torch.squeeze(chose_index)
45
+ #only valid element can effect the loss
46
+ valid_gt_offset = gt_offset[chose_index,:]
47
+ valid_pred_offset = pred_offset[chose_index,:]
48
+ return self.loss_box(valid_pred_offset,valid_gt_offset)*self.box_factor
49
+
50
+
51
+ def landmark_loss(self,gt_label,gt_landmark,pred_landmark):
52
+ pred_landmark = torch.squeeze(pred_landmark)
53
+ gt_landmark = torch.squeeze(gt_landmark)
54
+ gt_label = torch.squeeze(gt_label)
55
+ mask = torch.eq(gt_label,-2)
56
+
57
+ chose_index = torch.nonzero(mask.data)
58
+ chose_index = torch.squeeze(chose_index)
59
+
60
+ valid_gt_landmark = gt_landmark[chose_index, :]
61
+ valid_pred_landmark = pred_landmark[chose_index, :]
62
+ return self.loss_landmark(valid_pred_landmark,valid_gt_landmark)*self.land_factor
63
+
64
+
65
+
66
+
67
+
68
+ class PNet(nn.Module):
69
+ ''' PNet '''
70
+
71
+ def __init__(self, is_train=False, use_cuda=True):
72
+ super(PNet, self).__init__()
73
+ self.is_train = is_train
74
+ self.use_cuda = use_cuda
75
+
76
+ # backend
77
+ self.pre_layer = nn.Sequential(
78
+ nn.Conv2d(3, 10, kernel_size=3, stride=1), # conv1
79
+ nn.PReLU(), # PReLU1
80
+ nn.MaxPool2d(kernel_size=2, stride=2), # pool1
81
+ nn.Conv2d(10, 16, kernel_size=3, stride=1), # conv2
82
+ nn.PReLU(), # PReLU2
83
+ nn.Conv2d(16, 32, kernel_size=3, stride=1), # conv3
84
+ nn.PReLU() # PReLU3
85
+ )
86
+ # detection
87
+ self.conv4_1 = nn.Conv2d(32, 1, kernel_size=1, stride=1)
88
+ # bounding box regression
89
+ self.conv4_2 = nn.Conv2d(32, 4, kernel_size=1, stride=1)
90
+ # landmark localization
91
+ self.conv4_3 = nn.Conv2d(32, 10, kernel_size=1, stride=1)
92
+
93
+ # weight initialization with xavier
94
+ self.apply(weights_init)
95
+
96
+ def forward(self, x):
97
+ x = self.pre_layer(x)
98
+ label = torch.sigmoid(self.conv4_1(x))
99
+ offset = self.conv4_2(x)
100
+ # landmark = self.conv4_3(x)
101
+
102
+ if self.is_train is True:
103
+ # label_loss = LossUtil.label_loss(self.gt_label,torch.squeeze(label))
104
+ # bbox_loss = LossUtil.bbox_loss(self.gt_bbox,torch.squeeze(offset))
105
+ return label,offset
106
+ #landmark = self.conv4_3(x)
107
+ return label, offset
108
+
109
+
110
+
111
+
112
+
113
+ class RNet(nn.Module):
114
+ ''' RNet '''
115
+
116
+ def __init__(self,is_train=False, use_cuda=True):
117
+ super(RNet, self).__init__()
118
+ self.is_train = is_train
119
+ self.use_cuda = use_cuda
120
+ # backend
121
+ self.pre_layer = nn.Sequential(
122
+ nn.Conv2d(3, 28, kernel_size=3, stride=1), # conv1
123
+ nn.PReLU(), # prelu1
124
+ nn.MaxPool2d(kernel_size=3, stride=2), # pool1
125
+ nn.Conv2d(28, 48, kernel_size=3, stride=1), # conv2
126
+ nn.PReLU(), # prelu2
127
+ nn.MaxPool2d(kernel_size=3, stride=2), # pool2
128
+ nn.Conv2d(48, 64, kernel_size=2, stride=1), # conv3
129
+ nn.PReLU() # prelu3
130
+
131
+ )
132
+ self.conv4 = nn.Linear(64*2*2, 128) # conv4
133
+ self.prelu4 = nn.PReLU() # prelu4
134
+ # detection
135
+ self.conv5_1 = nn.Linear(128, 1)
136
+ # bounding box regression
137
+ self.conv5_2 = nn.Linear(128, 4)
138
+ # landmark localization
139
+ self.conv5_3 = nn.Linear(128, 10)
140
+ # weight initialization with xavier
141
+ self.apply(weights_init)
142
+
143
+ def forward(self, x):
144
+ # backend
145
+ x = self.pre_layer(x)
146
+ x = x.view(x.size(0), -1)
147
+ x = self.conv4(x)
148
+ x = self.prelu4(x)
149
+ # detection
150
+ det = torch.sigmoid(self.conv5_1(x))
151
+ box = self.conv5_2(x)
152
+ # landmark = self.conv5_3(x)
153
+
154
+ if self.is_train is True:
155
+ return det, box
156
+ #landmard = self.conv5_3(x)
157
+ return det, box
158
+
159
+
160
+
161
+
162
+ class ONet(nn.Module):
163
+ ''' ONet '''
164
+
165
+ def __init__(self,is_train=False, use_cuda=True):
166
+ super(ONet, self).__init__()
167
+ self.is_train = is_train
168
+ self.use_cuda = use_cuda
169
+ # backend
170
+ self.pre_layer = nn.Sequential(
171
+ nn.Conv2d(3, 32, kernel_size=3, stride=1), # conv1
172
+ nn.PReLU(), # prelu1
173
+ nn.MaxPool2d(kernel_size=3, stride=2), # pool1
174
+ nn.Conv2d(32, 64, kernel_size=3, stride=1), # conv2
175
+ nn.PReLU(), # prelu2
176
+ nn.MaxPool2d(kernel_size=3, stride=2), # pool2
177
+ nn.Conv2d(64, 64, kernel_size=3, stride=1), # conv3
178
+ nn.PReLU(), # prelu3
179
+ nn.MaxPool2d(kernel_size=2,stride=2), # pool3
180
+ nn.Conv2d(64,128,kernel_size=2,stride=1), # conv4
181
+ nn.PReLU() # prelu4
182
+ )
183
+ self.conv5 = nn.Linear(128*2*2, 256) # conv5
184
+ self.prelu5 = nn.PReLU() # prelu5
185
+ # detection
186
+ self.conv6_1 = nn.Linear(256, 1)
187
+ # bounding box regression
188
+ self.conv6_2 = nn.Linear(256, 4)
189
+ # landmark localization
190
+ self.conv6_3 = nn.Linear(256, 10)
191
+ # weight initialization with xavier
192
+ self.apply(weights_init)
193
+
194
+ def forward(self, x):
195
+ # backend
196
+ x = self.pre_layer(x)
197
+ x = x.view(x.size(0), -1)
198
+ x = self.conv5(x)
199
+ x = self.prelu5(x)
200
+ # detection
201
+ det = torch.sigmoid(self.conv6_1(x))
202
+ box = self.conv6_2(x)
203
+ landmark = self.conv6_3(x)
204
+ if self.is_train is True:
205
+ return det, box, landmark
206
+ #landmard = self.conv5_3(x)
207
+ return det, box, landmark
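A quick hedged shape check with dummy tensors: PNet is fully convolutional, while RNet and ONet expect fixed 24x24 and 48x48 crops (a sketch, not part of the repo):

```python
import torch
from utils.models import PNet, RNet, ONet

pnet, rnet, onet = PNet(), RNet(), ONet()
with torch.no_grad():
    label, offset = pnet(torch.randn(1, 3, 12, 12))
    print(label.shape, offset.shape)            # (1, 1, 1, 1), (1, 4, 1, 1)
    det, box = rnet(torch.randn(1, 3, 24, 24))
    print(det.shape, box.shape)                 # (1, 1), (1, 4)
    det, box, lmk = onet(torch.randn(1, 3, 48, 48))
    print(det.shape, box.shape, lmk.shape)      # (1, 1), (1, 4), (1, 10)
```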
utils/tool.py ADDED
@@ -0,0 +1,117 @@
1
+ import numpy as np
2
+ import time
3
+
4
+ def IoU(box, boxes):
5
+ """Compute IoU between detect box and gt boxes
6
+
7
+ Parameters:
8
+ ----------
9
+ box: numpy array , shape (5, ): x1, y1, x2, y2, score
10
+ input box
11
+ boxes: numpy array, shape (n, 4): x1, y1, x2, y2
12
+ input ground truth boxes
13
+
14
+ Returns:
15
+ -------
16
+ ovr: numpy.array, shape (n, )
17
+ IoU
18
+ """
19
+ box_area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
20
+ area = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
21
+ xx1 = np.maximum(box[0], boxes[:, 0])
22
+ yy1 = np.maximum(box[1], boxes[:, 1])
23
+ xx2 = np.minimum(box[2], boxes[:, 2])
24
+ yy2 = np.minimum(box[3], boxes[:, 3])
25
+
26
+ # compute the width and height of the bounding box
27
+ w = np.maximum(0, xx2 - xx1 + 1)
28
+ h = np.maximum(0, yy2 - yy1 + 1)
29
+
30
+ inter = w * h
31
+ ovr = np.true_divide(inter,(box_area + area - inter))
32
+ #ovr = inter / (box_area + area - inter)
33
+ return ovr
34
+
35
+
36
+ def convert_to_square(bbox):
37
+ """Convert bbox to square
38
+
39
+ Parameters:
40
+ ----------
41
+ bbox: numpy array , shape n x 5
42
+ input bbox
43
+
44
+ Returns:
45
+ -------
46
+ square bbox
47
+ """
48
+ square_bbox = bbox.copy()
49
+
50
+ h = bbox[:, 3] - bbox[:, 1] + 1
51
+ w = bbox[:, 2] - bbox[:, 0] + 1
52
+ max_side = np.maximum(h,w)
53
+ square_bbox[:, 0] = bbox[:, 0] + w*0.5 - max_side*0.5
54
+ square_bbox[:, 1] = bbox[:, 1] + h*0.5 - max_side*0.5
55
+ square_bbox[:, 2] = square_bbox[:, 0] + max_side - 1
56
+ square_bbox[:, 3] = square_bbox[:, 1] + max_side - 1
57
+ return square_bbox
58
+
59
+ # non-maximum suppression: eliminates boxes that overlap heavily with the highest-scoring box
60
+ def nms(dets, thresh, mode="Union"):
61
+ """
62
+ greedily select boxes with high confidence
63
+ keep boxes overlap <= thresh
64
+ rule out overlap > thresh
65
+ :param dets: [[x1, y1, x2, y2 score]]
66
+ :param thresh: retain overlap <= thresh
67
+ :return: indexes to keep
68
+ """
69
+ x1 = dets[:, 0]
70
+ y1 = dets[:, 1]
71
+ x2 = dets[:, 2]
72
+ y2 = dets[:, 3]
73
+ scores = dets[:, 4]
74
+
75
+ # shape of x1 = (454,), shape of scores = (454,)
76
+ # print("shape of x1 = {0}, shape of scores = {1}".format(x1.shape, scores.shape))
77
+ # time.sleep(5)
78
+
79
+ areas = (x2 - x1 + 1) * (y2 - y1 + 1)
80
+ order = scores.argsort()[::-1] # argsort: ascending order then [::-1] reverse the order --> descending order
81
+ # print("shape of order {0}".format(order.size)) # (454,)
82
+ # time.sleep(5)
83
+
84
+ # eliminate boxes that overlap heavily with the box that has the largest score in order
85
+ # keep the highest-scoring box and the boxes that do not overlap heavily with it
86
+ keep = []
87
+ while order.size > 0:
88
+ i = order[0]
89
+ keep.append(i)
90
+ xx1 = np.maximum(x1[i], x1[order[1:]])
91
+ yy1 = np.maximum(y1[i], y1[order[1:]])
92
+ xx2 = np.minimum(x2[i], x2[order[1:]])
93
+ yy2 = np.minimum(y2[i], y2[order[1:]])
94
+
95
+ w = np.maximum(0.0, xx2 - xx1 + 1)
96
+ h = np.maximum(0.0, yy2 - yy1 + 1)
97
+ inter = w * h
98
+
99
+ # calculate the IoU between the highest-scoring box and the other boxes
100
+ if mode == "Union":
101
+ # area[i]: the area of largest score
102
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
103
+ elif mode == "Minimum":
104
+ ovr = inter / np.minimum(areas[i], areas[order[1:]])
105
+
106
+
107
+ inds = np.where(ovr <= thresh)[0]
108
+ order = order[inds + 1] # +1: eliminates the first element in order
109
+ # print(inds)
110
+ # print("shape of order {0}".format(order.shape)) # (454,)
111
+ # time.sleep(2)
112
+
113
+ return keep
114
+
115
+
116
+
117
+
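A tiny hedged check of `nms` and `IoU` on synthetic boxes; with `thresh=0.5` the lower-scoring of two heavily overlapping boxes is suppressed:

```python
import numpy as np
from utils.tool import nms, IoU

dets = np.array([[10, 10, 50, 50, 0.9],
                 [12, 12, 52, 52, 0.8],      # IoU ~ 0.83 with the first box
                 [100, 100, 140, 140, 0.7]], dtype=float)

print(nms(dets, 0.5, mode="Union"))           # -> [0, 2]
print(IoU(dets[0], dets[1:, :4]))             # -> approx [0.83, 0.0]
```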
utils/vision.py ADDED
@@ -0,0 +1,58 @@
1
+ from matplotlib.patches import Circle
2
+ import os
3
+ import sys
4
+ import matplotlib.pyplot as plt
5
+ import pylab
6
+ sys.path.append(os.getcwd())
7
+
8
+
9
+ def vis_face(im_array, dets, landmarks, face_size, save_name):
10
+ """Visualize detection results
11
+
12
+ Parameters:
13
+ ----------
14
+ im_array: numpy.ndarray, shape (h, w, c)
15
+ test image in rgb
16
+ dets: numpy.ndarray([[x1 y1 x2 y2 score]])
+ detection boxes to draw
+ landmarks: numpy.ndarray, shape (n, 10), or None
+ five (x, y) landmark points per face
+ face_size: float
+ controls the radius of the landmark markers
+ save_name: str
+ path where the rendered figure is saved
22
+
23
+ Returns:
24
+ -------
25
+ """
26
+
27
+ pylab.imshow(im_array)
28
+
29
+ for i in range(dets.shape[0]):
30
+ bbox = dets[i, :5]
31
+
32
+ rect = pylab.Rectangle((bbox[0], bbox[1]),
33
+ bbox[2] - bbox[0],
34
+ bbox[3] - bbox[1], fill=False,
35
+ edgecolor='red', linewidth=0.9)
36
+ score = bbox[4]
37
+ plt.gca().text(bbox[0], bbox[1] - 2,
38
+ '{:.5f}'.format(score),
39
+ bbox=dict(facecolor='red', alpha=0.5), fontsize=8, color='white')
40
+
41
+ pylab.gca().add_patch(rect)
42
+
43
+ if landmarks is not None:
44
+ for i in range(landmarks.shape[0]):
45
+ landmarks_one = landmarks[i, :]
46
+ landmarks_one = landmarks_one.reshape((5, 2))
47
+ for j in range(5):
48
+
49
+ cir1 = Circle(xy=(landmarks_one[j, 0], landmarks_one[j, 1]), radius=face_size/12, alpha=0.4, color="red")
50
+ pylab.gca().add_patch(cir1)
51
+
52
+ #pylab.savefig(save_name)
53
+ # save only the image content, without the axes
54
+ pylab.axis('off')
55
+ pylab.savefig(save_name, bbox_inches='tight', pad_inches=0.0)
56
+ pylab.show()
57
+ # return the figure object
58
+ return pylab
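`vis_face` only needs an RGB array, `(n, 5)` boxes and `(n, 10)` landmark coordinates, so it can be exercised without the detector; a hedged sketch with synthetic values (the output path is an assumption):

```python
import numpy as np
from utils.vision import vis_face

img = np.full((240, 320, 3), 200, dtype=np.uint8)        # plain grey RGB canvas
boxes = np.array([[80.0, 60.0, 200.0, 190.0, 0.99]])      # x1, y1, x2, y2, score
landmarks = np.array([[110, 100, 170, 100, 140, 130,      # eyes, nose,
                       115, 165, 165, 165]], dtype=float) # mouth corners (x, y pairs)

vis_face(img, boxes, landmarks, face_size=48, save_name="./img/vis_demo.png")
```

When drawing real detector output, remember that OpenCV loads images as BGR, so convert with `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` before passing the array to `vis_face`.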