GitHub project - tf-cpn
Paper - Cascaded Pyramid Network for Multi-Person Pose Estimation

CPN (Cascaded Pyramid Network) is the winning method of the COCO 2017 Keypoints Challenge; tf-cpn is a TensorFlow implementation of it for multi-person pose estimation.
The original implementation was built on MegBrain, the internal deep learning framework of Face++ (Megvii Inc).

Paper notes - Cascaded Pyramid Network for Multi-Person Pose Estimation - AIUAI

1. Results

1.1. COCO minival dataset (Single Model)

The test code relies on a human detector.

For the COCO minival dataset, the human detector used has AP=41.1 overall and AP=55.3 on the person category.

| Method | Base Model | Input Size | AP @0.5:0.95 | AP @0.5 | AP @0.75 | AP medium | AP large |
|--------|------------|------------|--------------|---------|----------|-----------|----------|
| CPN    | ResNet-50  | 256x192    | 69.7         | 88.3    | 77.0     | 66.2      | 76.1     |
| CPN    | ResNet-50  | 384x288    | 72.3         | 89.1    | 78.8     | 68.4      | 79.1     |
| CPN    | ResNet-101 | 384x288    | 72.9         | 89.2    | 79.4     | 69.1      | 79.9     |

1.2. COCO test-dev dataset (Single Model)

Using a stronger detector, whose accuracy on the COCO test-dev dataset is AP=44.5 overall and AP=57.2 on the person category:

| Method                     | AP @0.5:0.95 | AP @0.5 | AP @0.75 | AP medium | AP large |
|----------------------------|--------------|---------|----------|-----------|----------|
| Detectron (Mask R-CNN)     | 67.0         | 88.0    | 73.1     | 62.2      | 75.6     |
| CPN (ResNet-101, 384x288)  | 72.0         | 90.4    | 79.5     | 68.3      | 78.6     |

For reference, using the MegDet detector (AP=52.1 overall, AP=62.9 on the person category), the pose estimation results are:

| Method                             | AP @0.5:0.95 | AP @0.5 | AP @0.75 | AP medium | AP large |
|------------------------------------|--------------|---------|----------|-----------|----------|
| MegDet + CPN (ResNet-101, 384x288) | 73.0         | 91.8    | 80.8     | 69.1      | 78.7     |

MegDet: A Large Mini-Batch Object Detector

2. Implementation Steps

2.1 Model Training on MSCOCO

[1] - Clone the repository:

git clone https://github.com/chenyilun95/tf-cpn.git

Assume the local project path is $CPN_ROOT.

[2] - MSCOCO image data - http://cocodataset.org/#download. The model is trained on the COCO trainvalminusminival dataset (googledrive) and validated on the COCO minival dataset (googledrive).
Place the downloaded datasets and the COCO Python API under $CPN_ROOT/data/COCO/MSCOCO.
All paths are defined in config.py and can be customized.
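Because config.py centralizes every path, pointing the code at a custom data root only touches one file. A minimal sketch of this pattern (the attribute names here are illustrative, not necessarily those in the repo's config.py):

```python
import os

# Illustrative sketch of centralizing dataset paths in one config object.
# Attribute names are hypothetical; check the repo's config.py for the real ones.
class Config:
    def __init__(self, cpn_root):
        self.cpn_root = cpn_root
        self.coco_dir = os.path.join(cpn_root, 'data', 'COCO', 'MSCOCO')
        self.imagenet_weights = os.path.join(cpn_root, 'data', 'imagenet_weights')

cfg = Config('/home/user/tf-cpn')
print(cfg.coco_dir)  # /home/user/tf-cpn/data/COCO/MSCOCO
```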

[3] - Download the base model (ResNet) weights from the tf slim model_zoo, and place them under $CPN_ROOT/data/imagenet_weights/.

[4] - Set up the environment:

pip3 install -r requirement.txt
cd $CPN_ROOT/lib
make clean; make all
cd $CPN_ROOT/lib/lib_kernel/lib_nms
./compile.sh

These steps compile the NMS CUDA kernels and avoid the error: from lib_kernel.lib_nms.gpu_nms import gpu_nms ImportError: libcudart.so.8.0: cannot open shared object. If the error still occurs, check that the CUDA 8.0 library directory (the one containing libcudart.so.8.0) is on LD_LIBRARY_PATH.

[5] - Train the CPN model using the network.py inside the chosen model folder (-d specifies the GPU ids to use):

python3 network.py -d 0-1
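The -d flag selects GPU devices as a range. A minimal sketch of expanding such an argument (the helper is hypothetical; the actual parsing lives inside the repo's code):

```python
def parse_gpus(arg):
    # Expand a range like "0-1" into [0, 1]; also accept a single id like "2".
    if '-' in arg:
        lo, hi = arg.split('-')
        return list(range(int(lo), int(hi) + 1))
    return [int(arg)]

print(parse_gpus('0-1'))  # [0, 1]
```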

After training, the output directory $CPN_ROOT/log/ contains files similar to:

log/
|-model_dump/
|    |snapshot_1.ckpt.data-00000-of-00001
|    |snapshot_1.ckpt.index
|    |snapshot_1.ckpt.meta
|    |...
|train_logs.txt

2.2 Model Validation on COCO

Run the test code:

python3 mptest.py -d 0-1 -r 350

This assumes a checkpoint trained for 350 epochs exists (snapshot_350.ckpt under log/model_dump/).

To point at a pretrained model by path instead, run:

python3 mptest.py -d 0-1 -m log/model_dump/snapshot_350.ckpt
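Both commands point at the same checkpoint: -r resolves an epoch number to a snapshot file under log/model_dump/, while -m takes the path directly. A sketch of that mapping (the helper name is hypothetical, following the snapshot naming shown above):

```python
import os

def snapshot_path(log_dir, epoch):
    # Map an epoch number to the checkpoint prefix that
    # TensorFlow's Saver writes (snapshot_<epoch>.ckpt.*).
    return os.path.join(log_dir, 'model_dump', 'snapshot_%d.ckpt' % epoch)

print(snapshot_path('log', 350))  # log/model_dump/snapshot_350.ckpt
```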

Provided test models (googledrive):

Human box detection model:

CPN pretrained models:

3. Image Test Demo

Run the demo on all images in a test folder.
Each image is simply resized to the network input size; no flip augmentation is applied.

    import os
    import numpy as np
    import cv2
    import matplotlib.pyplot as plt

    from config import cfg
    from tfflat.base import Tester
    from tfflat.utils import mem_info
    from network import Network

    def analyse(tester, imagefile):
        test_img = cv2.imread(imagefile)
        # Resize to the network input size (height 384, width 288); the two
        # axes are scaled independently, so the aspect ratio is not preserved.
        height, width, _ = test_img.shape
        scale_height = 384 / height
        scale_width = 288 / width
        scale_img = cv2.resize(test_img, (0, 0), fx=scale_width, fy=scale_height, interpolation=cv2.INTER_LANCZOS4)

        # Normalize: subtract the per-channel pixel means and scale to [0, 1).
        mean_img = (scale_img - cfg.pixel_means) / 255.
        mean_img = np.asarray(mean_img).astype(np.float32)
        # Add a batch dimension (NHWC) and run the network; the output is a
        # batch of heatmaps, rearranged here to (joints, height, width).
        feed = mean_img[np.newaxis, ...]

        res = tester.predict_one([feed])[0]
        res = res.transpose(0, 3, 1, 2)[0]

        cls_skeleton = np.zeros((cfg.nr_skeleton, 3))
        # Rescale the heatmaps and normalize each joint map to a peak of 1.
        res /= 255.
        res += 0.5
        for w in range(cfg.nr_skeleton):
            res[w] /= np.amax(res[w])
        # Pad a border so the Gaussian blur does not clip peaks at the edges.
        border = 10
        dr = np.zeros((cfg.nr_skeleton, cfg.output_shape[0] + 2 * border, cfg.output_shape[1] + 2 * border))
        dr[:, border:-border, border:-border] = res[:cfg.nr_skeleton].copy()
        for w in range(cfg.nr_skeleton):
            dr[w] = cv2.GaussianBlur(dr[w], (21, 21), 0)
        for w in range(cfg.nr_skeleton):
            # Take the argmax as the keypoint, then shift it a quarter pixel
            # toward the second-highest response to refine the location.
            lb = dr[w].argmax()
            y, x = np.unravel_index(lb, dr[w].shape)
            dr[w, y, x] = 0
            lb = dr[w].argmax()
            py, px = np.unravel_index(lb, dr[w].shape)
            y -= border
            x -= border
            py -= border + y
            px -= border + x
            ln = (px ** 2 + py ** 2) ** 0.5
            delta = 0.25
            if ln > 1e-3:
                x += delta * px / ln
                y += delta * py / ln
            x = max(0, min(x, cfg.output_shape[1] - 1))
            y = max(0, min(y, cfg.output_shape[0] - 1))
            # Map heatmap coordinates back to input-image coordinates
            # (the heatmap is 4x downsampled), and keep the peak score.
            cls_skeleton[w, :2] = (x * 4 + 2, y * 4 + 2)
            cls_skeleton[w, 2] = res[w, int(round(y) + 1e-10), int(round(x) + 1e-10)]

        # map back to original images
        plt.imshow(test_img[:,:,::-1])
        for idx in range(cfg.nr_skeleton):
            plt.scatter(cls_skeleton[idx][0] / scale_width, cls_skeleton[idx][1]/scale_height, marker='p', color='r', s=10)
        plt.show()

    if __name__ == '__main__':

        # Pick the GPU with the lowest reported memory usage,
        # then load the pretrained CPN checkpoint.
        gpu_ids = str(np.argmin(mem_info()))
        test_model = 'tf-cpn/models/COCO.res50.384x288.CPN/snapshot_350.ckpt'

        cfg.set_args(gpu_ids.split(',')[0])
        tester = Tester(Network(), cfg)
        tester.load_weights(test_model)

        images_list = os.listdir('/path/to/test_images')
        for image_file in images_list:
            image_file = os.path.join('/path/to/test_images', image_file)
            analyse(tester, image_file)

        print('Done.')
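The decoding loop above takes each heatmap's argmax, nudges it a quarter pixel toward the second-highest response, and maps the result back to input-image coordinates via x * 4 + 2 (the heatmap is 4x downsampled). A minimal standalone sketch of that refinement (the helper name and the synthetic heatmap are illustrative; the clamping to the output shape is omitted):

```python
import numpy as np

def decode_keypoint(heatmap, stride=4):
    """Decode one keypoint: take the argmax, shift it a quarter pixel toward
    the second-highest response, then map the heatmap coordinates back to
    input-image coordinates (stride 4, plus a half-cell offset of 2)."""
    hm = heatmap.copy()
    y, x = np.unravel_index(hm.argmax(), hm.shape)
    hm[y, x] = 0
    py, px = np.unravel_index(hm.argmax(), hm.shape)
    dy, dx = py - y, px - x
    ln = (dx ** 2 + dy ** 2) ** 0.5
    fx, fy = float(x), float(y)
    if ln > 1e-3:
        fx += 0.25 * dx / ln
        fy += 0.25 * dy / ln
    return fx * stride + 2, fy * stride + 2

# Peak at (row 5, col 8); a weaker neighbor at (5, 9) pulls x by +0.25.
hm = np.zeros((12, 16))
hm[5, 8] = 1.0
hm[5, 9] = 0.5
x, y = decode_keypoint(hm)
print(x, y)  # 35.0 22.0
```

The quarter-pixel shift compensates for the quantization of the downsampled heatmap: the true peak usually lies between the argmax cell and its strongest neighbor.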

PyTorch CPN implementation

Last modification: April 17th, 2019 at 10:16 am