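FAIR first open-sourced the Caffe2-based Detectron and then the PyTorch-based maskrcnn-benchmark. It has now released detectron2, a new implementation of object detection algorithms built on the latest PyTorch 1.3.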

Github - detectron2

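Main features of detectron2:

[1] - Built on the PyTorch deep learning framework.

[2] - Includes more features, such as panoptic segmentation, DensePose, Cascade R-CNN and rotated bounding boxes.

[3] - Can be used as a library that supports different research projects (detectron2 - projects). More research projects will be open-sourced in the future.

[4] - Faster training (detectron2 - Benchmarks).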

1. Installing detectron2

Official installation instructions for detectron2 (from mainland China these links may require a proxy):

Install detectron2 - Colab Notebook.

You can also refer to detectron2 - Dockerfile.

Dependencies:

  • Python >= 3.6
  • PyTorch >= 1.4 and a matching version of torchvision
  • OpenCV, for the demo and visualization
  • fvcore
  • pycocotools>=2.0.1
  • GCC & G++ >= 5
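To check an existing environment against the minimum versions listed above, something like the following standard-library sketch can help. It assumes the packages are installed under their pip distribution names (`torch`, `pycocotools`) and requires Python 3.8+ for `importlib.metadata`; the helpers `parse_version`, `meets_minimum` and `check_environment` are hypothetical, and the parser only understands plain dotted numeric versions with an optional local suffix such as `+cu101`. GCC and the torchvision/PyTorch pairing are not checked.

```python
import re
import sys
from importlib import metadata  # standard library since Python 3.8

# Minimum versions from the dependency list above (pip distribution names).
MINIMUM_VERSIONS = {"torch": "1.4", "pycocotools": "2.0.1"}


def parse_version(version: str) -> tuple:
    """Turn e.g. '1.5.0+cu101' into (1, 5, 0); stops at the first non-numeric part."""
    public = version.split("+", 1)[0]  # drop a local suffix such as '+cu101'
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # e.g. '0rc1': keep 0, ignore the rest
            break
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two versions after padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_environment() -> dict:
    """Return {requirement: True/False} for Python and the packages above."""
    report = {"python>=3.6": sys.version_info >= (3, 6)}
    for name, minimum in MINIMUM_VERSIONS.items():
        key = f"{name}>={minimum}"
        try:
            report[key] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            report[key] = False  # package not installed
    return report


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{requirement}: {'OK' if ok else 'missing or too old'}")
```

Pre-release tags such as `1.4.0rc1` are treated as equal to `1.4.0` here, which is good enough for a quick sanity check but not a full version comparison.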

Installing the dependencies:

sudo apt-get update
sudo apt-get install build-essential python3-dev 
sudo apt-get install libpng-dev libjpeg-dev python3-opencv 
sudo apt-get install ca-certificates pkg-config 
sudo apt-get install git curl wget automake libtool

#pip
curl -fSsL -O https://bootstrap.pypa.io/get-pip.py
sudo python3 get-pip.py && rm get-pip.py

#opencv
sudo pip install opencv-python
sudo pip install cloudpickle
sudo pip install matplotlib
sudo pip install tabulate
sudo pip install tensorboard

#torch, torchvision
sudo pip install torch torchvision

#fvcore
sudo pip install 'git+https://github.com/facebookresearch/fvcore'

#pycocotools
sudo pip install cython
sudo pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
sudo pip install pycocotools

1.1. Installing from source

detectron2 installation (old method, GCC & G++ >= 4.9):

git clone git@github.com:facebookresearch/detectron2.git
cd detectron2
python setup.py build develop

detectron2 installation (new method, GCC & G++ >= 5):

Installing ninja is recommended for a faster build.

# Option 1
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
# if you lack the permissions to install system-wide
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git' --user

# Option 2
git clone https://github.com/facebookresearch/detectron2.git
cd detectron2 
python -m pip install -e .

# MacOS
#CC=clang CXX=clang++ python -m pip install ......

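To rebuild, first delete the old build artifacts with rm -rf build/ **/*.so, then run the build again. Note: if PyTorch is reinstalled, detectron2 usually has to be rebuilt as well.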

1.2. Installing prebuilt packages (Linux only)

Pick the command that matches your CUDA and PyTorch versions from the table below.

| CUDA | torch1.5 | torch1.4 |
| --- | --- | --- |
| 10.2 | python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.5/index.html | |
| 10.1 | python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.5/index.html | python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.4/index.html |
| 10.0 | | python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu100/torch1.4/index.html |
| 9.2 | python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.5/index.html | python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.4/index.html |
| cpu | python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.5/index.html | python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.4/index.html |
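All index URLs in the table follow one pattern, `.../detectron2/wheels/<cuda-tag>/torch<version>/index.html`, where the CUDA tag is `cu` plus the version without the dot, or `cpu`. The sketch below derives the URL from that pattern and rejects combinations the table does not list; `wheel_index_url` and `pip_command` are hypothetical helpers built only from the table. The CUDA version your PyTorch build uses can be read from `torch.version.cuda`.

```python
# Combinations listed in the table above: CUDA version -> torch versions with a wheel.
AVAILABLE = {
    "10.2": {"1.5"},
    "10.1": {"1.5", "1.4"},
    "10.0": {"1.4"},
    "9.2": {"1.5", "1.4"},
    "cpu": {"1.5", "1.4"},
}

BASE = "https://dl.fbaipublicfiles.com/detectron2/wheels"


def wheel_index_url(cuda: str, torch: str) -> str:
    """Return the -f index URL for a CUDA version ('10.1' or 'cpu') and a torch version ('1.5')."""
    if torch not in AVAILABLE.get(cuda, set()):
        raise ValueError(f"no prebuilt wheel listed for CUDA {cuda} + torch {torch}")
    tag = "cpu" if cuda == "cpu" else "cu" + cuda.replace(".", "")
    return f"{BASE}/{tag}/torch{torch}/index.html"


def pip_command(cuda: str, torch: str) -> str:
    """Full install command, in the same form as the table entries."""
    return f"python -m pip install detectron2 -f {wheel_index_url(cuda, torch)}"


print(pip_command("10.1", "1.5"))
```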

2. Basic usage of detectron2

2.1. Object detection

cd $detectron2_root/
python3 demo/demo.py  \
    --config-file configs/COCO-Detection/faster_rcnn_R_50_FPN_1x.yaml \
    --input input.jpg --output outputs/ \
    --opts MODEL.WEIGHTS detectron2://COCO-Detection/faster_rcnn_R_50_FPN_1x/137257794/model_final_b275ba.pkl
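The `detectron2://` prefix in `MODEL.WEIGHTS` is shorthand for detectron2's model zoo on `https://dl.fbaipublicfiles.com/detectron2/`; detectron2 resolves it, downloads the file and caches it locally. The sketch below only reproduces that string mapping, which is handy when the weights have to be downloaded by hand (for example behind a proxy) and then passed to `MODEL.WEIGHTS` as a local path. `resolve_weights_url` is a hypothetical helper, not detectron2's own handler.

```python
PREFIX = "detectron2://"
MODEL_ZOO_BASE = "https://dl.fbaipublicfiles.com/detectron2/"


def resolve_weights_url(path: str) -> str:
    """Map a 'detectron2://...' weight path to a plain HTTPS URL; leave other paths unchanged."""
    if path.startswith(PREFIX):
        return MODEL_ZOO_BASE + path[len(PREFIX):]
    return path  # local files and ordinary URLs pass through untouched


print(resolve_weights_url(
    "detectron2://COCO-Detection/faster_rcnn_R_50_FPN_1x/137257794/model_final_b275ba.pkl"))
```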

2.2. Instance segmentation (1): command-line demo

cd $detectron2_root/
python3 demo/demo.py  \
    --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml \
    --input input.jpg --output outputs/ \
    --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl

2.3. Instance segmentation (2): Python API

Example:

import numpy as np
import cv2
import matplotlib.pyplot as plt

import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog

# Download a test image first:
# wget http://images.cocodataset.org/val2017/000000439715.jpg -O input.jpg
im = cv2.imread("./input.jpg")  # OpenCV loads images in BGR order
plt.figure()
plt.imshow(im[:, :, ::-1])  # BGR -> RGB for matplotlib
plt.show()

# Build the config, load the pretrained Mask R-CNN weights and run inference
cfg = get_cfg()
cfg.merge_from_file("./detectron2_repo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # keep only detections scoring above 0.5
cfg.MODEL.WEIGHTS = "detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl"
# cfg.MODEL.DEVICE = "cpu"  # uncomment on a machine without a GPU
predictor = DefaultPredictor(cfg)  # expects a BGR image by default (cfg.INPUT.FORMAT)
outputs = predictor(im)

# Predicted class indices and bounding boxes of the detected instances
pred_classes = outputs["instances"].pred_classes
pred_boxes = outputs["instances"].pred_boxes

# Draw the predictions on the original image; Visualizer takes an RGB image
v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
plt.figure(2)
plt.imshow(v.get_image())  # get_image() already returns RGB
plt.show()
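`outputs["instances"]` carries per-detection fields such as `pred_boxes` (boxes as (x1, y1, x2, y2) in pixels), `scores` and `pred_classes`, and `SCORE_THRESH_TEST` drops detections whose score does not exceed the threshold. The plain-Python sketch below mimics that filtering after the fact, e.g. to try a stricter threshold without re-running the model; the `Detection` type, `filter_by_score` and the numbers are made up for illustration. With the real `Instances` object the same effect can be had with boolean indexing, e.g. `instances[instances.scores > 0.8]`, and class indices map to names via `MetadataCatalog.get(...).thing_classes`.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    box: Box
    score: float
    class_id: int


def box_area(box: Box) -> float:
    """Area of an (x1, y1, x2, y2) box; degenerate boxes count as 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def filter_by_score(boxes: Sequence[Box], scores: Sequence[float],
                    classes: Sequence[int], thresh: float) -> List[Detection]:
    """Keep detections with score > thresh, sorted by descending score."""
    kept = [Detection(tuple(b), s, c)
            for b, s, c in zip(boxes, scores, classes) if s > thresh]
    return sorted(kept, key=lambda d: d.score, reverse=True)


# Made-up values standing in for pred_boxes / scores / pred_classes.
boxes = [(10, 20, 110, 220), (50, 60, 70, 90), (0, 0, 30, 30)]
scores = [0.98, 0.55, 0.72]
classes = [0, 3, 2]
for det in filter_by_score(boxes, scores, classes, thresh=0.7):
    print(det.class_id, det.score, box_area(det.box))
```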


2.4. Keypoint detection

cfg = get_cfg()
cfg.merge_from_file("./detectron2_repo/configs/COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7
cfg.MODEL.WEIGHTS = "detectron2://COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x/137849621/model_final_a6e10b.pkl"
predictor = DefaultPredictor(cfg)
outputs = predictor(im)
v = Visualizer(im[:,:,::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
plt.imshow(v.get_image())  # get_image() already returns RGB
plt.show()
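For Keypoint R-CNN, `outputs["instances"].pred_keypoints` has shape (N, K, 3), where each row is (x, y, score) and K = 17 for COCO persons. The sketch below attaches names to one person's rows using the standard COCO keypoint order, assumed here to match what detectron2's COCO metadata exposes as `keypoint_names`. `name_keypoints` is a hypothetical helper; the scores are not probabilities, so a `min_score` cut-off should be chosen empirically.

```python
# Standard COCO person keypoint order (17 points).
COCO_KEYPOINT_NAMES = (
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
)


def name_keypoints(rows, min_score=0.0):
    """Map one person's (x, y, score) rows to {name: (x, y)}, skipping low-score points."""
    if len(rows) != len(COCO_KEYPOINT_NAMES):
        raise ValueError(f"expected {len(COCO_KEYPOINT_NAMES)} keypoints, got {len(rows)}")
    return {
        name: (x, y)
        for name, (x, y, score) in zip(COCO_KEYPOINT_NAMES, rows)
        if score >= min_score
    }


# Made-up rows for one person; in practice: outputs["instances"].pred_keypoints[i].tolist()
person = [(100.0 + i, 50.0 + 10 * i, 0.9) for i in range(17)]
print(name_keypoints(person)["left_wrist"])
```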


2.5. Panoptic segmentation

cfg = get_cfg()
cfg.merge_from_file("./detectron2_repo/configs/COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml")
cfg.MODEL.WEIGHTS = "detectron2://COCO-PanopticSegmentation/panoptic_fpn_R_101_3x/139514519/model_final_cafdb1.pkl"
predictor = DefaultPredictor(cfg)
panoptic_seg, segments_info = predictor(im)["panoptic_seg"]
v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
v = v.draw_panoptic_seg_predictions(panoptic_seg.to("cpu"), segments_info)
plt.imshow(v.get_image())  # get_image() already returns RGB
plt.show()
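In the panoptic output, `panoptic_seg` is an (H, W) tensor holding a segment id per pixel, and `segments_info` is a list of dicts with at least `"id"`, `"category_id"` and `"isthing"` (thing vs. stuff); pixels whose id is not described in `segments_info` are void. The sketch below computes each segment's pixel area from a small nested list standing in for `panoptic_seg.cpu().tolist()`; `segment_areas` and `void_pixels` are hypothetical helpers.

```python
from collections import Counter


def segment_areas(panoptic_ids, segments_info):
    """Return {segment id: pixel count} for every segment described in segments_info."""
    counts = Counter(pid for row in panoptic_ids for pid in row)
    return {info["id"]: counts.get(info["id"], 0) for info in segments_info}


def void_pixels(panoptic_ids, segments_info):
    """Count pixels whose id is not described in segments_info (treated as void)."""
    known = {info["id"] for info in segments_info}
    return sum(1 for row in panoptic_ids for pid in row if pid not in known)


# Made-up 3x4 segment-id map and segment descriptions.
ids = [[0, 0, 1, 1],
       [0, 2, 2, 1],
       [2, 2, 2, 1]]
info = [{"id": 1, "category_id": 0, "isthing": True},
        {"id": 2, "category_id": 5, "isthing": False}]
print(segment_areas(ids, info), void_pixels(ids, info))
```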


2.6. Panoptic segmentation on video

cd $detectron2_root/
python demo/demo.py \
    --config-file configs/COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml \
    --video-input ../video-clip.mp4 \
    --confidence-threshold 0.6 \
    --output ../video-output.mkv \
    --opts MODEL.WEIGHTS detectron2://COCO-PanopticSegmentation/panoptic_fpn_R_101_3x/139514519/model_final_cafdb1.pkl

3. detectron2 Model_ZOO

detectron2 - MODEL_ZOO

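detectron2 provides many baseline models trained between September and October 2019. Their config files are located under detectron2_root/configs.

Common settings:

[1] - Training hardware: Big Basin servers with 8 NVIDIA V100 GPUs, trained with data-parallel synchronous SGD and a minibatch of 16 images.

[2] - Software environment: CUDA 9.2, cuDNN 7.4.2 or 7.6.3 (the difference between these cuDNN versions is negligible).

[3] - Training curves: the training curves and other statistics of each model are available through that model's metrics link.

[4] - The default settings are not directly comparable with Detectron. For example, the default training data augmentation uses scale jittering in addition to horizontal flipping. For accuracy comparisons see Detectron1-Comparisons; for speed comparisons see benchmarks.

[5] - Inference speed is measured with tools/train_net.py --eval-only using batch size 1. Actual deployments are usually faster than the reported inference speed, because more optimizations can be applied.

[6] - Training speed is averaged over the entire training run.

[7] - For Faster/Mask R-CNN, three different backbone combinations are provided:

  • FPN - a ResNet+FPN backbone with standard conv and FC heads for mask and box prediction, respectively. It gives the best speed/accuracy tradeoff, but the other two are still useful for research.
  • C4 - a ResNet conv4 backbone with a conv5 head. This is the baseline used in the original Faster R-CNN paper.
  • DC5 (Dilated-C5) - a ResNet conv5 backbone with dilations in conv5, followed by standard conv and FC heads for mask and box prediction, respectively. This setup was used in the Deformable ConvNet paper.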
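The DC5 variant trades stride for dilation. The sketch below computes the two quantities behind that choice: the span of a dilated k x k kernel, k + (k - 1)(d - 1), and the approximate res5 feature-map size for an 800 x 1333 input (an assumed, typical COCO test size). The stride and dilation values (stride 32 for a standard res5; stride 16 with dilation 2 for DC5, which detectron2 configures via `MODEL.RESNETS.RES5_DILATION`) come from general knowledge of these backbones, not from the text above, and the size formula ignores padding details.

```python
import math


def effective_kernel_size(kernel: int, dilation: int) -> int:
    """Number of input positions a dilated kernel spans: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


def feature_map_size(height: int, width: int, stride: int):
    """Approximate feature-map size at a given total stride."""
    return math.ceil(height / stride), math.ceil(width / stride)


# Standard res5 downsamples to stride 32; DC5 keeps stride 16 and dilates its 3x3 convs by 2.
for name, stride, dilation in [("C5 (standard)", 32, 1), ("DC5", 16, 2)]:
    h, w = feature_map_size(800, 1333, stride)
    span = effective_kernel_size(3, dilation)
    print(f"{name}: stride {stride}, res5 map ~{h}x{w}, 3x3 conv spans {span} positions")
```

The larger feature map is why the DC5 rows in the tables below use more memory and time than FPN while keeping a large receptive field.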

[8] - Most models are trained with the 3x schedule (about 37 COCO epochs). Although 1x models are heavily under-trained, some ResNet-50 models trained with the 1x schedule (about 12 COCO epochs) are provided for research comparisons.
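The epoch counts follow from iterations x images per iteration / dataset size. A worked check, assuming 1x = 90k and 3x = 270k iterations (the `MAX_ITER` values of detectron2's COCO configs) and 118,287 images in COCO train2017; the 16 images per iteration come from [1]. Images without annotations are filtered during training, so the real numbers are slightly higher.

```python
COCO_TRAIN2017_IMAGES = 118_287  # assumption: size of COCO train2017
IMS_PER_BATCH = 16               # minibatch size from [1]


def iters_to_epochs(max_iter, ims_per_batch=IMS_PER_BATCH, dataset_size=COCO_TRAIN2017_IMAGES):
    """Number of passes over the dataset for a given iteration count."""
    return max_iter * ims_per_batch / dataset_size


# Assumption: 1x = 90k iterations, 3x = 270k iterations.
for schedule, max_iter in [("1x", 90_000), ("3x", 270_000)]:
    print(f"{schedule}: {max_iter} iterations ~ {iters_to_epochs(max_iter):.1f} COCO epochs")
```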

3.1. ImageNet Pretrained Models

Backbone models pretrained on ImageNet-1k. They differ from the ones provided in Detectron: BatchNorm is not fused into an affine layer.
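"Fusing BatchNorm into an affine layer" works because inference-mode BatchNorm is a per-channel affine map: y = gamma * (x - mean) / sqrt(var + eps) + beta = scale * x + shift, with scale = gamma / sqrt(var + eps) and shift = beta - mean * scale. Detectron stored only the folded (scale, shift) pair; keeping the four BatchNorm parameters, as the files above do, preserves the original statistics. The one-channel sketch below uses made-up numbers and an assumed eps of 1e-5 and checks that both forms agree.

```python
import math


def batchnorm(x, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm for a single channel."""
    return gamma * (x - mean) / math.sqrt(var + eps) + beta


def fuse_batchnorm(gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm parameters into an equivalent affine (scale, shift) pair."""
    scale = gamma / math.sqrt(var + eps)
    shift = beta - mean * scale
    return scale, shift


gamma, beta, mean, var = 1.5, -0.2, 0.8, 4.0  # made-up parameters for one channel
scale, shift = fuse_batchnorm(gamma, beta, mean, var)
for x in (-1.0, 0.0, 2.5):
    # Both forms give the same output up to floating-point rounding.
    assert math.isclose(batchnorm(x, gamma, beta, mean, var), scale * x + shift, abs_tol=1e-12)
print(f"scale={scale:.6f}, shift={shift:.6f}")
```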

[1] - R-50.pkl: converted copy of MSRA's original ResNet-50 model

[2] - R-101.pkl: converted copy of MSRA's original ResNet-101 model

[3] - X-101-32x8d.pkl: ResNeXt-101-32x8d model trained with Caffe2 at FB

Some pretrained models from Detectron are still used, for example:

[1] - X-152-32x8d-IN5k.pkl: ResNeXt-152-32x8d model trained on ImageNet-5k with Caffe2 at FB (see ResNeXt paper for details on ImageNet-5k).

[2] - R-50-GN.pkl: ResNet-50 with Group Normalization.

[3] - R-101-GN.pkl: ResNet-101 with Group Normalization.

3.2. COCO Object Detection Baselines

3.2.1. Faster R-CNN

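| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| R50-C4 | 1x | 0.551 | 0.110 | 4.8 | 35.7 | 137257644 | model & metrics |
| R50-DC5 | 1x | 0.380 | 0.068 | 5.0 | 37.3 | 137847829 | model & metrics |
| R50-FPN | 1x | 0.210 | 0.055 | 3.0 | 37.9 | 137257794 | model & metrics |
| R50-C4 | 3x | 0.543 | 0.110 | 4.8 | 38.4 | 137849393 | model & metrics |
| R50-DC5 | 3x | 0.378 | 0.073 | 5.0 | 39.0 | 137849425 | model & metrics |
| R50-FPN | 3x | 0.209 | 0.047 | 3.0 | 40.2 | 137849458 | model & metrics |
| R101-C4 | 3x | 0.619 | 0.149 | 5.9 | 41.1 | 138204752 | model & metrics |
| R101-DC5 | 3x | 0.452 | 0.082 | 6.1 | 40.6 | 138204841 | model & metrics |
| R101-FPN | 3x | 0.286 | 0.063 | 4.1 | 42.0 | 137851257 | model & metrics |
| X101-FPN | 3x | 0.638 | 0.120 | 6.7 | 43.0 | 139173657 | model & metrics |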
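The speed columns can be turned into rough end-to-end figures: total training time is s/iter x iterations, training throughput is images per iteration / s/iter, and inference throughput is 1 / (s/im). The sketch copies a few rows from the table above and assumes 1x = 90k and 3x = 270k iterations at 16 images per iteration; the results only hold for the 8 x V100 setup described in the common settings.

```python
# (train time in s/iter, inference time in s/im) copied from the Faster R-CNN table above.
FASTER_RCNN_SPEED = {
    ("R50-FPN", "1x"): (0.210, 0.055),
    ("R50-FPN", "3x"): (0.209, 0.047),
    ("R101-FPN", "3x"): (0.286, 0.063),
    ("X101-FPN", "3x"): (0.638, 0.120),
}
# Assumption: 1x = 90k and 3x = 270k iterations, 16 images per iteration (8 GPUs).
SCHEDULE_ITERS = {"1x": 90_000, "3x": 270_000}
IMS_PER_BATCH = 16


def training_hours(sec_per_iter: float, schedule: str) -> float:
    """Wall-clock hours for a full schedule at the given seconds per iteration."""
    return sec_per_iter * SCHEDULE_ITERS[schedule] / 3600


def summarize(backbone: str, schedule: str) -> dict:
    train_s, infer_s = FASTER_RCNN_SPEED[(backbone, schedule)]
    return {
        "train_hours": training_hours(train_s, schedule),
        "train_images_per_s": IMS_PER_BATCH / train_s,  # across all 8 GPUs
        "inference_fps": 1.0 / infer_s,                # batch size 1
    }


for backbone, schedule in FASTER_RCNN_SPEED:
    stats = summarize(backbone, schedule)
    print(f"{backbone} {schedule}: ~{stats['train_hours']:.1f} h training, "
          f"~{stats['inference_fps']:.1f} images/s at inference")
```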

3.2.2. RetinaNet

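| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| R50 | 1x | 0.200 | 0.062 | 3.9 | 36.5 | 137593951 | model & metrics |
| R50 | 3x | 0.201 | 0.063 | 3.9 | 37.9 | 137849486 | model & metrics |
| R101 | 3x | 0.280 | 0.080 | 5.1 | 39.9 | 138363263 | model & metrics |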

3.2.3. RPN & Fast R-CNN

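| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | prop. AR | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RPN R50-C4 | 1x | 0.130 | 0.051 | 1.5 | | 51.6 | 137258005 | model & metrics |
| RPN R50-FPN | 1x | 0.186 | 0.045 | 2.7 | | 58.0 | 137258492 | model & metrics |
| Fast R-CNN R50-FPN | 1x | 0.140 | 0.035 | 2.6 | 37.8 | | 137635226 | model & metrics |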

3.3. COCO Instance Segmentation Baselines

Mask R-CNN:

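| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | mask AP | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R50-C4 | 1x | 0.584 | 0.117 | 5.2 | 36.8 | 32.2 | 137259246 | model & metrics |
| R50-DC5 | 1x | 0.471 | 0.074 | 6.5 | 38.3 | 34.2 | 137260150 | model & metrics |
| R50-FPN | 1x | 0.261 | 0.053 | 3.4 | 38.6 | 35.2 | 137260431 | model & metrics |
| R50-C4 | 3x | 0.575 | 0.118 | 5.2 | 39.8 | 34.4 | 137849525 | model & metrics |
| R50-DC5 | 3x | 0.470 | 0.075 | 6.5 | 40.0 | 35.9 | 137849551 | model & metrics |
| R50-FPN | 3x | 0.261 | 0.055 | 3.4 | 41.0 | 37.2 | 137849600 | model & metrics |
| R101-C4 | 3x | 0.652 | 0.155 | 6.3 | 42.6 | 36.7 | 138363239 | model & metrics |
| R101-DC5 | 3x | 0.545 | 0.155 | 7.6 | 41.9 | 37.3 | 138363294 | model & metrics |
| R101-FPN | 3x | 0.340 | 0.070 | 4.6 | 42.9 | 38.6 | 138205316 | model & metrics |
| X101-FPN | 3x | 0.690 | 0.129 | 7.2 | 44.3 | 39.5 | 139653917 | model & metrics |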

3.4. COCO Person Keypoint Detection Baselines

Keypoint R-CNN:

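| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | kp. AP | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R50-FPN | 1x | 0.315 | 0.083 | 5.0 | 53.6 | 64.0 | 137261548 | model & metrics |
| R50-FPN | 3x | 0.316 | 0.076 | 5.0 | 55.4 | 65.5 | 137849621 | model & metrics |
| R101-FPN | 3x | 0.390 | 0.090 | 6.1 | 56.4 | 66.1 | 138363331 | model & metrics |
| X101-FPN | 3x | 0.738 | 0.142 | 8.7 | 57.3 | 66.0 | 139686956 | model & metrics |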

3.5. COCO Panoptic Segmentation Baselines

Panoptic FPN:

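| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | mask AP | PQ | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R50-FPN | 1x | 0.304 | 0.063 | 4.8 | 37.6 | 34.7 | 39.4 | 139514544 | model & metrics |
| R50-FPN | 3x | 0.302 | 0.063 | 4.8 | 40.0 | 36.5 | 41.5 | 139514569 | model & metrics |
| R101-FPN | 3x | 0.392 | 0.078 | 6.0 | 42.4 | 38.5 | 43.0 | 139514519 | model & metrics |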
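PQ (panoptic quality) combines segmentation quality (SQ) and recognition quality (RQ). For each category, PQ = (sum of IoU over matched pairs) / (|TP| + 0.5 |FP| + 0.5 |FN|), where a predicted and a ground-truth segment are matched (a TP) when their IoU exceeds 0.5; the reported PQ is the mean over categories. The sketch below only illustrates this definition and assumes matching and IoUs are already computed. It is not detectron2's evaluator, which delegates to the external panopticapi package; `panoptic_quality` and `mean_pq` are hypothetical helpers.

```python
def panoptic_quality(tp_ious, num_fp, num_fn):
    """PQ, SQ and RQ for one category, given IoUs of matched pairs (all > 0.5)."""
    if any(iou <= 0.5 for iou in tp_ious):
        raise ValueError("true positives must have IoU > 0.5")
    tp = len(tp_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        return None  # category absent from both prediction and ground truth
    sq = sum(tp_ious) / tp if tp else 0.0  # mean IoU of matched segments
    rq = tp / denom                        # an F1-style detection score
    return {"pq": sq * rq, "sq": sq, "rq": rq}


def mean_pq(per_category):
    """Average PQ over categories, skipping absent ones."""
    values = [r["pq"] for r in per_category if r is not None]
    return sum(values) / len(values) if values else 0.0


# Made-up example: three matches, one false positive, one false negative.
result = panoptic_quality([0.9, 0.8, 0.7], num_fp=1, num_fn=1)
print(result)
```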

3.6. LVIS Instance Segmentation Baselines

LVIS dataset

Mask R-CNN (trained for roughly 24 epochs on the LVIS v0.5 dataset):

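| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | mask AP | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R50-FPN | 1x | 0.292 | 0.127 | 7.1 | 23.6 | 24.4 | 144219072 | model & metrics |
| R101-FPN | 1x | 0.371 | 0.124 | 7.8 | 25.6 | 25.9 | 144219035 | model & metrics |
| X101-FPN | 1x | 0.712 | 0.166 | 10.2 | 26.7 | 27.1 | 144219108 | model & metrics |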

3.7. Cityscapes & Pascal VOC Baselines

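| Name | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | box AP50 | mask AP | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R50-FPN, Cityscapes | 0.240 | 0.092 | 4.4 | | | 36.5 | 142423278 | model & metrics |
| R50-C4, VOC | 0.537 | 0.086 | 4.8 | 51.9 | 80.3 | | 142202221 | model & metrics |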

3.8. Deformable Conv and Cascade R-CNN

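| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | mask AP | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline R50-FPN | 1x | 0.261 | 0.053 | 3.4 | 38.6 | 35.2 | 137260431 | model & metrics |
| Deformable Conv | 1x | 0.342 | 0.061 | 3.5 | 41.5 | 37.5 | 138602867 | model & metrics |
| Cascade R-CNN | 1x | 0.317 | 0.066 | 4.0 | 42.1 | 36.4 | 138602847 | model & metrics |
| Baseline R50-FPN | 3x | 0.261 | 0.055 | 3.4 | 41.0 | 37.2 | 137849600 | model & metrics |
| Deformable Conv | 3x | 0.349 | 0.066 | 3.5 | 42.7 | 38.5 | 144998336 | model & metrics |
| Cascade R-CNN | 3x | 0.328 | 0.075 | 4.0 | 44.3 | 38.5 | 144998488 | model & metrics |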

3.9. Other Settings

Different normalization methods:

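| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | mask AP | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline R50-FPN | 3x | 0.261 | 0.055 | 3.4 | 41.0 | 37.2 | 137849600 | model & metrics |
| SyncBN | 3x | 0.464 | 0.063 | 5.6 | 42.0 | 37.8 | 143915318 | model & metrics |
| GN | 3x | 0.356 | 0.077 | 7.3 | 42.6 | 38.6 | 138602888 | model & metrics |
| GN (scratch) | 3x | 0.400 | 0.077 | 9.8 | 39.9 | 36.6 | 138602908 | model & metrics |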

A few very large models trained for a very long time, provided for demo purposes only:

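| Name | inference time (s/im) | train mem (GB) | box AP | mask AP | PQ | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Panoptic FPN R101 | 0.123 | 11.4 | 47.4 | 41.3 | 46.1 | 139797668 | model & metrics |
| Mask R-CNN X152 | 0.281 | 15.1 | 49.3 | 43.2 | | 18131413 | model & metrics |
| above + test-time aug. | | | 51.4 | 45.5 | | | |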

4. Model training

Using the default COCO dataset as an example:

8 GPUs:

python tools/train_net.py --num-gpus 8 \
    --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml

1 GPU:

python tools/train_net.py \
    --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml \
    SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025

CPU training is not supported.

When changing the batch size, adjust the learning rate according to the linear learning rate scaling rule.
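Under the linear scaling rule the learning rate is proportional to the number of images per iteration. Starting from the reference setting of 16 images and an assumed base LR of 0.02 (the value in detectron2's COCO configs), 2 images gives 0.02 x 2 / 16 = 0.0025, which matches the 1-GPU command above. That command keeps the default iteration count; the sketch additionally scales `SOLVER.MAX_ITER` inversely (assuming the 1x value of 90k) so that the number of epochs stays the same, which is an optional choice rather than something the text prescribes. If you do this, `SOLVER.STEPS` (the LR decay points) should be scaled the same way. `scale_schedule` and `overrides` are hypothetical helpers.

```python
# Reference setting: 16 images per iteration (8 GPUs x 2 images), as in the model zoo.
# Assumption: BASE_LR 0.02 and MAX_ITER 90k (1x) are the values of detectron2's COCO configs.
REF_IMS_PER_BATCH = 16
REF_BASE_LR = 0.02
REF_MAX_ITER = 90_000


def scale_schedule(ims_per_batch: int) -> tuple:
    """Linear scaling rule: LR grows with batch size; iterations shrink to keep the epoch count."""
    factor = ims_per_batch / REF_IMS_PER_BATCH
    base_lr = REF_BASE_LR * factor
    max_iter = round(REF_MAX_ITER / factor)
    return base_lr, max_iter


def overrides(ims_per_batch: int) -> str:
    """Config overrides to append to a tools/train_net.py command line."""
    base_lr, max_iter = scale_schedule(ims_per_batch)
    return (f"SOLVER.IMS_PER_BATCH {ims_per_batch} "
            f"SOLVER.BASE_LR {base_lr:g} SOLVER.MAX_ITER {max_iter}")


print(overrides(2))
```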

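Model evaluation:

python tools/train_net.py --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml \
    --eval-only MODEL.WEIGHTS /path/to/checkpoint_file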

For the full list of options, see:

python tools/train_net.py -h

5. Related material

[1] - Caffe2 - Installing Detectron

[2] - Caffe2 - Basic usage of Detectron

[3] - Caffe2 - Detectron image test results

[4] - Caffe2 - Detectron model training and data loading pipeline

[5] - Github project - maskrcnn-benchmark: basic usage and examples

[6] - Github project - mmdetection object detection library

[7] - Github project - mmdetection model training

[8] - Github project - mmdetection data pipeline

Last modification: July 3rd, 2020 at 11:05 am