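Single-label image classification means that each image carries exactly one label, for example either "dog" or "cat".
Image classification in the usual sense is single-label, and it is the problem that CNNs first solved well.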

<h2>1. Preparing the training data</h2>

The main task is to generate the train.txt, val.txt and test.txt files, each containing lines in the following format:

image_name1 label1
image_name2 label2
image_name3 label3
......

For example:

img1.jpg 0
img2.jpg 0
img3.jpg 1
img4.jpg 1
img5.jpg 2
img6.jpg 2
......
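
The list files can be produced with a short script. The following is a minimal sketch, assuming a hypothetical directory layout in which each class has its own subdirectory under the image root. The layout, the function name `write_list_file`, and the rule of assigning labels by sorted directory name are all assumptions made for illustration, not something the workflow above prescribes.

```python
import os


def write_list_file(image_root, out_path, exts=('.jpg', '.jpeg', '.png')):
    """Write "relative/path label" lines for every image under image_root.

    Assumes image_root/<class_name>/<image files>; labels 0, 1, 2, ... are
    assigned in sorted order of the class directory names.
    """
    class_names = sorted(
        d for d in os.listdir(image_root)
        if os.path.isdir(os.path.join(image_root, d))
    )
    lines = []
    for label, name in enumerate(class_names):
        class_dir = os.path.join(image_root, name)
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith(exts):
                # Paths are written relative to image_root, which is the
                # root folder later passed to convert_imageset.
                lines.append('%s/%s %d' % (name, fname, label))
    with open(out_path, 'w') as f:
        f.writelines(line + '\n' for line in lines)
    return class_names
```

Calling `write_list_file('/path/to/images/train/', 'train.txt')` would then write one line per image and return the class names in label order, which is handy to keep for decoding predictions later.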

<h2>2. Generating the dataset lmdb</h2>

This step mainly uses the create_imagenet.sh and make_imagenet_mean.sh scripts that ship with Caffe. The parts that need to be modified are:

  • create_imagenet.sh
#!/usr/bin/env sh
# Create the imagenet lmdb inputs
# N.B. set the path to the imagenet train + val data dirs
set -e

EXAMPLE=/path/to/save/lmdb/  # directory where the generated lmdb is saved
DATA=/path/to/tranorval/txt/   # directory containing train.txt and val.txt
TOOLS=/path/to/caffe/build/tools  # tools built when compiling Caffe

TRAIN_DATA_ROOT=/path/to/images/train/  # root directory of the training images
VAL_DATA_ROOT=/path/to/images/val/  # root directory of the validation images

# Set RESIZE=true to resize the images to 256x256. Leave as false if images have
# already been resized using another tool.
RESIZE=false  # change to true to resize all images to 256x256
if $RESIZE; then
    RESIZE_HEIGHT=256
    RESIZE_WIDTH=256
else
    RESIZE_HEIGHT=0
    RESIZE_WIDTH=0
fi

if [ ! -d "$TRAIN_DATA_ROOT" ]; then
  echo "Error: TRAIN_DATA_ROOT is not a path to a directory: $TRAIN_DATA_ROOT"
  echo "Set the TRAIN_DATA_ROOT variable in create_imagenet.sh to the path" \
       "where the ImageNet training data is stored."
  exit 1
fi

if [ ! -d "$VAL_DATA_ROOT" ]; then
  echo "Error: VAL_DATA_ROOT is not a path to a directory: $VAL_DATA_ROOT"
  echo "Set the VAL_DATA_ROOT variable in create_imagenet.sh to the path" \
       "where the ImageNet validation data is stored."
  exit 1
fi

echo "Creating train lmdb..."

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT \
    $DATA/train.txt \
    $EXAMPLE/train_lmdb  # output lmdb for the training set; the name can be changed

echo "Creating val lmdb..."

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $VAL_DATA_ROOT \
    $DATA/val.txt \
    $EXAMPLE/val_lmdb  # output lmdb for the validation set; the name can be changed

echo "Done."
  • make_imagenet_mean.sh
#!/usr/bin/env sh
# Compute the mean image from the imagenet training lmdb
# N.B. this is available in data/ilsvrc12

EXAMPLE=/path/to/save/lmdb/  # directory where the generated lmdb is saved
DATA=/path/to/tranorval/txt/   # directory containing train.txt and val.txt
TOOLS=/path/to/caffe/build/tools  # tools built when compiling Caffe

$TOOLS/compute_image_mean $EXAMPLE/train_lmdb \
  $DATA/images_mean.binaryproto  # generated mean file; the name can be changed

echo "Done."

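Note: variable references in these scripts use the shell's $ prefix (for example $TOOLS). If a copy of a script shows * in its place, replace it with $.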

Run create_imagenet.sh and then make_imagenet_mean.sh to generate the lmdb files for the training set and the validation set, along with the corresponding mean file.

To convert the images_mean.binaryproto mean file to images_mean.npy:

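import numpy as np
import caffe

# Read the serialized mean blob written by compute_image_mean
caffeBlob = caffe.proto.caffe_pb2.BlobProto()
with open('images_mean.binaryproto', 'rb') as f:
    data = f.read()
caffeBlob.ParseFromString(data)
# The converted array has a leading batch dimension; keep the single mean image
array = np.array(caffe.io.blobproto_to_array(caffeBlob))
mean_file = 'images_mean.npy'
np.save(mean_file, array[0])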

<h2>3. Modifying the network and training the model</h2>

For networks such as AlexNet, CaffeNet and GoogLeNet, it is enough to modify the data layer and the output layer.

<h3>3.1 Modifying the data layer</h3>

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 224
    mean_file: "/path/to/images_mean.binaryproto"  # path to the mean file
    #mean_value: 104 
    #mean_value: 117
    #mean_value: 123
  }
  data_param {
    source: "/path/to/train_lmdb"  # lmdb of the training set
    batch_size: 32 
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 224
    mean_file: "/path/to/images_mean.binaryproto"  # path to the mean file
    #mean_value: 104
    #mean_value: 117
    #mean_value: 123
  }
  data_param {
    source: "/path/to/val_lmdb"  # lmdb of the validation set
    batch_size: 50  # in general, batch_size * test_iter (in the solver) should roughly equal the validation set size
    backend: LMDB
  }
}
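
The rule of thumb that batch_size * test_iter should roughly cover the validation set can be turned into a small calculation. This is only a sketch: the helper names `count_images` and `compute_test_iter` are made up for illustration, and rounding up (so that every validation image is evaluated at least once) is an assumption about what "roughly equal" should mean.

```python
import math


def count_images(list_path):
    """Count the non-empty lines of a list file such as val.txt."""
    with open(list_path) as f:
        return sum(1 for line in f if line.strip())


def compute_test_iter(num_val_images, batch_size):
    """Smallest number of test iterations that covers every validation image."""
    return int(math.ceil(num_val_images / float(batch_size)))
```

For instance, `compute_test_iter(count_images('val.txt'), 50)` gives a value to use for test_iter in the solver when the TEST-phase batch_size is 50.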

<h3>3.2 Modifying the output layer</h3>

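In general a network has a single output, so only one place needs to be changed, as in AlexNet and CaffeNet. GoogLeNet needs changes in three places, because it has two auxiliary loss branches in addition to the main classifier. AlexNet is used as the example here.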

  • Changes to train_val.prototxt
layer {
  name: "fc8"   # name of the network's output layer; must be changed
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"  # name of the network's output blob; must be changed
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1000  # number of outputs, i.e. the number of classes; must be changed
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8"  # must match the name of the output layer
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"  # must match the name of the output layer
  bottom: "label"
  top: "loss"
}
  • Changes to deploy.prototxt
layer {
  name: "fc8"  # must match the output layer name in train_val.prototxt
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"  # must match the output layer name in train_val.prototxt
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1000  # must match the number of classes in train_val.prototxt
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc8"
  top: "prob"
}
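
Since num_output has to match the number of classes, it can help to derive it from the list file rather than typing it by hand. The sketch below assumes the "image_name label" format from section 1 with integer labels starting at 0. The function name `required_num_output` is hypothetical. It relies on the fact that SoftmaxWithLoss expects every label to lie in the range 0 to num_output - 1.

```python
def required_num_output(list_path):
    """Return (num_output, missing_labels) for a list file such as train.txt.

    num_output is the largest label plus one; missing_labels lists any label
    values below num_output that never occur, which usually signals a
    mistake in the list file.
    """
    labels = set()
    with open(list_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # The label is the last whitespace-separated field on the line.
            labels.add(int(line.rsplit(None, 1)[1]))
    num_output = max(labels) + 1
    missing = sorted(set(range(num_output)) - labels)
    return num_output, missing
```

If the second return value is not empty, the labels are not contiguous, and it is probably worth fixing train.txt before training.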

<h3>3.3 Training the model</h3>

  • Edit solver.prototxt
net: "/path/to/alexnet/train_val.prototxt"  
test_iter: 1000  # batch_size * test_iter should roughly equal the validation set size
test_interval: 4000  # run a test pass every this many training iterations
test_initialization: false  
display: 40
average_loss: 40
base_lr: 0.01
lr_policy: "step"
stepsize: 320000 
gamma: 0.96
max_iter: 10000000 
momentum: 0.9
weight_decay: 0.0002
snapshot: 40000
snapshot_prefix: "/path/to/alexnet"  # where the trained caffemodel snapshots are saved
solver_mode: GPU
  • Edit and run trainnet.sh
#!/usr/bin/env sh

./build/tools/caffe train \
    -solver /path/to/alexnet/solver.prototxt \
    -gpu 0

Run the script:

./trainnet.sh
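
To get a feel for the solver settings above, it may help to see what learning rate they produce during training. With lr_policy "step", Caffe uses base_lr * gamma ^ floor(iter / stepsize). The function below is a small sketch of that formula (the name `step_lr` is made up here), with the values from solver.prototxt as defaults.

```python
def step_lr(iteration, base_lr=0.01, gamma=0.96, stepsize=320000):
    """Learning rate under Caffe's "step" policy at a given iteration."""
    return base_lr * gamma ** (iteration // stepsize)


if __name__ == '__main__':
    # Print the rate at the first few step boundaries.
    for it in (0, 320000, 640000, 960000):
        print('iter %8d  lr %.6f' % (it, step_lr(it)))
```

With gamma this close to 1 and such a large stepsize, the rate decays slowly, so max_iter mainly acts as an upper bound and training is typically stopped once validation accuracy plateaus.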

<h2>4. Testing on an image</h2>

Use deploy.prototxt and the trained caffemodel to classify a test image. The Python script is as follows:

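# coding=utf-8
import numpy as np
import caffe

img = '/path/to/test.jpg'
classes_file = '/class/names/define'  # class names; assumed here to be a text file with one name per line, in label order
mean_file = '/path/to/images_mean.npy'  # mean file converted in section 2

# Load the class names so that classes[idx] maps a label index to a name
with open(classes_file) as f:
    classes = [line.strip() for line in f if line.strip()]

model_def  = '/path/to/deploy.prototxt'
weight_def = '/path/to/alexnet_iter_xxx.caffemodel'  # caffemodel produced by training
net = caffe.Net(model_def, weight_def, caffe.TEST)  # load the network and weights in TEST mode

# Image preprocessing settings
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))  # reorder dimensions from HxWxC to CxHxW
transformer.set_mean('data', np.load(mean_file).mean(1).mean(1))  # subtract the per-channel mean
transformer.set_raw_scale('data', 255)   # rescale from [0,1] to [0,255]
transformer.set_channel_swap('data', (2,1,0))   # convert the image from RGB to BGR

im = caffe.io.load_image(img)  # load the test image
net.blobs['data'].data[...] = transformer.preprocess('data',im)  # preprocess the image and copy it into the data blob

out = net.forward()
prob = net.blobs['prob'].data[0].flatten()  # Softmax output ("prob" in deploy.prototxt): probability of each class
idx = prob.argsort()[-1]  # sort the probabilities and take the index of the largest
print('the class is: %s' % classes[idx])   # map the index to the class name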
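
Looking only at the single best class can hide close calls, so it is often useful to list the few most probable classes. The helper below is a sketch that is not part of the script above. The name `top_k` is made up, and it assumes `classes` is a list of class names indexed by label and `prob` is a 1-D sequence of probabilities, such as the flattened Softmax output.

```python
def top_k(prob, classes, k=5):
    """Return the k most probable (class_name, probability) pairs, best first."""
    order = sorted(range(len(prob)), key=lambda i: prob[i], reverse=True)
    return [(classes[i], float(prob[i])) for i in order[:k]]


if __name__ == '__main__':
    # Toy probabilities standing in for the network output.
    for name, p in top_k([0.1, 0.7, 0.2], ['cat', 'dog', 'bird'], k=2):
        print('%-6s %.2f' % (name, p))
```

It uses only built-in functions, so it works on a NumPy array as well as on a plain list.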

Last modification: October 9th, 2018 at 09:31 am