Excerpted from: Keras: Multiple outputs and multiple losses

An earlier post used Keras to implement multi-label clothing classification: Multi-label clothing classification with Keras.

This post uses a more advanced approach: multi-output classification.

Multi-label vs. multi-output classification:

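  • [1] In multi-label classification, the network has only one set of FC layers, which outputs all classification results.
  • [2] In multi-output classification, the network has two groups of output branches, each ending in its own FC layers that output classification results. Each branch handles a specific classification task, which lets the network learn disjoint label combinations.
  • [3] Combining multi-label and multi-output classification lets each FC branch predict multiple outputs.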
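The following minimal NumPy sketch makes the difference between [1] and [2] concrete. The label names come from this post's dataset, but the logit values are made up purely for illustration.

- A multi-label head applies an independent sigmoid to every label and thresholds each one separately.
- A multi-output network applies a separate softmax per branch, so each branch's probabilities sum to 1.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# [1] multi-label: ONE FC head scores every label independently (hypothetical logits)
multi_label_names = ["black", "blue", "red", "dress", "jeans", "shirt", "shoes"]
multi_label_logits = np.array([2.0, -1.0, -3.0, 1.5, -2.0, -0.5, -4.0])
multi_label_probs = sigmoid(multi_label_logits)
# each label is thresholded on its own (0.5 is a common choice)
predicted = [n for n, p in zip(multi_label_names, multi_label_probs) if p > 0.5]

# [2] multi-output: one softmax head per task (same hypothetical logits, split by task)
category_names = ["dress", "jeans", "shirt", "shoes"]
color_names = ["black", "blue", "red"]
category_probs = softmax(np.array([1.5, -2.0, -0.5, -4.0]))
color_probs = softmax(np.array([2.0, -1.0, -3.0]))
category = category_names[int(category_probs.argmax())]
color = color_names[int(color_probs.argmax())]

print(predicted)        # ['black', 'dress']
print(color, category)  # black dress
```

The sigmoid probabilities do not sum to 1 across labels. Each softmax branch, by contrast, is a proper distribution over its own task.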

Use case:

Implement multi-output classification with Keras, where multiple output branches can learn disjoint label combinations.

Figure 1. Multi-output classification results.

1. Keras: multiple outputs and multiple loss functions

Multiple Outputs and Multiple Losses

This section covers:

  • The dataset
  • The FashionNet network architecture
  • Training FashionNet
  • Testing the FashionNet classifier

FashionNet has two network branches:

  • [1] - one sub-network branch classifies the clothing type, e.g. shirt, dress, jeans, shoes.
  • [2] - one sub-network branch classifies the clothing color, e.g. black, red, blue.

1.1 The multi-output dataset

Figure 2. The multi-output classification dataset. Note: the dataset contains no red/blue shoes and no black dresses/shirts, yet the final model can still predict these combinations.

The dataset contains 2,525 images spanning 7 color + category combinations:

  • Black jeans (344 images)
  • Black shoes (358 images)
  • Blue dress (386 images)
  • Blue jeans (356 images)
  • Blue shirt (369 images)
  • Red dress (380 images)
  • Red shirt (332 images)
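The sketch below uses only the counts listed above. It checks the total and enumerates which color × category pairs never appear.

Only 7 of the 12 possible pairs are present. Every individual color and every individual category still occurs at least once, and that is all each output branch needs in order to learn its own task.

```python
from itertools import product

# image counts per (color, category) folder, copied from the list above
counts = {
    ("black", "jeans"): 344,
    ("black", "shoes"): 358,
    ("blue", "dress"): 386,
    ("blue", "jeans"): 356,
    ("blue", "shirt"): 369,
    ("red", "dress"): 380,
    ("red", "shirt"): 332,
}

colors = sorted({color for color, _ in counts})
categories = sorted({cat for _, cat in counts})

total = sum(counts.values())
# every color x category pair with no training images at all
missing = sorted(set(product(colors, categories)) - set(counts))

print(total)               # 2525
print(colors, categories)  # ['black', 'blue', 'red'] ['dress', 'jeans', 'shirt', 'shoes']
print(missing)
```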

To build a dataset like this, see: How to (quickly) build a deep learning image dataset.

Goal: as in multi-label classification, predict both the clothing type and the clothing color.

Figure 3. The dataset contains no "black dress" training data, yet multi-output classification can still predict the correct labels.

1.2 Multi-output classification project structure

├── dataset
│   ├── black_jeans [344 entries]
│   ├── black_shoes [358 entries]
│   ├── blue_dress [386 entries]
│   ├── blue_jeans [356 entries]
│   ├── blue_shirt [369 entries]
│   ├── red_dress [380 entries]
│   └── red_shirt [332 entries]
├── examples
│   ├── black_dress.jpg
│   ├── black_jeans.jpg
│   ├── blue_shoes.jpg
│   ├── red_shirt.jpg
│   └── red_shoes.jpg
├── output
│   ├── fashion.model
│   ├── category_lb.pickle
│   ├── color_lb.pickle
│   ├── output_accs.png
│   └── output_losses.png
├── pyimagesearch
│   ├── __init__.py
│   └── fashionnet.py
├── train.py
└── classify.py

Key files:

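  • pyimagesearch/fashionnet.py - the FashionNet network architecture with multi-output classification heads.
  • train.py - trains the FashionNet model and generates the files in output/.
  • classify.py - loads the trained FashionNet model and tests the multi-output classifier.
  • dataset - the clothing dataset.
  • examples - sample images for testing.
  • output - files generated by train.py during training.
  • output/fashion.model - the trained model file.
  • output/category_lb.pickle - the serialized scikit-learn LabelBinarizer object for clothing categories.
  • output/color_lb.pickle - the serialized LabelBinarizer object for clothing colors.
  • output/output_accs.png - the accuracy plot generated during training.
  • output/output_losses.png - the loss plot generated during training.
  • pyimagesearch - the Python module containing the FashionNet class.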

1.3 The FashionNet network architecture

As mentioned above, FashionNet has two network branches:

  • [1] - one sub-network branch classifies the clothing type, e.g. shirt, dress, jeans, shoes.
  • [2] - one sub-network branch classifies the clothing color, e.g. black, red, blue.

As illustrated:

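Figure 4. The output end of the multi-output classification network: (left) the clothing category branch, (right) the clothing color branch. Each branch has its own FC layers.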

Figure 5. FashionNet, the multi-output classification network.

Each branch has its own Conv, Activation, BatchNormalization, Pooling, and Dropout layers, followed by its own final output layer.

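The right (color) branch is noticeably shallower than the left (category) branch. Predicting clothing color is easier than predicting clothing category, so the color branch uses fewer layers.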
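A quick shape calculation shows how much smaller the color branch is. It uses the layer settings from fashionnet.py in section 1.4: a 96×96 input, pool sizes 3, 2, 2 in both branches, a final conv width of 128 (category) vs. 32 (color), and Dense(256) vs. Dense(128).

The sketch assumes Keras' MaxPooling2D defaults (stride equal to the pool size, "valid" padding). It also relies on padding="same" in the conv layers, which keeps the spatial size unchanged. `trace_branch` is a hypothetical helper, not part of the project.

```python
def pool_out(size, pool):
    # MaxPooling2D defaults: strides == pool_size, padding="valid"
    return (size - pool) // pool + 1

def trace_branch(input_size, pools, last_filters, dense_units):
    size = input_size
    for p in pools:
        # Conv2D uses padding="same" with stride 1, so only pooling shrinks the map
        size = pool_out(size, p)
    flat = size * size * last_filters                # length of the Flatten() output
    dense_params = flat * dense_units + dense_units  # Dense weights + biases
    return size, flat, dense_params

# settings read off fashionnet.py, 96x96 input
category = trace_branch(96, [3, 2, 2], last_filters=128, dense_units=256)
color = trace_branch(96, [3, 2, 2], last_filters=32, dense_units=128)
print(category)  # (8, 8192, 2097408)
print(color)     # (8, 2048, 262272)
```

Both branches end at an 8×8 feature map. The color branch's first Dense layer, however, has roughly 8× fewer parameters.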

1.4 Implementing FashionNet

pyimagesearch/fashionnet.py:

from keras.models import Model
from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Dropout
from keras.layers.core import Lambda
from keras.layers.core import Dense
from keras.layers import Flatten
from keras.layers import Input
import tensorflow as tf

class FashionNet:
    # clothing category classification branch
    @staticmethod
    def build_category_branch(inputs, numCategories,
                              finalAct="softmax", chanDim=-1):
        # use a Lambda layer to convert the 3-channel RGB input
        # to a single-channel grayscale representation
        x = Lambda(lambda c: tf.image.rgb_to_grayscale(c))(inputs)

        # CONV => RELU => POOL
        x = Conv2D(32, (3, 3), padding="same")(x)
        x = Activation("relu")(x)
        x = BatchNormalization(axis=chanDim)(x)
        x = MaxPooling2D(pool_size=(3, 3))(x)
        x = Dropout(0.25)(x)

        # (CONV => RELU) * 2 => POOL
        x = Conv2D(64, (3, 3), padding="same")(x)
        x = Activation("relu")(x)
        x = BatchNormalization(axis=chanDim)(x)
        x = Conv2D(64, (3, 3), padding="same")(x)
        x = Activation("relu")(x)
        x = BatchNormalization(axis=chanDim)(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
        x = Dropout(0.25)(x)

        # (CONV => RELU) * 2 => POOL
        x = Conv2D(128, (3, 3), padding="same")(x)
        x = Activation("relu")(x)
        x = BatchNormalization(axis=chanDim)(x)
        x = Conv2D(128, (3, 3), padding="same")(x)
        x = Activation("relu")(x)
        x = BatchNormalization(axis=chanDim)(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
        x = Dropout(0.25)(x)
        
        # FC head: one output unit per clothing category,
        # e.g. shirts, jeans, dresses, etc.
        x = Flatten()(x)
        x = Dense(256)(x)
        x = Activation("relu")(x)
        x = BatchNormalization()(x)
        x = Dropout(0.5)(x)
        x = Dense(numCategories)(x)
        x = Activation(finalAct, name="category_output")(x)

        # return the category prediction sub-network
        return x
    
    # clothing color classification branch
    @staticmethod
    def build_color_branch(inputs, numColors, finalAct="softmax",
        chanDim=-1):
        # CONV => RELU => POOL
        x = Conv2D(16, (3, 3), padding="same")(inputs)
        x = Activation("relu")(x)
        x = BatchNormalization(axis=chanDim)(x)
        x = MaxPooling2D(pool_size=(3, 3))(x)
        x = Dropout(0.25)(x)

        # CONV => RELU => POOL
        x = Conv2D(32, (3, 3), padding="same")(x)
        x = Activation("relu")(x)
        x = BatchNormalization(axis=chanDim)(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
        x = Dropout(0.25)(x)

        # CONV => RELU => POOL
        x = Conv2D(32, (3, 3), padding="same")(x)
        x = Activation("relu")(x)
        x = BatchNormalization(axis=chanDim)(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
        x = Dropout(0.25)(x)
        
        # FC head: one output unit per clothing color,
        # e.g. red, black, blue, etc.
        x = Flatten()(x)
        x = Dense(128)(x)
        x = Activation("relu")(x)
        x = BatchNormalization()(x)
        x = Dropout(0.5)(x)
        x = Dense(numColors)(x)
        x = Activation(finalAct, name="color_output")(x)

        # return the color prediction sub-network
        return x
    
    # build the full multi-output network
    @staticmethod
    def build(width, height, numCategories, numColors,
              finalAct="softmax"):
        # initialize the input shape and channel dimension (channels-last)
        inputShape = (height, width, 3)
        chanDim = -1

        # build the category and color sub-networks on the same input
        inputs = Input(shape=inputShape)
        categoryBranch = FashionNet.build_category_branch(
            inputs, numCategories, finalAct=finalAct, chanDim=chanDim)
        colorBranch = FashionNet.build_color_branch(
            inputs, numColors, finalAct=finalAct, chanDim=chanDim)

        # create the model from one input (a batch of images) and two separate outputs
        model = Model(inputs=inputs,
                      outputs=[categoryBranch, colorBranch],
                      name="fashionnet")

        # return the constructed network
        return model

The line x = Lambda(lambda c: tf.image.rgb_to_grayscale(c))(inputs), at the start of the category branch, converts the input image from RGB to grayscale.

Why do this?

A dress is a dress whether it is red, blue, green, black, purple, or any other color.

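So the category branch discards the color information and concentrates on the actual structural components of the image. This prevents FashionNet from learning particular colors jointly with particular clothing types.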
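Below is a minimal NumPy sketch of what that Lambda layer does. It assumes that tf.image.rgb_to_grayscale takes a weighted sum of the R, G, B channels with weights (0.2989, 0.5870, 0.1140) and keeps a trailing channel axis of size 1.

In the example, two identical silhouettes, one pure red and one pure blue, end up with the same grayscale structure and differ only in brightness. Brightness differences do survive, so grayscale removes hue rather than every trace of color.

```python
import numpy as np

# assumed luminance weights (to my knowledge, the ones tf.image.rgb_to_grayscale uses)
RGB_WEIGHTS = np.array([0.2989, 0.5870, 0.1140])

def rgb_to_grayscale(images):
    # (..., H, W, 3) -> (..., H, W, 1): weighted channel sum, channel axis kept
    gray = images @ RGB_WEIGHTS
    return gray[..., np.newaxis]

mask = np.zeros((96, 96))
mask[20:80, 30:66] = 1.0  # a crude "garment" silhouette

red_item = mask[..., None] * np.array([1.0, 0.0, 0.0])   # pure red garment
blue_item = mask[..., None] * np.array([0.0, 0.0, 1.0])  # same shape, pure blue

g_red = rgb_to_grayscale(red_item)
g_blue = rgb_to_grayscale(blue_item)
print(red_item.shape, g_red.shape)  # (96, 96, 3) (96, 96, 1)
# same silhouette after grayscale; only the overall brightness differs
print(np.allclose(g_red / g_red.max(), g_blue / g_blue.max()))  # True
```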

FashionNet's two branches share the same input but produce two different outputs: clothing category and clothing color.

1.5 Training with multiple outputs and multiple losses

train.py:

import matplotlib
matplotlib.use("Agg")

from keras.optimizers import Adam
from keras.preprocessing.image import img_to_array
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from pyimagesearch.fashionnet import FashionNet
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import random
import pickle
import cv2
import os


# parse the command-line arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
                help="path to input dataset (i.e., directory of images)")
ap.add_argument("-m", "--model", required=True,
                help="path to output model")
ap.add_argument("-l", "--categorybin", required=True,
                help="path to output category label binarizer")
ap.add_argument("-c", "--colorbin", required=True,
                help="path to output color label binarizer")
ap.add_argument("-p", "--plot", type=str, default="output",
                help="base filename for generated plots")
args = vars(ap.parse_args())


# initialize the number of epochs, initial learning rate, batch size, and image dimensions
EPOCHS = 50
INIT_LR = 1e-3
BS = 32
IMAGE_DIMS = (96, 96, 3)


# grab the image paths and randomly shuffle them
print("[INFO] loading images...")
imagePaths = sorted(list(paths.list_images(args["dataset"])))
random.seed(42)
random.shuffle(imagePaths)

# initialize the data, category labels, and color labels
data = []
categoryLabels = []
colorLabels = []

# loop over the input images, preprocess them, and extract their labels
for imagePath in imagePaths:
    # load and preprocess the image, then add it to the data list
    image = cv2.imread(imagePath)
    image = cv2.resize(image, (IMAGE_DIMS[1], IMAGE_DIMS[0]))
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = img_to_array(image)
    data.append(image)

    # extract the color and category labels from the directory name
    # (e.g. .../black_jeans/xxx.jpg) and update the label lists
    (color, cat) = imagePath.split(os.path.sep)[-2].split("_")
    categoryLabels.append(cat)
    colorLabels.append(color)
    
# convert to a NumPy array and scale the raw pixel intensities to [0, 1]
data = np.array(data, dtype="float") / 255.0
print("[INFO] data matrix: {} images ({:.2f}MB)".format(
    len(imagePaths), data.nbytes / (1024 * 1024.0)))

# convert the label lists to NumPy arrays
categoryLabels = np.array(categoryLabels)
colorLabels = np.array(colorLabels)
    
# binarize (one-hot encode) the labels
print("[INFO] binarizing labels...")
categoryLB = LabelBinarizer()
colorLB = LabelBinarizer()
categoryLabels = categoryLB.fit_transform(categoryLabels)
colorLabels = colorLB.fit_transform(colorLabels)

# split the data: 80% for training, 20% for testing
split = train_test_split(data, categoryLabels, colorLabels,
                         test_size=0.2, random_state=42)
(trainX, testX, trainCategoryY, testCategoryY, 
                             trainColorY, testColorY) = split

# initialize the multi-output FashionNet
model = FashionNet.build(96, 96,
                         numCategories=len(categoryLB.classes_),
                         numColors=len(colorLB.classes_),
                         finalAct="softmax")

# define two dictionaries:
# one specifies the loss for each network output;
# the other specifies the weight applied to each loss
losses = {"category_output": "categorical_crossentropy",
          "color_output": "categorical_crossentropy",
         }
lossWeights = {"category_output": 1.0, "color_output": 1.0}

# initialize the optimizer and compile the model
print("[INFO] compiling model...")
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(optimizer=opt, 
              loss=losses, 
              loss_weights=lossWeights,
              metrics=["accuracy"])

# train the network to perform multi-output classification
H = model.fit(trainX,
              {"category_output": trainCategoryY, "color_output": trainColorY},
              validation_data=(testX,
                               {"category_output": testCategoryY,
                                "color_output": testColorY}),
              batch_size=BS,
              epochs=EPOCHS,
              verbose=1)

# save the Keras model to disk
print("[INFO] serializing network...")
model.save(args["model"])

# save the category label binarizer to disk
print("[INFO] serializing category label binarizer...")
f = open(args["categorybin"], "wb")
f.write(pickle.dumps(categoryLB))
f.close()

# save the color label binarizer to disk
print("[INFO] serializing color label binarizer...")
f = open(args["colorbin"], "wb")
f.write(pickle.dumps(colorLB))
f.close()

# plot the total loss, category loss, and color loss
lossNames = ["loss", "category_output_loss", "color_output_loss"]
plt.style.use("ggplot")
(fig, ax) = plt.subplots(3, 1, figsize=(13, 13))

# loop over the loss names
for (i, l) in enumerate(lossNames):
    # plot the loss for both the training and validation data
    title = "Loss for {}".format(l) if l != "loss" else "Total loss"
    ax[i].set_title(title)
    ax[i].set_xlabel("Epoch #")
    ax[i].set_ylabel("Loss")
    ax[i].plot(np.arange(0, EPOCHS), H.history[l], label=l)
    ax[i].plot(np.arange(0, EPOCHS), H.history["val_" + l],
        label="val_" + l)
    ax[i].legend()

# save the losses figure
plt.tight_layout()
plt.savefig("{}_losses.png".format(args["plot"]))
plt.close()

# create a new figure for the accuracies
accuracyNames = ["category_output_acc", "color_output_acc"]
plt.style.use("ggplot")
(fig, ax) = plt.subplots(2, 1, figsize=(8, 8))

# loop over the accuracy names
for (i, l) in enumerate(accuracyNames):
    # plot the accuracy for both the training and validation data
    ax[i].set_title("Accuracy for {}".format(l))
    ax[i].set_xlabel("Epoch #")
    ax[i].set_ylabel("Accuracy")
    ax[i].plot(np.arange(0, EPOCHS), H.history[l], label=l)
    ax[i].plot(np.arange(0, EPOCHS), H.history["val_" + l],
        label="val_" + l)
    ax[i].legend()

# save the accuracies figure
plt.tight_layout()
plt.savefig("{}_accs.png".format(args["plot"]))
plt.close()
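train.py derives both labels from the directory name and one-hot encodes them with scikit-learn's LabelBinarizer. The sketch below reproduces that step in plain NumPy so the resulting arrays are easy to inspect.

`labels_from_path` and `one_hot` are hypothetical stand-ins, not part of the project. `one_hot` mimics LabelBinarizer for three or more classes: classes sorted alphabetically, one column per class.

```python
import os
import numpy as np

def labels_from_path(image_path):
    # same convention as train.py: .../<color>_<category>/<file>
    color, category = image_path.split(os.path.sep)[-2].split("_")
    return color, category

def one_hot(labels):
    # stand-in for LabelBinarizer with >= 3 classes: sorted classes, one column each
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    encoded = np.zeros((len(labels), len(classes)), dtype=int)
    for row, label in enumerate(labels):
        encoded[row, index[label]] = 1
    return classes, encoded

paths = [os.path.join("dataset", d, "00000001.jpg")
         for d in ["black_jeans", "blue_dress", "red_shirt", "black_shoes"]]
colors, categories = zip(*(labels_from_path(p) for p in paths))
color_classes, color_Y = one_hot(colors)
category_classes, category_Y = one_hot(categories)
print(color_classes)     # ['black', 'blue', 'red']
print(category_classes)  # ['dress', 'jeans', 'shirt', 'shoes']
print(color_Y)
```

One design caveat: with exactly two classes, LabelBinarizer returns a single 0/1 column rather than two columns. That output would not match a 2-unit softmax trained with categorical_crossentropy. Here, both label sets have three or more classes, so the issue does not arise.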

To train the model:

python3 train.py --dataset dataset \
                --model output/fashion.model \
                --categorybin output/category_lb.pickle \
                --colorbin output/color_lb.pickle

After training for 50 epochs, the results are:

loss: 0.0344 - 
category_output_loss: 0.0117 - 
color_output_loss: 0.0227 - 
category_output_acc: 0.9965 - 
color_output_acc: 0.9931 - 

val_loss: 0.2268 - 
val_category_output_loss: 0.1094 - 
val_color_output_loss: 0.1174 - 
val_category_output_acc: 0.9782 - 
val_color_output_acc: 0.9723
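With loss_weights set to 1.0 for both outputs, the total loss reported is the weighted sum of the per-output losses: loss = 1.0 × category_output_loss + 1.0 × color_output_loss. The logged numbers above check out: 0.0117 + 0.0227 = 0.0344 and 0.1094 + 0.1174 = 0.2268.

The NumPy sketch below reproduces that bookkeeping. It assumes no regularization losses; FashionNet defines none, and any that existed would also be added to the total. The toy batch and its predictions are made up. Raising or lowering one entry in the weights dict is how you would tell the optimizer to prioritize one head over the other.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # batch mean of -sum(y_true * log(y_pred)) over the classes
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

def total_loss(losses, weights):
    # weighted sum of the per-output losses, keyed by output layer name
    return sum(weights[name] * value for name, value in losses.items())

weights = {"category_output": 1.0, "color_output": 1.0}  # lossWeights in train.py

# per-output losses copied from the training log above
train = {"category_output": 0.0117, "color_output": 0.0227}
val = {"category_output": 0.1094, "color_output": 0.1174}
print(round(total_loss(train, weights), 4))  # 0.0344
print(round(total_loss(val, weights), 4))    # 0.2268

# toy batch of 2 samples with hypothetical softmax outputs
y_cat = np.array([[0, 1, 0, 0], [1, 0, 0, 0]])
p_cat = np.array([[0.1, 0.7, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1]])
y_col = np.array([[1, 0, 0], [0, 0, 1]])
p_col = np.array([[0.8, 0.1, 0.1], [0.2, 0.2, 0.6]])
batch = {"category_output": categorical_crossentropy(y_cat, p_cat),
         "color_output": categorical_crossentropy(y_col, p_col)}
print(total_loss(batch, weights))
```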

Training loss curves (output_losses.png):

Training accuracy curves (output_accs.png):

Applying data augmentation would likely improve accuracy further.
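Data augmentation for a multi-output model needs batches whose targets are a dict keyed by the output layer names ("category_output", "color_output"). The following minimal NumPy sketch shows one way to produce them.

`multi_output_batches` is a hypothetical helper, and a random horizontal flip is the only augmentation applied. A real setup would likely add rotations, shifts, zooms, and so on.

```python
import numpy as np

def multi_output_batches(X, y_category, y_color, batch_size=32, seed=42):
    # hypothetical helper: yields (images, {output_name: labels}) batches forever,
    # reshuffling every pass and randomly flipping ~half the images left-right
    rng = np.random.default_rng(seed)
    n = len(X)
    while True:
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            images = X[batch].copy()
            flip = rng.random(len(batch)) < 0.5
            # X has shape (N, H, W, C); reverse the width axis of the chosen images
            images[flip] = images[flip][:, :, ::-1, :]
            yield images, {"category_output": y_category[batch],
                           "color_output": y_color[batch]}
```

With tf.keras, a generator like this can usually be passed straight to model.fit(gen, steps_per_epoch=len(trainX) // BS, ...). The older standalone Keras used in train.py would need model.fit_generator instead. Verify either call against your installed version.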

1.6 Testing multi-output classification

classify.py

from keras.preprocessing.image import img_to_array
from keras.models import load_model
import tensorflow as tf
import numpy as np
import argparse
import imutils
import pickle
import cv2

# parse the command-line arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", required=True,
                help="path to trained model")
ap.add_argument("-l", "--categorybin", required=True,
                help="path to output category label binarizer")
ap.add_argument("-c", "--colorbin", required=True,
                help="path to output color label binarizer")
ap.add_argument("-i", "--image", required=True,
                help="path to input image")
args = vars(ap.parse_args())

# load the input image and keep a resized copy for display
image = cv2.imread(args["image"])
output = imutils.resize(image, width=400)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# preprocess the image for classification (same steps as in training)
image = cv2.resize(image, (96, 96))
image = image.astype("float") / 255.0
image = img_to_array(image)
image = np.expand_dims(image, axis=0)

# load the trained CNN along with the category and color label binarizers;
# custom_objects={"tf": tf} is needed because the Lambda layer calls tf.image
print("[INFO] loading network...")
model = load_model(args["model"], custom_objects={"tf": tf})
categoryLB = pickle.loads(open(args["categorybin"], "rb").read())
colorLB = pickle.loads(open(args["colorbin"], "rb").read())

# run multi-output classification on the input image
print("[INFO] classifying image...")
(categoryProba, colorProba) = model.predict(image)

# find the index of the highest-probability class for both the category
# and color outputs, then look up the corresponding class labels
categoryIdx = categoryProba[0].argmax()
colorIdx = colorProba[0].argmax()
categoryLabel = categoryLB.classes_[categoryIdx]
colorLabel = colorLB.classes_[colorIdx]

# draw the category label and color label on the output image
categoryText = "category: {} ({:.2f}%)".format(
    categoryLabel, categoryProba[0][categoryIdx] * 100)
colorText = "color: {} ({:.2f}%)".format(
    colorLabel, colorProba[0][colorIdx] * 100)
cv2.putText(output, categoryText, (10, 25), 
            cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
cv2.putText(output, colorText, (10, 55), 
            cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)

# print the predictions
print("[INFO] {}".format(categoryText))
print("[INFO] {}".format(colorText))

# show the output image
cv2.imshow("Output", output)
cv2.waitKey(0)

To classify a test image:

python3 classify.py --model output/fashion.model \
                    --categorybin output/category_lb.pickle \
                    --colorbin output/color_lb.pickle \
                    --image examples/black_dress.jpg

For example, the output is:

category: shirt (98.82%)
color: black (97.99%)

2. Summary

Possible directions for going further with multiple outputs and multiple losses include:

  • Training custom object detectors
  • Multi-GPU training
  • Emotion and facial expression recognition
  • Training on large datasets that cannot fit into memory at once
  • ...

The FashionNet architecture is designed for automatically tagging clothing with multiple labels.

It solves a problem encountered in Multi-label clothing classification with Keras:

There, SmallerVGGNet was trained on a dataset of six combinations: black jeans, blue dresses, blue jeans, blue shirts, red dresses, and red shirts. However, it could not correctly classify combinations absent from the dataset, such as "black dress".

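FashionNet instead uses a multi-output classification network that recognizes clothing category and clothing color separately. This lets it classify combinations not contained in the training set, such as "black dress".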

Note: you should always try to provide example training data for each class you want to recognize. Deep neural networks, while powerful, are not "magic"!

Last modification: November 3rd, 2018 at 04:05 pm