CaffeLoss - A Brief Overview of Loss Layers

Author: AIHGF
Published: May 19, 2018
Categories: Deep Learning Platforms

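The loss measures the error between the network's predicted output and the ground-truth label.
The forward pass computes the loss value, the backward pass computes the gradient of the loss, and the network parameters are optimized by minimizing the loss.

Loss layers provided by Caffe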

1. SoftmaxWithLoss

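Used for one-of-many classification tasks; computes the multinomial logistic loss. The real-valued predictions are passed through a softmax to obtain a probability distribution over the classes.

This layer is equivalent to a SoftmaxLayer followed by a MultinomialLogisticLoss layer, but its gradient computation is more numerically stable.

At test time, this layer can be replaced by a SoftmaxLayer.

1.1 Forward parameters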

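Input 1 - predictions ${x}$, (N × C × H × W), with values in [-inf, inf], giving the predicted score for each of the K = CHW classes.
The SoftmaxLayer maps the scores ${x}$ to a probability distribution over the classes: ${ \hat{p}_{nk} = \frac{\exp(x_{nk})}{\sum_{k'} \exp(x_{nk'})} }$.
[Note: SoftmaxLayer only outputs the per-class probabilities; it does not compare them with the labels.]

Input 2 - ground-truth labels ${l}$, (N × 1 × 1 × 1), integer-valued, with ${ l_n \in [0, 1, 2, ..., K-1] }$ indicating the correct class among the K classes.

Output 1 - the computed cross-entropy classification loss, (1 × 1 × 1 × 1):
${ E = \frac{-1}{N} \sum_{n=1}^{N} \log(\hat{p}_{n, l_n}) }$, where ${\hat{p}}$ is the class probability output by the softmax.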
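
The forward computation above can be sketched in plain Python. This is a minimal illustration of the two formulas, not Caffe's implementation; the function names `softmax` and `softmax_with_loss` and the sample scores are made up for the example, and each sample is assumed to be a flat list of K scores.

```python
import math

def softmax(scores):
    """Map one sample's raw scores to a probability distribution."""
    m = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_with_loss(x, labels):
    """x: N score vectors of length K; labels: N integer class indices."""
    n = len(x)
    loss = 0.0
    for scores, label in zip(x, labels):
        p = softmax(scores)
        loss -= math.log(p[label])  # -log of the true-class probability
    return loss / n

# Hypothetical scores for N=2 samples and K=3 classes.
x = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
labels = [0, 1]
loss = softmax_with_loss(x, labels)
```

Subtracting the per-sample maximum before exponentiating leaves the probabilities unchanged, and is one common way to obtain the numerical stability mentioned above.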

1.2 Backward parameters

Computes the gradient of the softmax loss with respect to the predictions.

No gradient is computed with respect to the label input (bottom[1]).

    template<typename Dtype>
    void caffe::SoftmaxWithLossLayer< Dtype >::Backward_cpu(
        const vector< Blob< Dtype > *> &  top,
        const vector< bool > & propagate_down,
        const vector< Blob< Dtype > *> &  bottom 
    )

Parameters:

top - (1 × 1 × 1 × 1); its diff is the loss_weight ${\lambda}$, i.e. the coefficient of this layer's output ${l_i}$ in the overall network loss ${E = \lambda_i l_i + \text{other loss terms}}$, so that ${\frac{\partial E}{\partial l_i} = \lambda_i}$.
propagate_down[1] - must be false, because no gradient is computed for the labels.
bottom - [0] (N × C × H × W), the predictions ${x}$; the backward pass fills its diff with ${\frac{\partial E}{\partial x}}$.
bottom - [1] (N × 1 × 1 × 1), the labels; ignored, no gradient is computed.
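
For the averaged cross-entropy loss, the gradient with respect to each score has the well-known closed form ${\frac{\partial E}{\partial x_{nk}} = \frac{\lambda}{N}(\hat{p}_{nk} - [k = l_n])}$. The sketch below illustrates that formula in plain Python; it is not Caffe's `Backward_cpu`, and the names `softmax_loss_grad` and `loss_weight` are chosen for this example only.

```python
import math

def softmax(scores):
    m = max(scores)  # stabilise exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_loss(x, labels):
    """Averaged cross-entropy loss, used here to check the gradient."""
    return -sum(math.log(softmax(s)[l]) for s, l in zip(x, labels)) / len(x)

def softmax_loss_grad(x, labels, loss_weight=1.0):
    """Gradient of the averaged loss with respect to every score x[n][k]."""
    n = len(x)
    grad = []
    for scores, label in zip(x, labels):
        p = softmax(scores)
        grad.append([
            loss_weight * (p[k] - (1.0 if k == label else 0.0)) / n
            for k in range(len(scores))
        ])
    return grad

# Hypothetical scores for N=2 samples and K=3 classes.
x = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
labels = [0, 1]
grad = softmax_loss_grad(x, labels)
```

No entry is produced for the labels, which mirrors the requirement that propagate_down[1] be false.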

1.3 prototxt definition

    layer {
      name: "loss"
      type: "SoftmaxWithLoss"
      bottom: "fc8"
      bottom: "label"
      top: "loss"
      loss_param {
        ignore_label: 0  # label value to ignore when computing the loss
        normalize: true  # if true, normalize by the number of labels that are not ignored; otherwise divide by the batch size only
      }
    }
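
The effect of `ignore_label` and `normalize` can be illustrated with a small sketch. It assumes the inputs are already probabilities and that there is one label per sample; `masked_mean_nll` is a hypothetical helper written for this example, not a Caffe function.

```python
import math

def masked_mean_nll(probs, labels, ignore_label=None, normalize=True):
    """probs: N probability vectors; labels: N integer labels."""
    total, valid = 0.0, 0
    for p, label in zip(probs, labels):
        if ignore_label is not None and label == ignore_label:
            continue  # ignored samples contribute nothing to the loss
        total -= math.log(p[label])
        valid += 1
    # normalize=True: divide by the number of non-ignored labels.
    # normalize=False: divide by the batch size only.
    denom = valid if normalize else len(labels)
    return total / max(denom, 1)  # avoid dividing by zero

# Hypothetical batch: the third sample carries the ignored label 0.
probs = [[0.5, 0.5], [0.25, 0.75], [0.9, 0.1]]
labels = [1, 1, 0]
loss_valid = masked_mean_nll(probs, labels, ignore_label=0)
loss_batch = masked_mean_nll(probs, labels, ignore_label=0, normalize=False)
```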

2. EuclideanLoss

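Computes the sum of squared differences between its two inputs. Used for real-valued regression tasks.

Formula:
${ E = \frac{1}{2N} \sum_{n=1}^{N} || \hat{y}_n - y_n ||_2^2 }$.

It can be used for least-squares regression tasks. Feeding the output of an InnerProductLayer into an EuclideanLossLayer gives exactly a linear least-squares regression problem.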

2.1 Forward parameters

Input 1 - (N × C × H × W), predictions ${ \hat{y} \in [-inf, inf] }$
Input 2 - (N × C × H × W), targets ${y \in [-inf, inf] }$

Output 1 - (1 × 1 × 1 × 1), the computed Euclidean loss.
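
As a quick check of the formula, the sketch below evaluates the Euclidean loss on a tiny hand-made batch. It assumes each sample has been flattened into a list of values; `euclidean_loss` is an illustrative name, not a Caffe API.

```python
def euclidean_loss(pred, target):
    """pred, target: N flattened vectors of equal length."""
    n = len(pred)
    total = 0.0
    for y_hat, y in zip(pred, target):
        # squared L2 distance between prediction and target for one sample
        total += sum((a - b) ** 2 for a, b in zip(y_hat, y))
    return total / (2.0 * n)

# Hypothetical batch with N=2 samples of 2 values each.
pred = [[1.0, 2.0], [3.0, 4.0]]
target = [[1.0, 0.0], [0.0, 4.0]]
loss = euclidean_loss(pred, target)  # (4 + 9) / (2 * 2) = 3.25
```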

2.2 Backward parameters

Computes the gradient of the Euclidean error with respect to the inputs.

    template<typename Dtype>
    void caffe::EuclideanLossLayer< Dtype >::Backward_cpu(
        const vector< Blob< Dtype > *> &  top,
        const vector< bool > & propagate_down,
        const vector< Blob< Dtype > *> &  bottom
    )

Parameters:

top - as above.
propagate_down - EuclideanLossLayer can also compute the gradient with respect to the label input (bottom[1]).
bottom - [0] (N × C × H × W), the predictions ${\hat{y}}$; the backward pass computes the diff ${ \frac{\partial E}{ \partial \hat{y}_n} = \frac{1}{N}(\hat{y}_n - y_n) }$, scaled by the loss weight ${\lambda}$.
bottom - [1] (N × C × H × W), the targets $y$; the backward pass computes the diff ${ \frac{\partial E}{ \partial y_n} = \frac{1}{N}(y_n - \hat{y}_n) }$, scaled by the loss weight ${\lambda}$.
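
Differentiating ${E = \frac{1}{2N} \sum_n ||\hat{y}_n - y_n||_2^2}$ per sample gives ${(\hat{y}_n - y_n)/N}$ for the predictions and the negated value for the targets. The sketch below spells this out; `euclidean_loss_grad` and `loss_weight` are illustrative names, and samples are assumed to be flattened lists.

```python
def euclidean_loss_grad(pred, target, loss_weight=1.0):
    """Return (dE/dpred, dE/dtarget) for flattened samples."""
    n = len(pred)
    grad_pred = [
        [loss_weight * (a - b) / n for a, b in zip(y_hat, y)]
        for y_hat, y in zip(pred, target)
    ]
    # The gradient for the targets has the same magnitude and opposite sign.
    grad_target = [[-g for g in row] for row in grad_pred]
    return grad_pred, grad_target

# Hypothetical batch with N=2 samples of 2 values each.
pred = [[1.0, 2.0], [3.0, 4.0]]
target = [[1.0, 0.0], [0.0, 4.0]]
grad_pred, grad_target = euclidean_loss_grad(pred, target)
```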

2.3 prototxt definition

    layer {
      name: "loss"
      type: "EuclideanLoss"
      bottom: "pred"
      bottom: "label"
      top: "loss"
      loss_weight: 1
    }

3. MultinomialLogisticLoss

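A multinomial logistic loss layer for one-of-many classification tasks; it takes the predicted probability distribution directly as its input.

When the predictions are not a probability distribution, use SoftmaxWithLossLayer instead, which maps the predictions to a probability distribution with a SoftmaxLayer before computing the multinomial logistic loss.

3.1 Forward parameters

Input 1 - predictions ${\hat{p}}$, (N × C × H × W), with values in [0, 1], giving the predicted probability for each of the K = CHW classes.
Each prediction vector ${\hat{p}_n}$ should sum to 1: ${\forall n \; \sum_{k=1}^{K} \hat{p}_{nk} = 1}$.

Input 2 - ground-truth labels ${l}$, (N × 1 × 1 × 1), integer-valued, with ${l_n \in [0, 1, 2,...,K-1]}$ indicating the correct class among the K classes.

Output 1 - (1 × 1 × 1 × 1), the computed multinomial logistic loss: ${E = \frac{-1}{N} \sum_{n=1}^{N} \log(\hat{p}_{n, l_n}) }$.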
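
A minimal sketch of this loss on probabilities follows. The small floor `LOG_THRESHOLD` is an assumption added here so that a zero probability does not produce an infinite loss; the function name is likewise illustrative.

```python
import math

LOG_THRESHOLD = 1e-20  # assumed floor that keeps log() finite

def multinomial_logistic_loss(probs, labels):
    """probs: N probability vectors of length K; labels: N class indices."""
    n = len(probs)
    total = 0.0
    for p, label in zip(probs, labels):
        total -= math.log(max(p[label], LOG_THRESHOLD))
    return total / n

# Hypothetical predicted distributions for N=2 samples and K=2 classes.
probs = [[0.5, 0.5], [0.25, 0.75]]
labels = [0, 1]
loss = multinomial_logistic_loss(probs, labels)
```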

3.2 prototxt definition

    layer {
      name: "loss"
      type: "MultinomialLogisticLoss"
      bottom: "fc8"
      bottom: "label"
      top: "loss"
      loss_param {
        ignore_label: 0
        normalize: true
      }
    }

    message LossParameter {
      optional int32 ignore_label = 1;
      enum NormalizationMode {
        FULL = 0;
        VALID = 1;
        BATCH_SIZE = 2;
        NONE = 3;
      }
      optional NormalizationMode normalization = 3 [default = VALID];
      optional bool normalize = 2;
    }
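
The four `NormalizationMode` values differ only in the divisor applied to the summed loss. The sketch below shows one reading of them; `loss_normalizer` and its argument names are hypothetical and the code is not taken from Caffe.

```python
def loss_normalizer(mode, outer_num, inner_num, valid_count):
    """Return the divisor applied to the summed loss.

    outer_num:   batch size N
    inner_num:   spatial locations per sample (H * W)
    valid_count: locations whose label is not ignore_label
    """
    if mode == "FULL":
        norm = outer_num * inner_num  # every location counts, ignored or not
    elif mode == "VALID":
        norm = valid_count            # only non-ignored locations count
    elif mode == "BATCH_SIZE":
        norm = outer_num              # divide by the batch size only
    elif mode == "NONE":
        norm = 1                      # plain sum, no normalization
    else:
        raise ValueError("unknown normalization mode: %r" % mode)
    return max(norm, 1)  # guard against division by zero

# Hypothetical batch: 4 samples, 6 locations each, 20 non-ignored labels.
divisors = {m: loss_normalizer(m, 4, 6, 20)
            for m in ("FULL", "VALID", "BATCH_SIZE", "NONE")}
```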

4. InfogainLoss

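Information gain loss function.

InfogainLossLayer is a generalization of MultinomialLogisticLossLayer.

It uses an "information gain" (infogain) matrix to specify the "value" of all label pairs.

In addition to the predicted per-class probabilities for each sample, it also takes the infogain matrix as input.

When the infogain matrix is the identity matrix, it is equivalent to MultinomialLogisticLossLayer.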

    message InfogainLossParameter {
      // Specify the infogain matrix source.
      optional string source = 1;
      optional int32 axis = 2 [default = 1]; // axis of prob
    }

4.1 Forward parameters

Input 2 - ground-truth labels ${l}$, (N × 1 × 1 × 1), integer-valued, with ${l_n \in [0, 1, 2,...,K-1]}$ indicating the correct class among the K classes.

Input 3 - (optional), (1 × 1 × K × K), the infogain matrix H.

Output 1 - (1 × 1 × 1 × 1), the computed infogain multinomial logistic loss: ${E = \frac{-1}{N} \sum_{n=1}^{N} H_{l_n} \log(\hat{p}_n) = \frac{-1}{N} \sum_{n=1}^{N} \sum_{k=1}^{K} H_{l_n, k} \log(\hat{p}_{n,k}) }$.

Here ${H_{l_n}}$ denotes row ${l_n}$ of the infogain matrix H.
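
The double sum can be written out directly. The sketch below assumes the inputs are already probabilities and that H is given as a list of K rows; `infogain_loss` and the `eps` floor are assumptions made for this illustration.

```python
import math

def infogain_loss(probs, labels, H, eps=1e-20):
    """probs: N probability vectors of length K; H: K x K infogain matrix."""
    n = len(probs)
    total = 0.0
    for p, label in zip(probs, labels):
        row = H[label]  # row l_n of the infogain matrix
        total -= sum(h * math.log(max(pk, eps)) for h, pk in zip(row, p))
    return total / n

# Hypothetical predictions for N=2 samples and K=3 classes.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
labels = [0, 1]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
loss_identity = infogain_loss(probs, labels, identity)
```

With the identity matrix only the true-class term survives in each row, so the result equals the multinomial logistic loss.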

4.2 prototxt definition

    layer {
        bottom: "score"
        bottom: "label"
        top: "infoGainLoss"
        name: "infoGainLoss"
        type: "InfogainLoss"
        infogain_loss_param {
            source: "/.../infogainH.binaryproto"
            axis: 1  # compute loss and probability along axis
        }
    }

5. HingeLoss

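Used for one-of-many classification tasks.
It is sometimes also called the max-margin loss, and it has also been used as the objective function of SVMs.

For example, in the binary case,

${l(y) = \max(0, 1 - t \cdot y)}$

where y is the predicted score and ${t \in \{+1, -1\}}$ is the target.

The loss is non-zero only when ${t \cdot y < 1}$. In other words, once a correctly classified sample lies at a distance greater than 1 from the decision boundary, it earns no further reward; this keeps the classifier from over-focusing on some samples and directs it toward the overall classification loss.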
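
A few concrete values make the binary case easier to read. The helper below is a direct transcription of the formula; the name `binary_hinge` is chosen for this example.

```python
def binary_hinge(y, t):
    """Hinge loss for a raw prediction score y and a target t in {+1, -1}."""
    return max(0.0, 1.0 - t * y)

confident_correct = binary_hinge(2.0, 1)   # beyond the margin: no loss
inside_margin = binary_hinge(0.5, 1)       # correct side, but within the margin
wrong_side = binary_hinge(0.5, -1)         # wrong side of the boundary
```

The first case shows that a sample beyond the margin contributes nothing, while the other two are penalized linearly.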

    message HingeLossParameter {
      enum Norm {
        L1 = 1;
        L2 = 2;
      }
      // Specify the Norm to use L1 or L2
      optional Norm norm = 1 [default = L1];
    }

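5.1 Forward parameters

Input 1 - predictions t, (N × C × H × W), with values in [-inf, inf], giving the predicted score for each of the K = CHW classes.
In an SVM, given D-dimensional features ${X \in \mathcal{R}^{D \times N}}$ and learned hyperplane parameters ${W \in \mathcal{R}^{D \times K} }$, t is the result of the inner product ${X^{T}W}$.

Therefore, a network consisting of a single InnerProductLayer (with num_output=D) whose predictions feed a HingeLossLayer, with no other learnable parameters or losses, is equivalent to an SVM.

Input 2 - ground-truth labels ${l}$, (N × 1 × 1 × 1), integer-valued, with ${l_n \in [0, 1, 2,...,K-1]}$ indicating the correct class among the K classes.

Output 1 - (1 × 1 × 1 × 1), the computed hinge loss: ${ E = \frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{K} [\max(0, 1 - \delta \{l_n = k \} t_{nk})]^p }$,

where p selects the ${L^p}$ norm: p=1 (the default) gives the L1 norm; p=2 gives the L2 norm, as in an L2-SVM.
${\delta\{\mathrm{condition}\} = 1}$ if the condition holds, and ${\delta\{\mathrm{condition}\} = -1}$ otherwise.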
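
The multi-class formula treats each class as a one-vs-rest problem with target +1 for the true class and -1 for all others. A minimal sketch, with `hinge_loss` as an illustrative name and each sample given as a flat list of K scores:

```python
def hinge_loss(t, labels, p=1):
    """t: N score vectors of length K; labels: N class indices; p: 1 or 2."""
    n = len(t)
    total = 0.0
    for scores, label in zip(t, labels):
        for k, score in enumerate(scores):
            delta = 1.0 if k == label else -1.0  # +1 for the true class
            total += max(0.0, 1.0 - delta * score) ** p
    return total / n

# Hypothetical scores for one sample whose true class is 0.
t = [[2.0, -0.5, -2.0]]
labels = [0]
loss_l1 = hinge_loss(t, labels, p=1)
loss_l2 = hinge_loss(t, labels, p=2)
```

Only class 1 violates the margin here (its score of -0.5 is not below -1), so it alone contributes to the loss.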

5.2 prototxt definition

    # L1 Norm
    layer {
      name: "loss"
      type: "HingeLoss"
      bottom: "pred"
      bottom: "label"
    }

    # L2 Norm
    layer {
      name: "loss"
      type: "HingeLoss"
      bottom: "pred"
      bottom: "label"
      top: "loss"
      hinge_loss_param {
        norm: L2
      }
    }

6. ContrastiveLoss

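Caffe's Siamese network example uses the ContrastiveLoss function, which handles paired data effectively.

See, for example, Caffe - mnist_siamese.ipynb.

ContrastiveLoss formula:

${E = \frac{1}{2N} \sum_{n=1}^{N} \left[ y_n d_n^2 + (1-y_n) \max(margin - d_n, 0)^2 \right]}$

where ${d_n = ||a_n - b_n||_2}$ and ${y_n}$ is the binary similarity label of the n-th pair (1 for similar, 0 for dissimilar).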

    message ContrastiveLossParameter {
      // margin for dissimilar pair
      optional float margin = 1 [default = 1.0];
      // The first implementation of this cost did not exactly match the cost of
      // Hadsell et al 2006 -- using (margin - d^2) instead of (margin - d)^2.
      // legacy_version = false (the default) uses (margin - d)^2 as proposed in the
      // Hadsell paper. New models should probably use this version.
      // legacy_version = true uses (margin - d^2). This is kept to support /
      // reproduce existing models and results
      optional bool legacy_version = 2 [default = false];
    }

6.1 Forward parameters

Input 1 - (N × C × 1 × 1), features ${a \in [-inf, inf]}$

Input 2 - (N × C × 1 × 1), features ${b \in [-inf, inf]}$

Input 3 - (N × 1 × 1 × 1), binary similarity ${s \in \{0, 1\}}$

Output 1 - (1 × 1 × 1 × 1), the computed contrastive loss E, used to train Siamese networks.
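
The contrastive loss, including the difference between the default and `legacy_version` variants described in ContrastiveLossParameter, can be sketched as follows. The function name and the sample features are made up for the illustration; this is not Caffe's code.

```python
import math

def contrastive_loss(a, b, sim, margin=1.0, legacy_version=False):
    """a, b: N feature vectors; sim: N binary labels (1 = similar pair)."""
    n = len(a)
    total = 0.0
    for a_n, b_n, y in zip(a, b, sim):
        d_sq = sum((u - v) ** 2 for u, v in zip(a_n, b_n))
        if y:
            total += d_sq  # similar pairs are pulled together
        elif legacy_version:
            total += max(margin - d_sq, 0.0)  # legacy: (margin - d^2)
        else:
            total += max(margin - math.sqrt(d_sq), 0.0) ** 2  # (margin - d)^2
    return total / (2.0 * n)

# Hypothetical pairs: both are at distance 0.5; the first is labelled similar.
a = [[0.0, 0.0], [0.0, 0.0]]
b = [[0.3, 0.4], [0.3, 0.4]]
sim = [1, 0]
loss = contrastive_loss(a, b, sim)
loss_legacy = contrastive_loss(a, b, sim, legacy_version=True)
```

Dissimilar pairs farther apart than the margin contribute nothing in either variant.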

6.2 prototxt definition

    layer {
      name: "loss"
      type: "ContrastiveLoss"
      bottom: "feat"
      bottom: "feat_p"
      bottom: "sim"
      top: "loss"
      contrastive_loss_param {
        margin: 1
      }
    }

From mnist_siamese_train_test.prototxt

7. Accuracy

Computes the classification accuracy for one-of-many classification tasks.

It has no backward computation.

    message AccuracyParameter {
      // top_k accuracy
      optional uint32 top_k = 1 [default = 1];

      // The "label" axis of the prediction blob, whose argmax corresponds to the
      // predicted label -- may be negative to index from the end (e.g., -1 for the
      // last axis).  For example, if axis == 1 and the predictions are
      // (N x C x H x W), the label blob is expected to contain N*H*W ground truth
      // labels with integer values in {0, 1, ..., C-1}.
      optional int32 axis = 2 [default = 1];

      // Labels equal to ignore_label are skipped when computing accuracy
      optional int32 ignore_label = 3;
    }

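7.1 Parameters

AccuracyLayer exposes its options through AccuracyParameter accuracy_param:

top_k - optional, default 1. A prediction counts as correct if the ground-truth label is among the k highest-scoring predicted classes. For example, k=5 means a prediction is correct when the ground-truth label is within the top 5 predicted labels.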
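
Top-k accuracy can be sketched in a few lines. The helper below is an illustration under simple assumptions (one score vector per sample, ties broken by class index); it may handle ties differently from Caffe's AccuracyLayer, and the name `top_k_accuracy` is hypothetical.

```python
def top_k_accuracy(scores, labels, top_k=1, ignore_label=None):
    """scores: N score vectors of length K; labels: N class indices."""
    correct, counted = 0, 0
    for row, label in zip(scores, labels):
        if ignore_label is not None and label == ignore_label:
            continue  # ignored samples are left out of the count
        # class indices sorted by descending score
        ranked = sorted(range(len(row)), key=lambda k: row[k], reverse=True)
        if label in ranked[:top_k]:
            correct += 1
        counted += 1
    return correct / counted if counted else 0.0

# Hypothetical scores for N=3 samples and K=3 classes.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
top1 = top_k_accuracy(scores, labels, top_k=1)
top2 = top_k_accuracy(scores, labels, top_k=2)
```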


Last modified: October 10th, 2018 at 03:50 pm
