[Original article - Caffe custom sigmoid cross entropy loss layer].
A detailed derivation of SigmoidCrossEntropyLoss
Implementing a Caffe Layer in Python

A very clear write-up, worth studying.

1. Sigmoid Cross Entropy Loss Derivation

The Sigmoid Cross Entropy Loss is defined as:
${L = t\ln(P) + (1-t)\ln(1-P)}$

where:

  • t - the target, or label;
  • P - the Sigmoid score,

${P = \frac{1}{1 + e^{-x}}}$

Substituting:

${L = t\ln(\frac{1}{1 + e^{-x}}) + (1-t)\ln(1 - \frac{1}{1 + e^{-x}})}$

Expanding:

${L = t\ln(\frac{1}{1 + e^{-x}}) + (1-t)\ln(\frac{e^{-x}}{1 + e^{-x}})}$

${L = t\ln(\frac{1}{1 + e^{-x}}) + \ln(\frac{e^{-x}}{1 + e^{-x}}) - t\ln(\frac{e^{-x}}{1 + e^{-x}})}$

${L = t[\ln 1 - \ln(1 + e^{-x})] + [\ln(e^{-x}) - \ln(1 + e^{-x})] - t[\ln(e^{-x}) - \ln(1 + e^{-x})]}$

${L = [-t\ln(1 + e^{-x})] + \ln(e^{-x}) - \ln(1 + e^{-x}) - t\ln(e^{-x}) + [t\ln(1 + e^{-x})]}$

Collecting terms:

${L = \ln(e^{-x}) - \ln(1 + e^{-x}) - t\ln(e^{-x})}$

${L = -x\ln(e) - \ln(1 + e^{-x}) + tx\ln(e)}$

${L = -x - \ln(1 + e^{-x}) + xt}$

That is:

${L = xt - x - \ln(1 + e^{-x})}$ <1>
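
A quick numeric spot-check that <1> matches the original definition (a standalone sketch; the values are arbitrary, and a soft label is used to show the identity holds for any ${t \in [0, 1]}$):

    import numpy as np

    x, t = 1.7, 0.3
    P = 1 / (1 + np.exp(-x))
    # original definition vs. the compact form <1>:
    direct = t * np.log(P) + (1 - t) * np.log(1 - P)
    compact = x * t - x - np.log(1 + np.exp(-x))
    print(np.isclose(direct, compact))  # True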

Characteristics of the functions ${e^{-x}}$ and ${e^{x}}$:

${e^{-x}}$ decreases as ${x}$ increases; when ${x}$ is a large negative value, ${e^{-x}}$ becomes enormous and easily overflows. In other words, the implementation must never evaluate ${e^{-x}}$ for large negative ${x}$ (and, symmetrically, ${e^{x}}$ for large positive ${x}$).
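
The overflow is easy to reproduce in NumPy (a sketch; float32 is used since it matches the typical blob dtype):

    import numpy as np

    x = np.float32(-100.0)
    # naive evaluation of e^{-x} overflows float32 for large negative x:
    print(np.exp(-x))          # inf, with a RuntimeWarning
    # e^{-|x|} instead always lies in (0, 1]:
    print(np.exp(-np.abs(x)))  # ~3.7e-44, finite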

Therefore, to avoid overflow, the loss function L is reworked: when ${x < 0}$, it is rewritten in terms of ${e^x}$:

Original loss: ${L = xt - x - \ln(1 + e^{-x})}$ <1>

which gives: ${L = xt - x + \ln(\frac{1}{1 + e^{-x}})}$

Multiplying the numerator and denominator of the last term by ${e^x}$:

${L = xt - x + \ln(\frac{1 \cdot e^x}{(1 + e^{-x}) \cdot e^x})}$

${L = xt - x + \ln(\frac{e^x}{1 + e^x})}$

${L = xt - x + [\ln(e^x) - \ln(1 + e^x)]}$

${L = xt - x + x\ln e - \ln(1 + e^x)}$

so that:

${L = xt - \ln(1 + e^x)}$ <2>

From <1> and <2>, the final loss function is:

${L = xt - x - \ln(1 + e^{-x}), \quad (x > 0)}$

${L = xt - 0 - \ln(1 + e^{x}), \quad (x < 0)}$

Merging the two cases into one expression:

${L = xt - \max(x, 0) - \ln(1 + e^{-|x|}), \quad \text{for all } x}$
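
A minimal NumPy sketch of this stable form (the function name log_likelihood is mine; np.log1p keeps the last term accurate for tiny arguments):

    import numpy as np

    def log_likelihood(x, t):
        """L = xt - max(x, 0) - ln(1 + e^{-|x|}), valid for all x."""
        return x * t - np.maximum(x, 0) - np.log1p(np.exp(-np.abs(x)))

    # agrees with form <1> where the naive expression is safe:
    x, t = 3.0, 1.0
    print(np.isclose(log_likelihood(x, t),
                     x * t - x - np.log(1 + np.exp(-x))))  # True
    # and stays finite where the naive expression overflows:
    print(log_likelihood(-1000.0, 1.0))  # -1000.0, not -inf or nan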

2. Sigmoid Cross Entropy Loss Gradient

For ${x > 0}$, ${L = xt - x - \ln(1 + e^{-x})}$,

and differentiating term by term:

${\frac{\partial L}{\partial x} = \frac{\partial (xt - x - \ln(1 + e^{-x}))}{\partial x}}$

${\frac{\partial L}{\partial x} = \frac{\partial (xt)}{\partial x} - \frac{\partial x}{\partial x} - \frac{\partial \ln(1 + e^{-x})}{\partial x}}$

${\frac{\partial L}{\partial x} = t - 1 - \frac{1}{1 + e^{-x}} \cdot \frac{\partial (1 + e^{-x})}{\partial x}}$

${\frac{\partial L}{\partial x} = t - 1 - \frac{1}{1 + e^{-x}} \cdot \frac{\partial (e^{-x})}{\partial x}}$

${\frac{\partial L}{\partial x} = t - 1 + \frac{e^{-x}}{1 + e^{-x}}}$

hence:

${\frac{\partial L}{\partial x} = t - \frac{1}{1 + e^{-x}}}$

The second term is exactly the Sigmoid function ${P = \frac{1}{1 + e^{-x}}}$, so:

${\frac{\partial L}{\partial x} = t - P}$

For ${x < 0}$, ${L = xt - \ln(1 + e^{x})}$, and similarly:

${\frac{\partial L}{\partial x} = \frac{\partial (xt - \ln(1 + e^{x}))}{\partial x}}$

${\frac{\partial L}{\partial x} = \frac{\partial (xt)}{\partial x} - \frac{\partial \ln(1 + e^x)}{\partial x}}$

${\frac{\partial L}{\partial x} = t - \frac{1}{1 + e^x} \cdot \frac{\partial (e^x)}{\partial x}}$

${\frac{\partial L}{\partial x} = t - \frac{e^x}{1 + e^x}}$

${\frac{\partial L}{\partial x} = t - \frac{e^x \cdot e^{-x}}{(1 + e^x) \cdot e^{-x}}}$

${\frac{\partial L}{\partial x} = t - \frac{1}{1 + e^{-x}}}$

Again the second term is the Sigmoid function ${P = \frac{1}{1 + e^{-x}}}$, so:

${\frac{\partial L}{\partial x} = t - P}$

For both ${x > 0}$ and ${x < 0}$ the derivative is therefore the same: the difference between the target and the Sigmoid output. One caveat before the code: the layer below follows the usual cross-entropy convention and minimizes the negative of ${L}$, so the gradient it actually stores is ${P - t}$.
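
A finite-difference check of this result (a standalone sketch, reusing the log_likelihood helper defined in section 1):

    import numpy as np
    from scipy.special import expit

    def log_likelihood(x, t):
        return x * t - np.maximum(x, 0) - np.log1p(np.exp(-np.abs(x)))

    t, eps = 0.3, 1e-6
    for x in (-5.0, -0.5, 0.5, 5.0):
        # central difference approximation of dL/dx:
        numeric = (log_likelihood(x + eps, t) - log_likelihood(x - eps, t)) / (2 * eps)
        analytic = t - expit(x)  # dL/dx = t - P
        print(np.isclose(numeric, analytic, atol=1e-5))  # True for every x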

3. Custom Caffe Loss Layer in Python

Caffe officially provides a demo of a custom EuclideanLossLayer written in Python.

Here, following the derivation above, we build a Python-based Caffe SigmoidCrossEntropyLossLayer.
Caffe itself ships a C++ implementation, SigmoidCrossEntropyLossLayer; see CaffeLoss - SigmoidCrossEntropyLossLayer [1].

Labels are assumed to lie in ${[0, 1]}$.

3.1 SigmoidCrossEntropyLossLayer Implementation

    import caffe
    import numpy as np
    from scipy.special import expit


    class CustomSigmoidCrossEntropyLossLayer(caffe.Layer):

        def setup(self, bottom, top):
            # check for all inputs
            if len(bottom) != 2:
                raise Exception("Need two inputs (scores and labels) to compute sigmoid cross-entropy loss.")

        def reshape(self, bottom, top):
            # check that the score and label dimensions match
            if bottom[0].count != bottom[1].count:
                raise Exception("Inputs must have the same dimension.")
            # the gradient has the same shape as either input
            self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
            # the layer outputs a scalar loss, averaged over the batch
            top[0].reshape(1)

        def forward(self, bottom, top):
            score = bottom[0].data
            label = bottom[1].data

            # stable form derived above: max(x, 0) - x*t + log(1 + exp(-|x|)),
            # i.e. the negative of the log-likelihood L
            first_term = np.maximum(score, 0)
            second_term = -score * label
            third_term = np.log(1 + np.exp(-np.absolute(score)))

            # average over the batch, as Caffe's C++ layer does
            top[0].data[...] = np.sum(first_term + second_term + third_term) / bottom[0].num
            # gradient of the negative log-likelihood w.r.t. the scores: P - t
            sig = expit(score)
            self.diff = sig - label
            if np.isnan(top[0].data[0]):
                raise ValueError("NaN loss in CustomSigmoidCrossEntropyLossLayer.")

        def backward(self, top, propagate_down, bottom):
            # no gradient flows to the labels (bottom[1])
            if propagate_down[0]:
                bottom[0].diff[...] = self.diff / bottom[0].num
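
Because the forward/backward bodies are plain NumPy/SciPy, the math can be sanity-checked without a Caffe build; a sketch with made-up scores and labels (shapes and values are illustrative only):

    import numpy as np
    from scipy.special import expit

    score = np.array([[2.5], [-0.3], [-4.0]], dtype=np.float32)  # batch of 3
    label = np.array([[1.0], [0.0], [0.0]], dtype=np.float32)
    num = score.shape[0]

    # same three terms as forward(), averaged over the batch:
    loss = np.sum(np.maximum(score, 0) - score * label
                  + np.log(1 + np.exp(-np.abs(score)))) / num
    grad = (expit(score) - label) / num  # matches backward()
    print(loss, grad.ravel())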

3.2 prototxt Definition

    layer {
      type: 'Python'
      name: 'loss'
      top: 'loss_opt'
      bottom: 'score'
      bottom: 'label'
      python_param {
        # the module name -- usually the filename -- that needs to be in $PYTHONPATH
        module: 'loss_layers'
        # the layer name -- the class name in the module
        layer: 'CustomSigmoidCrossEntropyLossLayer'
      }
      include {
        phase: TRAIN
      }
      # set loss weight so Caffe knows this is a loss layer.
      # since PythonLayer inherits directly from Layer, this isn't automatically
      # known to Caffe
      loss_weight: 1
    }
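
If the net is generated from Python, pycaffe's NetSpec can emit an equivalent definition; a sketch (the Input layers are stand-ins for whatever really produces the score and label blobs):

    import caffe
    from caffe import layers as L

    n = caffe.NetSpec()
    n.score = L.Input(shape=dict(dim=[32, 1]))  # dummy producers
    n.label = L.Input(shape=dict(dim=[32, 1]))
    n.loss_opt = L.Python(n.score, n.label,
                          python_param=dict(module='loss_layers',
                                            layer='CustomSigmoidCrossEntropyLossLayer'),
                          loss_weight=1,
                          include=dict(phase=caffe.TRAIN))
    print(n.to_proto())  # prints a prototxt equivalent to the block above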

4. Related

[1] - CaffeLoss - SigmoidCrossEntropyLossLayer
