Machine Learning - Accuracy Evaluation: Confusion Matrix, Mean Class Accuracy, Per-Class Accuracy, Overall Accuracy, F1-score, etc. for Classification Tasks

Python's sklearn.metrics provides evaluation metrics for many tasks: for classification, the confusion matrix, mean class accuracy, per-class accuracy, overall accuracy, F1-score, and more; it also includes built-in functions for regression, clustering, and other tasks.
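
As a quick illustration, the overall accuracy and F1-score mentioned above correspond to accuracy_score and f1_score in sklearn.metrics. A minimal sketch, using made-up label values for illustration only:

from sklearn.metrics import accuracy_score, f1_score

# hypothetical ground-truth and predicted labels, for illustration only
gt_labels = [0, 1, 2, 2, 0, 1]
pred_labels = [0, 2, 2, 2, 0, 1]

# overall classification accuracy: fraction of correctly predicted samples
print(accuracy_score(gt_labels, pred_labels))              # 0.8333... (5 of 6 correct)

# macro-averaged F1-score: F1 computed per class, then averaged over classes
print(f1_score(gt_labels, pred_labels, average="macro"))   # ≈ 0.822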

<h2>1. Classification - Confusion Matrix</h2>

sklearn.metrics.confusion_matrix

from sklearn.metrics import confusion_matrix

Computes the confusion matrix to evaluate classification accuracy.

Denote the confusion matrix by ${ C }$. Its element ${ C_{ij} }$ is the number of samples with gt_label=i and pred_label=j, where i, j are the class labels.

In binary classification, the number of true negatives is ${ C_{0,0} }$, false negatives is ${ C_{1,0} }$, true positives is ${ C_{1,1} }$, and false positives is ${ C_{0,1} }$.

Usage:

C = confusion_matrix(gt_labels, pred_labels, labels=None, sample_weight=None)
# C is the n_classes x n_classes confusion matrix
  • gt_labels - ground-truth label values
  • pred_labels - label values predicted by the classifier
  • labels - list of labels used to index the confusion matrix

Example 1:

from sklearn.metrics import confusion_matrix
gt_labels = [2, 0, 2, 2, 0, 1]
pred_labels = [0, 0, 2, 2, 0, 2]
confusion_matrix(gt_labels, pred_labels)
# array([[2, 0, 0],
#        [0, 0, 1],
#        [1, 0, 2]])
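
The per-class, mean-class, and overall accuracies can be read directly off this matrix: the diagonal holds the correctly classified counts for each class. A rough sketch using the matrix from Example 1 (plain NumPy, not a dedicated sklearn API):

import numpy as np
from sklearn.metrics import confusion_matrix

gt_labels = [2, 0, 2, 2, 0, 1]
pred_labels = [0, 0, 2, 2, 0, 2]
C = confusion_matrix(gt_labels, pred_labels)

# per-class accuracy: correct predictions of each class / samples of that class
per_class_acc = C.diagonal() / C.sum(axis=1)
# array([1.        , 0.        , 0.66666667])

# mean class accuracy: average of the per-class accuracies
mean_class_acc = per_class_acc.mean()         # ≈ 0.556

# overall classification accuracy: all correct predictions / all samples
overall_acc = C.diagonal().sum() / C.sum()    # ≈ 0.667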

Example 2:

from sklearn.metrics import confusion_matrix
gt_labels = ["cat", "ant", "cat", "cat", "ant", "bird"]
pred_labels = ["ant", "ant", "cat", "cat", "ant", "cat"]
confusion_matrix(gt_labels, pred_labels, labels=["ant", "bird", "cat"])
# array([[2, 0, 0],
#        [0, 0, 1],
#        [1, 0, 2]])

Example 3:

Binary classification case:

from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()
#(tn, fp, fn, tp)
#(0, 2, 1, 1)
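
From tn, fp, fn, tp, the precision, recall, and F1-score can also be computed by hand; a rough sketch (sklearn.metrics also provides precision_score, recall_score, and f1_score, which return the same values directly):

from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()

# precision: of the samples predicted positive, the fraction that are truly positive
precision = tp / (tp + fp)    # 1 / 3 ≈ 0.333

# recall: of the truly positive samples, the fraction predicted positive
recall = tp / (tp + fn)       # 1 / 2 = 0.5

# F1-score: harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)    # 0.4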