Mutual Information (MI) is a measure of the similarity between two labelings of the same data. It is defined as:

$$ MI(U, V) = \sum_{i=1}^{|U|} \sum_{j=1}^{|V|} \frac{|U_i \cap V_j|}{N} \log \frac{N |U_i \cap V_j|}{|U_i| |V_j|} $$

where $|U_i|$ is the number of samples in cluster $U_i$, $|V_j|$ is the number of samples in cluster $V_j$, and $N$ is the total number of samples.
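
To make the formula concrete, here is a minimal pure-Python sketch that evaluates it directly from two label lists. The function name `mutual_information` is a hypothetical helper for illustration, not part of sklearn. It uses the natural logarithm and skips empty intersections, following the usual convention that $0 \log 0 = 0$.

```python
import math
from collections import Counter


def mutual_information(labels_u, labels_v):
    """MI (natural log) between two labelings of the same samples.

    Hypothetical helper that follows the double-sum formula term by term.
    """
    n = len(labels_u)
    size_u = Counter(labels_u)                  # |U_i| for each cluster in U
    size_v = Counter(labels_v)                  # |V_j| for each cluster in V
    overlap = Counter(zip(labels_u, labels_v))  # |U_i ∩ V_j| for non-empty pairs

    mi = 0.0
    for (u, v), n_uv in overlap.items():
        mi += (n_uv / n) * math.log(n * n_uv / (size_u[u] * size_v[v]))
    return mi


# Identical labelings with two balanced clusters: MI equals ln 2.
mi_same = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
# Labelings that carry no information about each other: MI equals 0.
mi_indep = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(mi_same, mi_indep)
```

Note that the unnormalized MI of two identical labelings is not 1 but the entropy of the labeling, which is one reason the normalized variants are usually reported instead.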

MI does not depend on the absolute label values: permuting the class or cluster label values does not change the MI score.

MI is also symmetric: $MI(U, V) = MI(V, U)$.
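
Both properties can be checked numerically. The sketch below uses sklearn's `mutual_info_score`; the label lists and the `rename` mapping are made-up illustration data.

```python
from sklearn.metrics import mutual_info_score

u = [0, 0, 1, 1, 2, 2]
v = [0, 0, 1, 2, 2, 2]

# Rename the cluster ids of u (0 -> 2, 1 -> 0, 2 -> 1); the grouping is unchanged.
rename = {0: 2, 1: 0, 2: 1}
u_renamed = [rename[x] for x in u]

mi = mutual_info_score(u, v)
mi_renamed = mutual_info_score(u_renamed, v)  # permuted label values
mi_swapped = mutual_info_score(v, u)          # arguments swapped

print(abs(mi - mi_renamed) < 1e-12)  # permutation invariance
print(abs(mi - mi_swapped) < 1e-12)  # symmetry
```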

Two commonly used variants of MI are Normalized Mutual Information (NMI) and Adjusted Mutual Information (AMI). Of the two, NMI is the one more commonly reported in papers.

1. NMI - sklearn

sklearn.metrics.normalized_mutual_info_score

from sklearn.metrics.cluster import normalized_mutual_info_score

# Two identical labelings of the same four samples
c1 = [0, 0, 1, 1]
c2 = [0, 0, 1, 1]
nmi = normalized_mutual_info_score(c1, c2)
print('[INFO]NMI: ', nmi)
# NMI is 1.0 for identical labelings
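
The note above does not spell out how NMI is normalized. As a hedged sketch, assuming the arithmetic-mean normalization $NMI = MI / \mathrm{avg}(H(U), H(V))$ (selected here explicitly with `average_method='arithmetic'`), the score can be reproduced by hand for two labelings that only partly agree. The `entropy` helper is a hypothetical function written for this example.

```python
import math
from collections import Counter

from sklearn.metrics import mutual_info_score, normalized_mutual_info_score


def entropy(labels):
    """Shannon entropy (natural log) of a labeling; hypothetical helper."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


c1 = [0, 0, 1, 1]
c2 = [0, 0, 1, 2]  # the second cluster of c1 is split in two

mi = mutual_info_score(c1, c2)                       # ln 2
nmi_manual = mi / ((entropy(c1) + entropy(c2)) / 2)  # ln 2 / (1.25 * ln 2)
nmi_sklearn = normalized_mutual_info_score(c1, c2, average_method='arithmetic')

print(nmi_manual, nmi_sklearn)  # both approximately 0.8
```

Other choices of `average_method` (`'min'`, `'geometric'`, `'max'`) give different values for the same pair of labelings, so it is worth stating which normalization was used when reporting NMI.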

2. AMI - sklearn

sklearn.metrics.adjusted_mutual_info_score

$$ AMI(U, V) = \frac{MI(U, V) - E[MI(U, V)]}{\operatorname{avg}(H(U), H(V)) - E[MI(U, V)]} $$

from sklearn.metrics.cluster import adjusted_mutual_info_score

# Two identical labelings of the same four samples
c1 = [0, 0, 1, 1]
c2 = [0, 0, 1, 1]
ami = adjusted_mutual_info_score(c1, c2)
print('[INFO]AMI: ', ami)
# AMI is 1.0 for identical labelings
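
The $E[MI(U, V)]$ term in the AMI formula is what corrects for agreement that arises by chance. A rough way to see its effect is to score two unrelated random labelings: NMI stays clearly above zero, while AMI should land close to zero. The sample size, cluster count, and seed below are arbitrary choices for this sketch.

```python
import numpy as np
from sklearn.metrics import adjusted_mutual_info_score, normalized_mutual_info_score

rng = np.random.default_rng(0)

# Two independent random labelings: 200 samples, 20 possible cluster ids each.
a = rng.integers(0, 20, size=200)
b = rng.integers(0, 20, size=200)

nmi = normalized_mutual_info_score(a, b)
ami = adjusted_mutual_info_score(a, b)

print('NMI on random labels:', nmi)  # inflated by chance overlap
print('AMI on random labels:', ami)  # expected to be near 0
```

Because the adjustment subtracts the expected MI, AMI can come out slightly negative for unrelated labelings.
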
Last modification: April 30th, 2021 at 04:12 pm