Semantic Segmentation Tips and Tricks Summarized from 39 Kaggle Competitions [Translation]

Blogger: AIHGF
Published: May 13, 2020
Category: Semantic Segmentation

Original: Image Segmentation: Tips and Tricks from 39 Kaggle Competitions - 2020.04.07
Author: Derrick Mwiti
Part translation, part repost, for study purposes.

The original author compiled this summary after going over 39 Kaggle competitions, including:

Data Science Bowl 2017 – $1,000,000
Intel & MobileODT Cervical Cancer Screening – $100,000
2018 Data Science Bowl – $100,000
Airbus Ship Detection Challenge – $60,000
Planet: Understanding the Amazon from Space – $60,000
APTOS 2019 Blindness Detection – $50,000
Human Protein Atlas Image Classification – $37,000
SIIM-ACR Pneumothorax Segmentation – $30,000
Inclusive Images Challenge – $25,000

1. External Data

[1] - Use the LUng Nodule Analysis (LUNA16) Grand Challenge data, which contains detailed annotations from radiologists.

[2] - Use the LIDC-IDRI data, which contains radiologists' descriptions of each tumor.

[3] - Use the Flickr CC and Wikipedia Commons datasets.

[4] - Use the Human Protein Atlas Dataset.

[5] - Use the IDRiD dataset.

2. Data Exploration and Gaining Insights

[1] - Clustering of 3D segmentation with the 0.5 threshold

[2] - Identify whether there is a substantial difference between the train and test label distributions.
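
One simple way to carry out the check in item [2] is to compare label frequencies between the two splits. The sketch below is only an illustration under stated assumptions: the function names, the toy labels, and the choice of total variation distance as the comparison metric are hypothetical and do not come from the original write-ups. When test labels are hidden, a labelled hold-out set or model predictions would have to stand in for them.

```python
from collections import Counter


def label_distribution(labels):
    """Return the relative frequency of each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation_distance(train_labels, test_labels):
    """Half the L1 distance between two label distributions (0 = identical, 1 = disjoint)."""
    p = label_distribution(train_labels)
    q = label_distribution(test_labels)
    all_labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in all_labels)


# Hypothetical labels, for illustration only.
train = ["ship"] * 80 + ["no_ship"] * 20
test = ["ship"] * 50 + ["no_ship"] * 50
print(round(total_variation_distance(train, test), 2))  # prints 0.3
```

A value near 0 suggests the splits are similarly distributed; a large value is a hint that validation scores may not transfer to the leaderboard.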

3. Preprocessing

[1] - Perform blob detection with the Difference of Gaussian (DoG) method, using the implementation in the skimage library.

[2] - Use patch-based inputs for training to reduce training time.

[3] - Load data with cudf instead of Pandas, since it reads faster.

[4] - Ensure that all the images have the same orientation.

[5] - Apply contrast limited adaptive histogram equalization (CLAHE).

[6] - Use OpenCV for all general image preprocessing.

[7] - Employ automatic active learning and add manual annotations.

[8] - Resize all images to the same resolution in order to apply the same model to scans of different thicknesses.

[9] - Convert scan images into normalized 3D numpy arrays.

[10] - Apply single image haze removal using Dark Channel Prior.

[11] - Convert all data to Hounsfield units.

[12] - Find duplicate images using pair-wise correlation on RGBY.

[13] - Make labels more balanced by developing a sampler (https://www.sebastiansylvan.com/post/importancesampling/).

[14] - Apply pseudo labeling to test data in order to improve score.

[15] - Scale down images/masks to 320×480.

[16] - Apply histogram equalization (CLAHE) with kernel size 32×32.

[17] - Convert DCM to PNG.

[18] - Calculate the md5 hash of each image to detect duplicate images.
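
Item [18] can be sketched with the Python standard library alone. The helper names and the in-memory "files" below are hypothetical; a real pipeline would read the bytes of each image from disk. Note that an md5 hash only catches byte-identical files, so near-duplicates (re-encoded or resized images) would still need something like the correlation approach in item [12].

```python
import hashlib
from collections import defaultdict


def md5_of_bytes(data: bytes) -> str:
    """Return the hex md5 digest of raw file contents."""
    return hashlib.md5(data).hexdigest()


def group_duplicates(files: dict) -> list:
    """Group file names whose contents hash to the same md5 digest.

    `files` maps a name to its raw bytes (an assumption made for this sketch).
    """
    by_hash = defaultdict(list)
    for name, data in files.items():
        by_hash[md5_of_bytes(data)].append(name)
    # Keep only digests shared by more than one file.
    return [sorted(names) for names in by_hash.values() if len(names) > 1]


# Hypothetical in-memory "images", for illustration only.
files = {"a.png": b"\x00\x01\x02", "b.png": b"\x00\x01\x02", "c.png": b"\xff"}
print(group_duplicates(files))  # prints [['a.png', 'b.png']]
```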

4. Data Augmentation

[1] - Use the albumentations library for image augmentation.

[2] - Apply random rotation by 90 degrees.

[3] - Use horizontal, vertical or both flips.

[4] - Attempt heavy geometric transformations: Elastic Transform, Perspective Transform, Piecewise Affine transforms, pincushion distortion.

[5] - Apply random HSV.

[6] - Use lossless augmentation for generalization, to prevent loss of useful image information.

[7] - Apply channel shuffling.

[8] - Do data augmentation based on class frequency.

[9] - Apply Gaussian noise.

[10] - Use lossless permutations of 3D images for data augmentation.

[11] - Rotate by a random angle from 0 to 45 degrees.

[12] - Scale by a random factor from 0.8 to 1.2.

[13] - Change the brightness.

[14] - Randomly change hue, saturation and value.

[15] - Apply D4 augmentations.

[16] - Apply contrast limited adaptive histogram equalization.

[17] - Use the AutoAugment augmentation strategy.
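
The D4 augmentations of item [15] are the eight symmetries of a square: four rotations by multiples of 90 degrees, each with and without a flip. They are lossless, since no pixel is interpolated. The following is a minimal pure-Python sketch on nested lists; the function names are hypothetical, and in practice a library such as albumentations or numpy would normally be used. For segmentation, the same transform has to be applied to the image and its mask.

```python
def rot90(img):
    """Rotate a 2D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def hflip(img):
    """Flip a 2D image horizontally (mirror left-right)."""
    return [row[::-1] for row in img]


def d4_transforms(img):
    """Return the 8 images of the dihedral group D4: 4 rotations, each with and without a flip."""
    out = []
    current = [list(row) for row in img]
    for _ in range(4):
        out.append(current)
        out.append(hflip(current))
        current = rot90(current)
    return out


image = [[1, 2],
         [3, 4]]
variants = d4_transforms(image)
print(len(variants))  # prints 8
```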

5. Models

5.1. Network Architectures

[1] - Use of a U-net based architecture, with its concepts adopted and applied to 3D input tensors

[2] - The inception-ResNet v2 architecture for training features with different receptive fields

[3] - Siamese networks with adversarial training

[4] - ResNet50, Xception, Inception ResNet v2 x 5 with Dense (FC) layer as the final layer

[5] - Use of a global max-pooling layer which returns a fixed-length output no matter the input size

[6] - Use of stacked dilated convolutions

[7] - VoxelNet

[8] - Replace plus sign in LinkNet skip connections with concat and conv1x1

[9] - Generalized mean pooling

[10] - Keras NASNetLarge to train the model from scratch using 224x224x3

[11] - Use of the 3D convnet to slide over the images

[12] - Imagenet-pre-trained ResNet152 as the feature extractor

[13] - Replace the final fully-connected layers of ResNet by 3 fully connected layers with dropout

[14] - Use ConvTranspose in the decoder

[15] - Applying the VGG baseline architecture

[16] - Implementing the C3D network with adjusted receptive fields and a 64 unit bottleneck layer on the end of the network

[17] - Use of UNet type architectures with pre-trained weights to improve convergence and performance of binary segmentation on 8-bit RGB input images

[18] - LinkNet since it’s fast and memory efficient

[19] - MASKRCNN

[20] - BN-Inception

[21] - Fast Point R-CNN

[22] - Seresnext

[23] - UNet and Deeplabv3

[24] - Faster RCNN

[25] - SENet154

[26] - ResNet152

[27] - NASNet-A-Large

[28] - EfficientNetB4

[29] - ResNet101

[30] - GAPNet

[31] - PNASNet-5-Large

[32] - Densenet121

[33] - AC-GAN

[34] - XceptionNet (96), XceptionNet (299), Inception v3 (139), InceptionResNet v2 (299), DenseNet121 (224)

[35] - AlbuNet (resnet34) from ternausnets

[36] - SpaceNet

[37] - Resnet50 from selim_sef SpaceNet 4

[38] - SCSEUnet (seresnext50) from selim_sef SpaceNet 4

[39] - A custom Unet and Linknet architecture

[40] - FPNetResNet50 (5 folds)

[41] - FPNetResNet101 (5 folds)

[42] - FPNetResNet101 (7 folds with different seeds)

[43] - PANetDilatedResNet34 (4 folds)

[44] - PANetResNet50 (4 folds)

[45] - EMANetResNet101 (2 folds)

[46] - RetinaNet

[47] - Deformable R-FCN

[48] - Deformable Relation Networks
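
Items [5] and [9] both describe pooling layers that reduce a feature map of any size to a fixed-length vector. Generalized mean (GeM) pooling interpolates between average and max pooling through an exponent p. The sketch below applies it to a single flat list of activations; the function name, the default p = 3 and the eps clamp are assumptions made for illustration, and a real layer would operate per channel on tensors, usually with p learned.

```python
def gem_pool(values, p=3.0, eps=1e-6):
    """Generalized mean (GeM) pooling over a flat list of non-negative activations.

    p = 1 gives average pooling; large p approaches max pooling.
    """
    clamped = [max(v, eps) for v in values]  # keep values positive before raising to p
    mean_of_powers = sum(v ** p for v in clamped) / len(clamped)
    return mean_of_powers ** (1.0 / p)


activations = [1.0, 2.0, 3.0]
print(gem_pool(activations, p=1.0))    # equals the plain average
print(gem_pool(activations, p=100.0))  # close to the maximum
```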

5.2. Hardware Platforms

[1] - Use of the AWS GPU instance p2.xlarge with a NVIDIA K80 GPU

[2] - Pascal Titan-X GPU

[3] - Use of 8 TITAN X GPUs

[4] - 6 GPUs: 2×1080Ti + 4×1080

[5] - Server with 8×NVIDIA Tesla P40, 256 GB RAM and 28 CPU cores

[6] - Intel Core i7 5930k, 2×1080, 64 GB of RAM, 2x512GB SSD, 3TB HDD

[7] - GCP 1x P100, 8x CPU, 15 GB RAM, SSD or 2x P100, 16x CPU, 30 GB RAM

[8] - NVIDIA Tesla P100 GPU with 16GB of RAM

[9] - Intel Core i7 5930k, 2×1080, 64 GB of RAM, 2x512GB SSD, 3TB HDD

[10] - 980Ti GPU, 2600k CPU, and 14GB RAM

5.3. Loss Functions

[1] - Dice Coefficient, which works better on imbalanced data.

[2] - Weighted boundary loss, which aims to reduce the distance between the predicted segmentation and the ground truth.

[3] - MultiLabelSoftMarginLoss that creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input and target

[4] - Balanced cross entropy (BCE) with logit loss that involves weighing the positive and negative examples by a certain coefficient

[5] - Lovasz that performs direct optimization of the mean intersection-over-union loss in neural networks based on the convex Lovasz extension of sub-modular losses

[6] - FocalLoss + Lovasz obtained by summing the Focal and Lovasz losses

[7] - Arc margin loss that incorporates margin in order to maximise face class separability

[8] - Npairs loss that computes the npairs loss between y_true and y_pred.

[9] - A combination of BCE and Dice loss functions

[10] - LSEP – a pairwise ranking that is smooth everywhere and thus is easier to optimize

[11] - Center loss that simultaneously learns a center for deep features of each class and penalizes the distances between the deep features and their corresponding class centers

[12] - Ring Loss that augments standard loss functions such as Softmax

[13] - Hard triplet loss that trains a network to embed features of the same class close together while maximizing the embedding distance between different classes

[14] - 1 + BCE – Dice, which subtracts the Dice coefficient from the BCE loss and then adds 1

[15] - Binary cross-entropy – log(dice), i.e. the binary cross-entropy minus the log of the Dice coefficient

[16] - Combinations of BCE, dice and focal

[17] - Lovasz Loss that performs direct optimization of the mean intersection-over-union loss

[18] - BCE + DICE - Dice loss is obtained by calculating smooth dice coefficient function

[19] - Focal loss with Gamma 2 that is an improvement to the standard cross-entropy criterion

[20] - BCE + DICE + Focal – this is basically a summation of the three loss functions

[21] - Active Contour Loss that incorporates the area and size information and integrates the information in a dense deep learning model

[22] - 1024 * BCE(results, masks) + BCE(cls, cls_target)

[23] - Focal + kappa – Kappa is a loss function for multi-class classification of ordinal data in deep learning. In this case we sum it and the focal loss

[24] - ArcFaceLoss — Additive Angular Margin Loss for Deep Face Recognition

[25] - soft Dice trained on positives only – Soft Dice uses predicted probabilities

[26] - 2.7 * BCE(pred_mask, gt_mask) + 0.9 * DICE(pred_mask, gt_mask) + 0.1 * BCE(pred_empty, gt_empty), which is a custom loss used by the Kaggler

[27] - nn.SmoothL1Loss() that creates a criterion that uses a squared term if the absolute element-wise error falls below 1 and an L1 term otherwise

[28] - Use of the Mean Squared Error objective function in scenarios where it seems to work better than binary-cross entropy objective function.
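
Several of the entries above ([9], [14], [18]) combine binary cross-entropy with a Dice term. The following is a minimal sketch of that combination on flattened pixel probabilities, assuming a smoothed Dice coefficient with smooth = 1 and equal weighting of the two terms. Both are illustrative choices, not the settings of any particular solution, and a real implementation would use tensors in a deep learning framework.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over flattened pixel probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)


def soft_dice(probs, targets, smooth=1.0):
    """Smoothed Dice coefficient computed on predicted probabilities."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    return (2.0 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)


def bce_dice_loss(probs, targets):
    """BCE plus Dice loss, i.e. 1 + BCE - Dice when Dice denotes the coefficient."""
    return bce_loss(probs, targets) + (1.0 - soft_dice(probs, targets))


# Hypothetical predictions for a four-pixel mask.
good = bce_dice_loss([0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0])
bad = bce_dice_loss([0.1, 0.9, 0.2, 0.8], [1, 0, 1, 0])
print(good < bad)  # prints True
```

Because the Dice term is computed over the whole mask rather than per pixel, it is less dominated by the background class than BCE alone, which is the usual motivation for adding it on imbalanced data.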

5.4. Training Tips

[1] - Try different learning rates.

[2] - Try different batch sizes.

[3] - Use SGD with momentum and manual learning-rate scheduling

[4] - Too much augmentation will reduce the accuracy.

[5] - Train on image crops and predict on full images

[6] - Use Keras's ReduceLROnPlateau() to reduce the learning rate.

[7] - Train without augmentation until the loss plateaus, then apply soft and hard augmentation for some epochs.

[8] - Freeze all layers except the last one and use 1000 images from Stage1 for tuning

[9] - Make labels more balanced by developing a sampler.

[10] - Use of class aware sampling

[11] - Use dropout and augmentation while tuning the last layer

[12] - Pseudo Labeling to improve score

[13] - Use Adam reducing LR on plateau with patience 2–4

[14] - Use Cyclic LR with SGD

[15] - Reduce the learning rate by a factor of two if validation loss does not improve for two consecutive epochs

[16] - Repeat the worst batch out of 10 batches

[17] - Train with default UNET

[18] - Overlap tiles so that each edge pixel is covered twice

[19] - Hyperparameter tuning: learning rate on training, non-maximum suppression and score threshold on inference

[20] - Remove bounding boxes with a low confidence score

[21] - Train different convolutional neural networks then build an ensemble

[22] - Stop training when the F1 score is decreasing

[23] - Differential learning rate with gradual reducing

[24] - Train ANNs in a stacking way using 5 folds and 30 repeats

[25] - Keep track of your experiments using Neptune
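
The plateau-based schedules in items [6], [13] and [15] share one rule: if the validation loss has not improved for a set number of epochs, multiply the learning rate by a factor below 1. Keras's ReduceLROnPlateau callback provides this out of the box. The class below is a hypothetical, framework-free stand-in that mirrors item [15] (halve after two epochs without improvement); its name, defaults and the loss values are assumptions made for illustration.

```python
class HalveOnPlateau:
    """Halve the learning rate when validation loss fails to improve for `patience` epochs."""

    def __init__(self, lr, patience=2, factor=0.5, min_lr=1e-6):
        self.lr = lr
        self.patience = patience
        self.factor = factor
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss and return the (possibly reduced) learning rate."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


scheduler = HalveOnPlateau(lr=0.01)
# Hypothetical validation losses, for illustration only.
history = [scheduler.step(loss) for loss in [1.0, 0.8, 0.81, 0.82, 0.7]]
print(history)  # prints [0.01, 0.01, 0.01, 0.005, 0.005]
```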

6. Evaluation and Cross-Validation

[1] - Use a non-uniform split stratified by classes

[2] - Avoid overfitting by applying cross-validation while tuning the last layer

[3] - 10-fold CV ensemble for classification

[4] - Combination of 5 10-fold CV ensembles for detection

[5] - Sklearn’s stratified K fold function

[6] - 5 KFold Cross-Validation

[7] - Adversarial Validation & Weighting
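
Stratified splitting (items [1] and [5]) keeps the class proportions roughly equal across folds, which matters when positives are rare. Scikit-learn's StratifiedKFold is the usual tool. The function below is a simplified round-robin sketch of the idea, not that library's implementation; its name and the toy labels are hypothetical, and it does not shuffle.

```python
from collections import defaultdict


def stratified_folds(labels, n_splits=5):
    """Assign each sample index to a fold so every fold gets a similar share of each class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        # Deal the samples of each class out to the folds in turn.
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds


# Hypothetical labels: 10 positives and 40 negatives.
labels = [1] * 10 + [0] * 40
for fold in stratified_folds(labels):
    positives = sum(labels[i] for i in fold)
    print(len(fold), positives)  # each fold: 10 samples, 2 of them positive
```

Each fold then serves once as the validation set while the remaining folds are used for training.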

7. Ensembling Methods

[1] - Use simple majority voting for ensemble

[2] - XGBoost on the max malignancy at 3 zoom levels, the z-location and the amount of strange tissue

[3] - LightGBM for models with too many classes. This was done for raw data features only.

[4] - CatBoost for a second-layer model

[5] - Training with 7 features for the gradient boosting classifier

[6] - Use ‘curriculum learning’ to speed up model training. In this technique, models are first trained on simple samples and then progressively move on to hard ones.

[7] - Ensemble with ResNet50, InceptionV3, and InceptionResNetV2

[8] - Ensemble method for object detection

[9] - An ensemble of Mask RCNN, YOLOv3, and Faster RCNN architectures with a classification network (DenseNet-121 architecture)
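
The simple majority voting of item [1] can be sketched as follows for class predictions; for segmentation the same vote would be taken per pixel. The function name and the three model outputs are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model class predictions by taking the most common label per sample.

    `predictions` holds one list of labels per model, all of the same length.
    """
    voted = []
    for sample_votes in zip(*predictions):
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted


# Hypothetical outputs of three models on four samples.
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
print(majority_vote([model_a, model_b, model_c]))  # prints [1, 0, 1, 1]
```

Using an odd number of models avoids ties in the binary case.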

8. Post-processing

[1] - Apply test time augmentation: present an image to the model several times with different random transformations and average the predictions you get

[2] - Equalize test prediction probabilities instead of only using predicted classes

[3] - Apply geometric mean to the predictions

[4] - Overlap tiles during inferencing so that each edge pixel is covered at least thrice because UNET tends to have bad predictions around edge areas.

[5] - Non-maximum suppression and bounding box shrinkage

[6] - Watershed post processing to detach objects in instance segmentation problems.
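
Test time augmentation (item [1]) for segmentation needs one extra step compared with classification: each predicted mask must be mapped back to the original orientation before averaging. The sketch below assumes deterministic flips rather than random transformations, and `toy_model` is a hypothetical stand-in for a trained network. The arithmetic mean used here could be swapped for the geometric mean of item [3].

```python
def hflip(img):
    """Mirror a 2D array (list of rows) left-right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror a 2D array (list of rows) top-bottom."""
    return img[::-1]


def predict_with_tta(model, image):
    """Average a model's probability maps over the identity and two flips.

    Each flip is its own inverse, so the same function maps the prediction back.
    """
    transforms = [lambda x: x, hflip, vflip]
    restored = []
    for transform in transforms:
        prediction = model(transform(image))
        restored.append(transform(prediction))  # undo the flip on the output
    height, width = len(image), len(image[0])
    return [
        [sum(pred[r][c] for pred in restored) / len(restored) for c in range(width)]
        for r in range(height)
    ]


def toy_model(image):
    """Hypothetical stand-in for a segmentation network: echoes the input as 'probabilities'."""
    return [list(row) for row in image]


image = [[0.0, 1.0],
         [1.0, 0.0]]
print(predict_with_tta(toy_model, image))  # prints [[0.0, 1.0], [1.0, 0.0]]
```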

Last modified: May 13, 2020


<blockquote>原文：<a class="no-external-link" href="https://neptune.ai/blog/image-segmentation-tips-and-tricks-from-kaggle-competitions?utm_source=reddit&utm_medium=post&utm_campaign=blog-image-segmentation-tips-and-tricks-from-kaggle-competitions" target="_blank">Image Segmentation: Tips and Tricks from 39 Kaggle Competitions - 2020.04.07</a>作者：<a class="no-external-link" href="https://www.linkedin.com/in/mwitiderrick/" target="_blank">Derrick Mwiti</a>半译半转，学习下.</blockquote>作者参加了超过 39 个 Kaggle 比赛后进行的总结，如：<ul><li><a class="no-external-link" href="https://www.kaggle.com/c/data-science-bowl-2017/" target="_blank">Data Science Bowl 2017</a> – $1,000,000</li><li><a class="no-external-link" href="https://www.kaggle.com/c/intel-mobileodt-cervical-cancer-screening" target="_blank">Intel & MobileODT Cervical Cancer Screening</a> – $100,000</li><li><a class="no-external-link" href="https://www.kaggle.com/c/data-science-bowl-2018" target="_blank">2018 Data Science Bowl </a>– $100,000</li><li><a class="no-external-link" href="https://www.kaggle.com/c/airbus-ship-detection" target="_blank">Airbus Ship Detection Challenge</a> – $60,000</li><li><a class="no-external-link" href="https://www.kaggle.com/c/planet-understanding-the-amazon-from-space" target="_blank">Planet: Understanding the Amazon from Space</a> – $60,000</li><li><a class="no-external-link" href="https://www.kaggle.com/c/aptos2019-blindness-detection" target="_blank">APTOS 2019 Blindness Detection</a> – $50,000</li><li><a class="no-external-link" href="https://www.kaggle.com/c/human-protein-atlas-image-classification" target="_blank">Human Protein Atlas Image Classification</a> – $37,000</li><li><a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation" target="_blank">SIIM-ACR Pneumothorax Segmentation</a> – $30,000</li><li><a class="no-external-link" href="https://www.kaggle.com/c/inclusive-images-challenge" target="_blank">Inclusive Images Challenge</a> – $25,000</li></ul><h2>1. 
使用外在数据</h2>External Data[1] - 使用 <a class="no-external-link" href="https://luna16.grand-challenge.org/" target="_blank">LUng Node Analysis Grand Challenge</a> 数据，其包含了 radiologists 的详细标注数据.[2] - 使用 <a class="no-external-link" href="https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI" target="_blank">LIDC-IDRI</a> 数据，其包含了每个肿瘤的 radiologist 描述数据.[3] - 使用 <a class="no-external-link" href="https://www.flickr.com/creativecommons/" target="_blank">Flickr CC</a>, <a class="no-external-link" href="https://commons.wikimedia.org/wiki/Publicly_available_global_data_sets" target="_blank">Wikipedia Commons datasets</a> 数据.[4] - 使用 <a class="no-external-link" href="https://www.proteinatlas.org/about/download" target="_blank">Human Protein Atlas Dataset</a> 数据.[5] - 使用 <a class="no-external-link" href="https://www.mdpi.com/2306-5729/3/3/25" target="_blank">IDRiD</a> 数据.<h2>2. 数据探索和洞察</h2>Data Exploration and Gaining insights[1] - <a class="no-external-link" href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.149.2721&rep=rep1&type=pdf" target="_blank">Clustering of 3d segmentation</a> with the 0.5 threshold[2] - 确认训练和测试数据集的标签分布是否存在巨大差异. Identify if there is a <a class="no-external-link" href="https://www.kaggle.com/c/inclusive-images-challenge/discussion/72450#433005" target="_blank">substantial difference in train/test label distributions</a><h2>3. 预处理</h2>[1] - 采用 <a class="no-external-link" href="https://en.wikipedia.org/wiki/Blob_detection#The_difference_of_Gaussians_approach" target="_blank">Difference of Gaussian (DoG)</a> 算法进行斑点检测(blob Detection). 
使用 <a class="no-external-link" href="https://scikit-image.org/" target="_blank">skimage</a> 库的函数实现.[2] - 使用 <a class="no-external-link" href="https://www.mdpi.com/2072-4292/11/2/114/pdf-vor" target="_blank">patch-based inputs for training</a>，以减少训练时间.[3] - 使用 <a class="no-external-link" href="https://github.com/rapidsai/cudf" target="_blank">cudf</a> 进行数据加载，而不是 <a class="no-external-link" href="https://towardsdatascience.com/a-quick-introduction-to-the-pandas-python-library-f1b678f34673" target="_blank">Pandas</a>，其具有更快的读取. [4] - 确保所有的图片具有相同的朝向(Ensure that all the images have the<a class="no-external-link" href="https://github.com/albumentations-team/albumentations" target="_blank"> same orientation</a>).[5] - 使用有限对比度自适应直方图均衡化(Apply contrast limited adaptive <a class="no-external-link" href="https://towardsdatascience.com/histogram-equalization-5d1013626e64" target="_blank">histogram equalization</a>).[6] - 使用 OpenCV 进行所有的常规图像处理操作.[7] - 使用主动学习(<a class="no-external-link" href="https://towardsdatascience.com/review-suggestive-annotation-deep-active-learning-framework-biomedical-image-segmentation-e08e4b931ea6" target="_blank">automatic active learning</a>) 并添加手工标注(Employ <a class="no-external-link" href="https://towardsdatascience.com/review-suggestive-annotation-deep-active-learning-framework-biomedical-image-segmentation-e08e4b931ea6" target="_blank">automatic active learning</a> and adding manual annotations).[8] - 将所有的图像调整为相同尺寸(<a class="no-external-link" href="https://github.com/albumentations-team/albumentations" target="_blank">Resize all images to the same resolution</a> in order to apply the same model to scans of different thicknesses).[9] - 将扫描图像转换为归一化的 3D numpy 数组(<a class="no-external-link" href="https://github.com/albumentations-team/albumentations" target="_blank">Convert scan images</a> into normalized 3D numpy arrays). 
[10] - 利用暗通道先验进行单图像去雾(Apply single<a class="no-external-link" href="http://kaiminghe.com/" target="_blank"> Image Haze Removal</a> using Dark Channel Prior).[11] - 将所有数据转换为 Hounsfield 单元(Convert all data to <a class="no-external-link" href="https://www.ncbi.nlm.nih.gov/books/NBK547721/" target="_blank">Hounsfield units</a>).[12] - 查找重复图像(Find duplicate images using <a class="no-external-link" href="https://www.kaggle.com/c/human-protein-atlas-image-classification/discussion/77269#583768" target="_blank">pair-wise correlation on RGBY</a>).[13] - 开发类别更均衡的采样器(Make labels more balanced by (<a class="no-external-link" href="https://www.sebastiansylvan.com/post/importancesampling/" target="_blank">https://www.sebastiansylvan.com/post/importancesampling/</a>)<a class="no-external-link" href="https://www.sebastiansylvan.com/post/importancesampling/" target="_blank">developing a sampler</a>).[14] - 对测试数据采用伪标签(Apply p<a class="no-external-link" href="https://towardsdatascience.com/pseudo-labeling-to-deal-with-small-datasets-what-why-how-fd6f903213af" target="_blank">seudo labeling </a>to test data in order <a class="no-external-link" href="https://www.analyticsvidhya.com/blog/2017/09/pseudo-labelling-semi-supervised-learning-technique/" target="_blank">to</a> <a class="no-external-link" href="https://arxiv.org/abs/1908.02983" target="_blank">improve score</a>).[15] - 缩放图像/标注masks 尺寸到 320x480(<a class="no-external-link" href="https://github.com/albumentations-team/albumentations" target="_blank">Scale down images/masks to 320×480</a>).[16] - 采用 32x32 的核进行直方图均衡化(<a class="no-external-link" href="https://towardsdatascience.com/histogram-equalization-5d1013626e64" target="_blank">Histogram equalization</a> (CLAHE) with kernel size 32×32).[17] - 将 DCM 转化为 PNG(Convert <a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/97120#560788" target="_blank">DCM to PNG</a>).[18] - 计算每张图像的 md5 去重(Calculate the <a class="no-external-link" 
href="https://www.kaggle.com/c/cdiscount-image-classification-challenge/discussion/45798" target="_blank">md5 hash for each image</a> when there are duplicate images).<h2>4. 数据增强</h2>[1] - 使用 <a class="no-external-link" href="https://github.com/albu/albumentations" target="_blank">albumentations</a> 库进行图像增强. [2] - 随机旋转 90 度(Apply random<a class="no-external-link" href="https://github.com/albu/albumentations" target="_blank"> rotation by 90 degrees</a>).[3] - 水平、垂直、水平垂直翻转(Use h<a class="no-external-link" href="https://github.com/albu/albumentations" target="_blank">orizontal, vertical or both flips</a>).[4] - 深度几何变换(Attempt <a class="no-external-link" href="https://www.kaggle.com/c/data-science-bowl-2018/discussion/54741#477226" target="_blank">heavy geometric transformations</a>: Elastic Transform, PerspectiveTransform, Piecewise Affine transforms, pincushion distortion).[5] - 随机HSV(Apply <a class="no-external-link" href="https://www.kaggle.com/c/data-science-bowl-2018/discussion/54741#477226" target="_blank">random HSV</a>).[6] - 采用 loss-less 增强以仅泛化，避免有用信息的 loss 损失(Use of <a class="no-external-link" href="http://juliandewit.github.io/kaggle-ndsb2017/" target="_blank">loss-less augmentation </a>for generalization to prevent loss of useful image information).[7] - 通道乱序(Apply <a class="no-external-link" href="https://www.kaggle.com/c/data-science-bowl-2018/discussion/54741#477226" target="_blank">channel shuffling</a>).[8] - 基于类别频率进行数据增强(Do <a class="no-external-link" href="https://www.kdnuggets.com/2018/05/data-augmentation-deep-learning-limited-data.html" target="_blank">data augmentation</a> based on class <a class="no-external-link" href="https://towardsdatascience.com/deep-learning-unbalanced-training-data-solve-it-like-this-6c528e9efea6" target="_blank">frequency</a>).[9] - 使用高斯噪声(Apply <a class="no-external-link" href="https://www.kaggle.com/c/data-science-bowl-2018/discussion/54741#477226" target="_blank">gaussian noise</a>).[10] - Use <a 
class="no-external-link" href="https://en.wikipedia.org/wiki/Octahedral_symmetry#The_isometries_of_the_cube" target="_blank">lossless permutations of 3D images</a> for data augmentation.[11] - 随机旋转一个[0, 45] 间角度(<a class="no-external-link" href="https://github.com/albumentations-team/albumentations" target="_blank">Rotate</a> by a random angle from 0 to 45 degrees).[12] - 随机缩放一个 [0.8, 12] 间的尺度(<a class="no-external-link" href="https://github.com/albumentations-team/albumentations" target="_blank">Scale</a> by a random factor from 0.8 to 1.2).[13] - 改变光照(<a class="no-external-link" href="https://github.com/albumentations-team/albumentations" target="_blank">Brightness</a> changing).[14] - 随机改变HSV值(Randomly change <a class="no-external-link" href="https://github.com/albumentations-team/albumentations" target="_blank">hue, saturation and value</a>).[15] - Apply <a class="no-external-link" href="https://en.wikipedia.org/wiki/Dihedral_group" target="_blank">D4</a> augmentations. [16] - 有限对比度自适应直方图均衡化(Contrast limited adaptive <a class="no-external-link" href="https://towardsdatascience.com/histogram-equalization-5d1013626e64" target="_blank">histogram equalization</a>).[17] - 自动增强策略(Use the <a class="no-external-link" href="https://arxiv.org/pdf/1805.09501.pdf" target="_blank">AutoAugment</a> augmentation strategy).<h2>5. 模型</h2><h3>5.1. 网络结构</h3>[1] - 使用 UNet 类网络结构(Use of a <a class="no-external-link" href="https://arxiv.org/abs/1505.04597" target="_blank">U-net</a> based architecture. 
Adopted the concepts and applied them to 3D input tensors)[2] - The <a class="no-external-link" href="https://research.googleblog.com/2016/08/improving-inception-and-image.html" target="_blank">inception-ResNet v2 architecture</a> for training features with different receptive fields[3] - <a class="no-external-link" href="https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf" target="_blank">Siamese networks</a> with adversarial training[4] - <a class="no-external-link" href="https://towardsdatascience.com/understanding-and-coding-a-resnet-in-keras-446d7ff84d33" target="_blank">ResNet50</a>, <a class="no-external-link" href="https://arxiv.org/abs/1610.02357" target="_blank">Xception</a>, <a class="no-external-link" href="https://arxiv.org/abs/1602.07261" target="_blank">Inception ResNet </a>v2 x 5 with Dense (FC) layer as the final layer[5] - Use of a <a class="no-external-link" href="https://keras.io/layers/pooling/" target="_blank">global max-pooling layer</a> which returns a fixed-length output no matter the input size[6] - Use of <a class="no-external-link" href="https://arxiv.org/abs/1904.03076" target="_blank">stacked dilated convolutions</a>[7] - <a class="no-external-link" href="https://arxiv.org/abs/1711.06396" target="_blank">VoxelNet</a>[8] - Replace plus sign in <a class="no-external-link" href="https://arxiv.org/abs/1707.03718" target="_blank">LinkNet</a> <a class="no-external-link" href="https://arxiv.org/pdf/1912.05074.pdf" target="_blank">skip connections </a>with concat and conv1x1[9] - <a class="no-external-link" href="https://arxiv.org/pdf/1711.02512.pdf" target="_blank">Generalized mean pooling</a>[10] - Keras <a class="no-external-link" href="https://www.tensorflow.org/api_docs/python/tf/keras/applications/NASNetLarge" target="_blank">NASNetLarge</a> to train the model from scratch using 224x224x3[11] - Use of the <a class="no-external-link" 
href="https://towardsdatascience.com/understanding-1d-and-3d-convolution-neural-network-keras-9d8f76e29610" target="_blank">3D convnet</a> to slide over the images[12] - Imagenet-pre-trained <a class="no-external-link" href="https://towardsdatascience.com/review-resnet-winner-of-ilsvrc-2015-image-classification-localization-detection-e39402bfa5d8" target="_blank">ResNet152</a> as the feature extractor[13] - <a class="no-external-link" href="https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/discussion/36887#207397" target="_blank">Replace the final fully-connected layers of ResNet by 3 fully connected layers with dropout</a>[14] - Use <a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107518#619543" target="_blank">ConvTranspose</a> in the decoder[15] - Applying the <a class="no-external-link" href="https://medium.com/analytics-vidhya/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-and-more-666091488df5" target="_blank">VGG baseline architecture</a>[16] - Implementing the <a class="no-external-link" href="http://vlg.cs.dartmouth.edu/c3d/" target="_blank">C3D</a> network with adjusted receptive fields and a 64 unit bottleneck layer on the end of the network[17] - Use of <a class="no-external-link" href="https://towardsdatascience.com/understanding-semantic-segmentation-with-unet-6be4f42d4b47" target="_blank">UNet</a> type architectures with pre-trained weights to improve convergence and performance of binary segmentation on 8-bit RGB input images[18] - <a class="no-external-link" href="https://arxiv.org/abs/1707.03718" target="_blank">LinkNet</a> since it’s fast and memory efficient[19] - <a class="no-external-link" href="https://github.com/matterport/Mask_RCNN" target="_blank">MASKRCNN</a>[20] - <a class="no-external-link" href="https://github.com/microsoft/CNTK/tree/master/Examples/Image/Classification/GoogLeNet/BN-Inception" target="_blank">BN-Inception</a>[21] - <a 
class="no-external-link" href="https://arxiv.org/abs/1908.02990" target="_blank">Fast Point R-CNN</a>[22] - <a class="no-external-link" href="https://github.com/osmr/imgclsmob/blob/master/pytorch/pytorchcv/models/seresnext.py" target="_blank">Seresnext</a>[23] - <a class="no-external-link" href="https://towardsdatascience.com/understanding-semantic-segmentation-with-unet-6be4f42d4b47" target="_blank">UNet</a> and <a class="no-external-link" href="https://arxiv.org/abs/1706.05587" target="_blank">Deeplabv3</a>[24] - <a class="no-external-link" href="https://arxiv.org/abs/1506.01497" target="_blank">Faster RCNN</a>[25] - <a class="no-external-link" href="https://paperswithcode.com/paper/squeeze-and-excitation-networks" target="_blank">SENet154</a>[26] - <a class="no-external-link" href="https://www.kaggle.com/pytorch/resnet152" target="_blank">ResNet152</a>[27] - <a class="no-external-link" href="https://arxiv.org/pdf/1707.07012.pdf" target="_blank">NASNet-A-Large</a>[28] - <a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107795#619987" target="_blank">EfficientNetB4</a>[29] - <a class="no-external-link" href="https://www.kaggle.com/pytorch/resnet101" target="_blank">ResNet101</a>[30] - <a class="no-external-link" href="https://www.groundai.com/project/gapnet-graph-attention-based-point-neural-network-for-exploiting-local-feature-of-point-cloud/1" target="_blank">GAPNet</a>[31] - <a class="no-external-link" href="https://arxiv.org/pdf/1712.00559.pdf" target="_blank">PNASNet-5-Large</a>[32] - <a class="no-external-link" href="https://www.kaggle.com/pytorch/densenet121" target="_blank">Densenet121</a>[33] - <a class="no-external-link" href="https://machinelearningmastery.com/how-to-develop-an-auxiliary-classifier-gan-ac-gan-from-scratch-with-keras/" target="_blank">AC-GAN</a>[34] - <a class="no-external-link" href="https://www.kaggle.com/c/sp-society-camera-model-identification/discussion/49602#282979" 
target="_blank">XceptionNet (96), XceptionNet (299), Inception v3 (139), InceptionResNet v2 (299), DenseNet121 (224)</a>[35] - <a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107824#650999" target="_blank">AlbuNet (resnet34)</a> from <a class="no-external-link" href="https://github.com/ternaus/TernausNet" target="_blank">ternausnets</a>[36] - <a class="no-external-link" href="https://medium.com/the-downlinq/a-deep-dive-into-the-spacenet-4-winning-algorithms-8d611a5dfe25" target="_blank">SpaceNet</a>[37] - <a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107824#650999" target="_blank">Resnet50</a> from <a class="no-external-link" href="https://github.com/SpaceNetChallenge/SpaceNet_Off_Nadir_Solutions/tree/master/selim_sef/zoo" target="_blank">selim_sef SpaceNet 4</a>[38] - <a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107824" target="_blank">SCSEUnet (seresnext50)</a> from <a class="no-external-link" href="https://github.com/SpaceNetChallenge/SpaceNet_Off_Nadir_Solutions/tree/master/selim_sef/zoo" target="_blank">selim_sef SpaceNet 4</a>[39] - A custom <a class="no-external-link" href="https://www.kaggle.com/c/data-science-bowl-2018/discussion/54835#320935" target="_blank">Unet and Linknet</a><a class="no-external-link" href="https://www.kaggle.com/c/data-science-bowl-2018/discussion/54835#320935" target="_blank"> architecture</a>[40] - <a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107872" target="_blank">FPNetResNet50 (5 folds)</a>[41] - <a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107872" target="_blank">FPNetResNet101 (5 folds)</a>[42] - <a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107872" 
target="_blank">FPNetResNet101 (7 folds with different seeds)</a>[43] - <a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107872" target="_blank">PANetDilatedResNet34 (4 folds)</a>[44] - <a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107872" target="_blank">PANetResNet50 (4 folds)</a>[45] - <a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107872" target="_blank">EMANetResNet101 (2 folds)</a>[46] - <a class="no-external-link" href="https://github.com/fizyr/keras-retinanet" target="_blank">RetinaNet</a>[47] - <a class="no-external-link" href="https://github.com/msracver/Deformable-ConvNets" target="_blank">Deformable R-FCN</a>[48] - <a class="no-external-link" href="https://github.com/msracver/Relation-Networks-for-Object-Detection" target="_blank">Deformable Relation Networks</a><h3>5.2. 硬件平台</h3>[1] - <a class="no-external-link" href="https://www.kaggle.com/c/cdiscount-image-classification-challenge/discussion/45724" target="_blank">Use of the AWS GPU instance p2.xlarge with a NVIDIA K80 GPU</a>[2] - <a class="no-external-link" href="https://www.kaggle.com/c/cdiscount-image-classification-challenge/discussion/45724" target="_blank">Pascal Titan-X GPU</a>[3] - <a class="no-external-link" href="https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/40121#226179" target="_blank">Use of 8 TITAN X GPUs</a>[4] - <a class="no-external-link" href="https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/40121#226179" target="_blank">6 GPUs: 2</a><a class="no-external-link" href="https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/40121#226179" target="_blank">1080Ti + 4</a><a class="no-external-link" href="https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/40121#226179" target="_blank">1080</a>[5] - <a class="no-external-link" 
href="https://www.kaggle.com/c/cdiscount-image-classification-challenge/discussion/45724" target="_blank">Server with 8×NVIDIA Tesla P40, 256 GB RAM and 28 CPU cores</a>[6] - <a class="no-external-link" href="https://www.kaggle.com/c/cdiscount-image-classification-challenge/discussion/45850" target="_blank">Intel Core i7 5930k, 2×1080, 64 GB of RAM, 2x512GB SSD, 3TB HDD</a>[7] - <a class="no-external-link" href="https://www.kaggle.com/c/inclusive-images-challenge/discussion/72450" target="_blank">GCP 1x P100, 8x CPU, 15 GB RAM, SSD or 2x P100, 16x CPU, 30 GB RAM</a>[8] - <a class="no-external-link" href="https://www.kaggle.com/c/sp-society-camera-model-identification/discussion/49602" target="_blank">NVIDIA Tesla P100 GPU with 16GB of RAM</a>[9] - <a class="no-external-link" href="https://www.kaggle.com/c/cdiscount-image-classification-challenge/discussion/45850" target="_blank">Intel Core i7 5930k, 2×1080, 64 GB of RAM, 2x512GB SSD, 3TB HDD</a>[10] - <a class="no-external-link" href="https://www.kaggle.com/c/human-protein-atlas-image-classification/discussion/77325" target="_blank">980Ti GPU, 2600k CPU, and 14GB RAM</a><h3>5.3. 
Loss functions</h3>
[1] - <a class="no-external-link" href="https://towardsdatascience.com/metrics-to-evaluate-your-semantic-segmentation-model-6bcb99639aa2" target="_blank">Dice Coefficient</a>, which works better on imbalanced data.
[2] - <a class="no-external-link" href="https://www.kaggle.com/lyakaap/weighing-boundary-pixels-loss-script-by-keras2" target="_blank">Weighted boundary loss</a>, which aims to reduce the distance between the predicted segmentation and the ground truth.
[3] - <a class="no-external-link" href="https://pytorch.org/docs/stable/nn.html?highlight=multilabelsoftmarginloss#torch.nn.MultiLabelSoftMarginLoss" target="_blank">MultiLabelSoftMarginLoss</a>, a criterion that optimizes a multi-label one-versus-all loss based on max-entropy between input and target
[4] - Balanced cross entropy (BCE)<a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/101429" target="_blank"> with logit loss</a>, which weights the positive and negative examples by a certain coefficient
[5] - <a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107981" target="_blank">Lovasz</a>, which performs direct optimization of the mean intersection-over-union loss in neural networks based on the convex Lovasz extension of sub-modular losses
[6] - <a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107687" target="_blank">FocalLoss + Lovasz</a>, obtained by summing the Focal and Lovasz losses
[7] - <a class="no-external-link" href="https://arxiv.org/abs/1801.07698" target="_blank">Arc margin loss</a>, which incorporates a margin in order to maximise face class separability
[8] - <a class="no-external-link" href="https://www.tensorflow.org/addons/api_docs/python/tfa/losses/npairs_loss" target="_blank">Npairs loss</a>, which computes the npairs loss between y_true and y_pred.
[9] - A combination of <a class="no-external-link" 
href="https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/40199" target="_blank">BCE and Dice loss</a> functions
[10] - <a class="no-external-link" href="https://arxiv.org/pdf/1704.03135.pdf" target="_blank">LSEP</a> – a pairwise ranking loss that is smooth everywhere and thus easier to optimize
[11] - <a class="no-external-link" href="https://ydwen.github.io/papers/WenECCV16.pdf" target="_blank">Center loss</a>, which simultaneously learns a center for the deep features of each class and penalizes the distances between the deep features and their corresponding class centers
[12] - <a class="no-external-link" href="https://arxiv.org/abs/1803.00130" target="_blank">Ring Loss</a>, which augments standard loss functions such as Softmax
[13] - <a class="no-external-link" href="https://www.tensorflow.org/addons/tutorials/losses_triplet" target="_blank">Hard triplet loss</a>, which trains a network to embed features of the same class close together while maximizing the embedding distance between different classes
[14] - <a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107687" target="_blank">1 + BCE – Dice</a>, which adds 1 to the BCE loss and subtracts the Dice coefficient
[15] - <a class="no-external-link" href="https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/40144" target="_blank">Binary cross-entropy – log(dice)</a>, which is the binary cross-entropy minus the log of the Dice coefficient
[16] - <a class="no-external-link" href="https://github.com/SpaceNetChallenge/SpaceNet_Off_Nadir_Solutions/blob/master/selim_sef/training/losses.py" target="_blank">Combinations</a> of BCE, dice and focal
[17] - <a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107981" target="_blank">Lovasz Loss</a>, which performs direct optimization of the mean intersection-over-union loss
[18] - <a class="no-external-link" 
href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107795#619987" target="_blank">BCE + DICE</a> - Dice loss is obtained by calculating smooth dice coefficient function[19] - <a class="no-external-link" href="https://www.kaggle.com/c/human-protein-atlas-image-classification/discussion/77320" target="_blank">Focal loss with Gamma 2</a> that is an improvement to the standard cross-entropy criterion[20] - <a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107546" target="_blank">BCE + DICE + Focal</a> – this is basically a summation of the three loss functions[21] - <a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107981" target="_blank">Active Contour Loss</a> that incorporates the area and size information and integrates the information in a dense deep learning model[22] - <a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107603" target="_blank">1024 * BCE(results, masks) + BCE(cls, cls_target)</a>[23] - <a class="no-external-link" href="https://www.kaggle.com/c/aptos2019-blindness-detection/discussion/108058" target="_blank">Focal + kappa</a> – Kappa is a loss function for multi-class classification of ordinal data in deep learning. 
In this case it is summed with the focal loss
[24] - <a class="no-external-link" href="https://arxiv.org/pdf/1801.07698v1.pdf" target="_blank">ArcFaceLoss</a>: Additive Angular Margin Loss for Deep Face Recognition
[25] - <a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107522" target="_blank">soft Dice trained on positives only</a> – Soft Dice uses predicted probabilities
[26] - <a class="no-external-link" href="https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/108397" target="_blank">2.7 * BCE(pred_mask, gt_mask) + 0.9 * DICE(pred_mask, gt_mask) + 0.1 * BCE(pred_empty, gt_empty)</a>, a custom loss used by the Kaggler
[27] - <a class="no-external-link" href="https://www.kaggle.com/c/aptos2019-blindness-detection/discussion/108065" target="_blank">nn.SmoothL1Loss()</a>, a criterion that uses a squared term if the absolute element-wise error falls below 1 and an L1 term otherwise
[28] - Use of the <a class="no-external-link" href="https://towardsdatascience.com/why-using-mean-squared-error-mse-cost-function-for-binary-classification-is-a-bad-idea-933089e90df7" target="_blank">Mean Squared Error objective function</a> in scenarios where it seems to work better than the <a class="no-external-link" href="https://machinelearningmastery.com/cross-entropy-for-machine-learning/" target="_blank">binary cross-entropy objective function</a>.
<h3>5.4. Training tips</h3>
[1] - <a class="no-external-link" href="https://machinelearningmastery.com/learning-rate-for-deep-learning-neural-networks/" target="_blank">Try different learning rates</a>. 
[2] - <a class="no-external-link" href="https://machinelearningmastery.com/use-different-batch-sizes-training-predicting-python-keras/" target="_blank">Try different batch sizes</a>.
[3] - Use <a class="no-external-link" href="https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/38125#213920" target="_blank">SGD with momentum and manual rate scheduling</a>
[4] - Too much <a class="no-external-link" href="https://medium.com/secure-and-private-ai-writing-challenge/data-augmentation-increases-accuracy-of-your-model-but-how-aa1913468722" target="_blank">augmentation</a> will reduce the accuracy.
[5] - Train on image <a class="no-external-link" href="https://www.kaggle.com/c/understanding_cloud_organization/discussion/115115" target="_blank">crops and predict</a> on full images
[6] - Use Keras's <a class="no-external-link" href="https://www.kaggle.com/c/bengaliai-cv19/discussion/135998" target="_blank"><code>ReduceLROnPlateau()</code></a> to control the learning rate.
[7] - Train <a class="no-external-link" href="https://www.kaggle.com/c/inclusive-images-challenge/discussion/72450" target="_blank">without augmentation until plateau</a>, then apply soft and hard augmentation for some epochs.
[8] - <a class="no-external-link" href="https://www.kaggle.com/c/tgs-salt-identification-challenge/discussion/65763" target="_blank">Freeze all layers except the</a> last one and use 1000 images from <a class="no-external-link" href="https://www.kaggle.com/c/quickdraw-doodle-recognition/discussion/72892" target="_blank">Stage1 for tuning</a>.
[9] - Make labels more balanced (by <a class="no-external-link" href="https://www.sebastiansylvan.com/post/importancesampling/" 
target="_blank">developing a sampler</a>).[10] - Use of <a class="no-external-link" href="https://www.kaggle.com/c/sp-society-camera-model-identification/discussion/49314" target="_blank">class aware sampling</a>[11] - Use dropout and augmentation while tuning the last layer[12] - <a class="no-external-link" href="https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/38298" target="_blank">Pseudo Labeling</a> to improve score[13] - Use <a class="no-external-link" href="https://www.kaggle.com/c/sp-society-camera-model-identification/discussion/49299" target="_blank">Adam reducing LR on plateau with patience 2–4</a>[14] - Use <a class="no-external-link" href="https://www.kaggle.com/c/sp-society-camera-model-identification/discussion/49299" target="_blank">Cyclic LR with SGD</a>[15] - Reduce the <a class="no-external-link" href="https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks/" target="_blank">learning rate </a>by a factor of two if validation loss does not improve for two consecutive epochs[16] - Repeat the <a class="no-external-link" href="https://medium.com/kaggle-blog/carvana-image-masking-challenge-1st-place-winners-interview-78fcc5c887a8" target="_blank">worst batch out</a> of 10 batches[17] - <a class="no-external-link" href="https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/40126" target="_blank">Train with default UNET</a>[18] - <a class="no-external-link" href="https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/40126" target="_blank">Overlap tiles</a> so that each edge pixel is covered twice[19] - <a class="no-external-link" href="https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/discussion/71022" target="_blank">Hyperparameter tuning: learning rate on training, non-maximum suppression and score threshold on inference</a>[20] - <a class="no-external-link" href="https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/discussion/70505" 
target="_blank">Remove bounding boxes</a> with low confidence scores
[21] - Train different <a class="no-external-link" href="https://machinelearningmastery.com/ensemble-methods-for-deep-learning-neural-networks/" target="_blank">convolutional neural networks</a> then build an ensemble
[22] - Stop <a class="no-external-link" href="https://www.kaggle.com/c/human-protein-atlas-image-classification/discussion/77320" target="_blank">training when the F1 score</a> is decreasing
[23] - <a class="no-external-link" href="https://blog.slavv.com/differential-learning-rates-59eff5209a4f" target="_blank">Differential learning rates</a> with gradual reduction
[24] - Train ANNs in <a class="no-external-link" href="https://www.kaggle.com/c/statoil-iceberg-classifier-challenge/discussion/48207" target="_blank">a stacked fashion using</a> 5 folds and 30 repeats
[25] - Keep track of your experiments using <a class="no-external-link" href="https://docs.neptune.ml/" target="_blank">Neptune</a>
<h2>6. 
Evaluation and cross-validation</h2>
[1] - Use a <a class="no-external-link" href="https://www.kaggle.com/c/inclusive-images-challenge/discussion/71433" target="_blank">non-uniform split stratified</a> by classes
[2] - Avoid <a class="no-external-link" href="https://elitedatascience.com/overfitting-in-machine-learning" target="_blank">overfitting</a> by applying <a class="no-external-link" href="https://machinelearningmastery.com/early-stopping-to-avoid-overtraining-neural-network-models/" target="_blank">cross-validation</a> while <a class="no-external-link" href="https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/" target="_blank">tuning</a> the last layer
[3] - <a class="no-external-link" href="https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/discussion/70421" target="_blank">10-fold CV ensemble for classification</a>
[4] - A combination <a class="no-external-link" href="https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/discussion/70421" target="_blank">of five 10-fold CV</a> ensembles for detection
[5] - Sklearn's<a class="no-external-link" href="http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html" target="_blank"> stratified K-fold function</a>
[6] - 5-fold <a class="no-external-link" href="https://machinelearningmastery.com/k-fold-cross-validation/" target="_blank">KFold Cross-Validation</a>
[7] - Adversarial <a class="no-external-link" href="https://www.kaggle.com/c/PLAsTiCC-2018/discussion/75011" target="_blank">Validation & Weighting</a>
<h2>7. 
Ensembling methods</h2>
[1] - Use simple <a class="no-external-link" href="https://www.kaggle.com/c/human-protein-atlas-image-classification/discussion/77256" target="_blank">majority voting</a> for the ensemble
[2] - <a class="no-external-link" href="https://www.analyticsvidhya.com/blog/2017/06/which-algorithm-takes-the-crown-light-gbm-vs-xgboost/" target="_blank">XGBoost</a> on the <a class="no-external-link" href="https://www.kaggle.com/c/data-science-bowl-2017/discussion/31551" target="_blank">max malignancy at 3 zoom levels,</a> the z-location and the <a class="no-external-link" href="https://www.kaggle.com/c/statoil-iceberg-classifier-challenge/discussion/48207" target="_blank">amount of strange tissue</a>
[3] - <a class="no-external-link" href="https://www.analyticsvidhya.com/blog/2017/06/which-algorithm-takes-the-crown-light-gbm-vs-xgboost/" target="_blank">LightGBM</a> for models <a class="no-external-link" href="https://www.kaggle.com/c/quickdraw-doodle-recognition/discussion/73738" target="_blank">with too many</a> classes. This was done for raw data features only.
[4] - <a class="no-external-link" href="https://www.analyticsvidhya.com/blog/2017/08/catboost-automated-categorical-data/" target="_blank">CatBoost</a> for <a class="no-external-link" href="https://www.kaggle.com/c/cdiscount-image-classification-challenge/discussion/45733" target="_blank">a second-layer model</a>
[5] - Training with 7 features for the <a class="no-external-link" href="https://machinelearningmastery.com/gentle-introduction-gradient-boosting-algorithm-machine-learning/" target="_blank">gradient boosting classifier</a>
[6] - Use<a class="no-external-link" href="https://arxiv.org/abs/1904.03626" target="_blank"> 'curriculum learning'</a> to speed up model training. 
In this technique, models are first trained on simple samples and then progressively moved to harder ones.
[7] - Ensemble with <a class="no-external-link" href="https://www.kaggle.com/c/inclusive-images-challenge/discussion/72450" target="_blank">ResNet50, InceptionV3, and InceptionResNetV2</a>
[8] - <a class="no-external-link" href="https://github.com/ahrnbom/ensemble-objdet" target="_blank">Ensemble method</a> for object detection
[9] - An ensemble of <a class="no-external-link" href="https://www.analyticsvidhya.com/blog/2019/07/computer-vision-implementing-mask-r-cnn-image-segmentation/" target="_blank">Mask RCNN</a>, <a class="no-external-link" href="https://machinelearningmastery.com/how-to-perform-object-detection-with-yolov3-in-keras/" target="_blank">YOLOv3</a>, and <a class="no-external-link" href="https://towardsdatascience.com/faster-r-cnn-for-object-detection-a-technical-summary-474c5b857b46" target="_blank">Faster RCNN</a> architectures, together with a classification network based on the <a class="no-external-link" href="https://towardsdatascience.com/understanding-and-visualizing-densenets-7f688092391a" target="_blank">DenseNet-121</a> architecture
<h2>8. 
Post-processing</h2>
[1] - Apply <a class="no-external-link" href="https://towardsdatascience.com/test-time-augmentation-tta-and-how-to-perform-it-with-keras-4ac19b67fb4d" target="_blank">test time augmentation</a>: present an image to the model several times with different random transformations and average the predictions you get
[2] - Equalize test prediction <a class="no-external-link" href="https://machinelearningmastery.com/probability-metrics-for-imbalanced-classification/" target="_blank">probabilities</a> instead of only using predicted classes
[3] - Apply the <a class="no-external-link" href="https://machinelearningmastery.com/arithmetic-geometric-and-harmonic-means-for-machine-learning/" target="_blank">geometric mean</a> to the <a class="no-external-link" href="https://medium.com/@flawnsontong1/what-is-geometric-deep-learning-b2adb662d91d" target="_blank">predictions</a>
[4] - <a class="no-external-link" href="https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/40126" target="_blank">Overlap tiles during inference so that each edge pixel</a> is covered at least thrice, because UNET tends to have bad predictions around edge areas.
[5] - <a class="no-external-link" href="https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/discussion/70632" target="_blank">Non-maximum suppression</a> and bounding box shrinkage
[6] - <a class="no-external-link" href="https://www.kaggle.com/c/data-science-bowl-2018/discussion/54741" target="_blank">Watershed post processing</a> to detach objects in instance segmentation problems.
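
The test time augmentation and geometric mean items in the post-processing list can be illustrated with a minimal NumPy sketch. It is only an illustration under stated assumptions: `model_fn` is a hypothetical callable that maps an (H, W) image to an (H, W) map of foreground probabilities, and fixed flips are used in place of the random transformations mentioned in the list so that each prediction can be mapped back exactly.

```python
import numpy as np


def predict_with_tta(model_fn, image):
    """Average predictions over the identity and two flip transforms.

    model_fn is a hypothetical callable (not from the article) that maps an
    (H, W) array to an (H, W) array of foreground probabilities.
    """
    # Each entry pairs a forward transform with the inverse that maps the
    # prediction back into the frame of the original image.
    transforms = [
        (lambda x: x, lambda y: y),  # identity
        (np.fliplr, np.fliplr),      # horizontal flip is its own inverse
        (np.flipud, np.flipud),      # vertical flip is its own inverse
    ]
    preds = [inverse(model_fn(forward(image))) for forward, inverse in transforms]
    return np.mean(preds, axis=0)


def geometric_mean(prob_maps, eps=1e-7):
    """Geometric mean of several probability maps, computed in log space."""
    # Clipping avoids log(0); eps is an assumed numerical safeguard.
    stacked = np.clip(np.stack(prob_maps), eps, 1.0)
    return np.exp(np.log(stacked).mean(axis=0))
```

In practice `model_fn` would wrap a trained segmentation network, and `geometric_mean` would be applied to the probability maps produced by different folds or models. Compared with the arithmetic mean, the geometric mean is pulled down more strongly by any single low prediction.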

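Several entries in section 5.3 combine binary cross-entropy with a smooth Dice term (for example "BCE + DICE" and "1 + BCE – Dice"). The following NumPy sketch shows one plausible reading of that combination, BCE + (1 - Dice). The function names, the `smooth` constant and the `eps` clipping value are assumptions made for illustration and are not taken from the linked solutions.

```python
import numpy as np


def soft_dice_coefficient(probs, target, smooth=1.0):
    """Smooth Dice coefficient computed on predicted probabilities."""
    probs = np.asarray(probs, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    intersection = np.sum(probs * target)
    # The smoothing term keeps the ratio defined when both masks are empty.
    return (2.0 * intersection + smooth) / (probs.sum() + target.sum() + smooth)


def binary_cross_entropy(probs, target, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target, dtype=float)
    return float(-np.mean(target * np.log(probs) + (1.0 - target) * np.log(1.0 - probs)))


def bce_dice_loss(probs, target, smooth=1.0):
    """BCE plus Dice loss, i.e. 1 + BCE - Dice coefficient."""
    dice = soft_dice_coefficient(probs, target, smooth)
    return binary_cross_entropy(probs, target) + (1.0 - dice)
```

A training framework would need a differentiable version of the same arithmetic (for example written with PyTorch or Keras tensors); the NumPy form is only meant to make the formula concrete.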
