Github 项目 - Kaggle 车辆边界识别之 UNet

博主： AIHGF
发布时间：2018 年 08 月 08 日
10116 次浏览
7 条评论
10373字数
分类：语义分割 Github项目

Kaggle Carvana Image Masking Challenge

1. UNet

Github 项目 - Pytorch-UNet

Pytorch-UNet - U-Net 的 PyTorch 实现，用于二值汽车图像语义分割，包括 dense CRF 后处理.

Pytorch-UNet 用于 Carvana Image Masking Challenge 高分辨率图像的分割. 该项目只输出一个前景目标类，但可以容易地扩展到多前景目标分割任务.

Pytorch-UNet 提供的训练模型 - MODEL.pth，采用 5000 张图片从头开始训练(未进行数据增强)，在 100k 测试图片上得到的 dice coefficient 为 0.988423. 虽然结构并不够好，但可以采用更多数据增强，fine-tuning，CRF 后处理，以及对 masks 的边缘添加更多权重等方式，提升分割精度.

2. pydensecrf 库

[[Github - PyDenseCRF]](https://github.com/lucasb-eyer/pydensecrf)

[1] - 安装：

sudo pip install pydensecrf

[2] - 使用:

import numpy as np
import pydensecrf.densecrf as dcrf
d = dcrf.DenseCRF2D(640, 480, 5)  # width, height, nlabels

3. Pytorch-UNet

Pytorch-UNet

[1] - 训练的 PyTorch 模型 - MODEL.pth

[2] - U-Net 网络定义 - unet_parts.py

# sub-parts of the U-Net model
import torch
import torch.nn as nn
import torch.nn.functional as F


class double_conv(nn.Module):
    '''(conv => BN => ReLU) * 2'''
    def __init__(self, in_ch, out_ch):
        super(double_conv, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        x = self.conv(x)
        return x


class inconv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super(inconv, self).__init__()
        self.conv = double_conv(in_ch, out_ch)

    def forward(self, x):
        x = self.conv(x)
        return x


class down(nn.Module):
    def __init__(self, in_ch, out_ch):
        super(down, self).__init__()
        self.mpconv = nn.Sequential(
            nn.MaxPool2d(2),
            double_conv(in_ch, out_ch)
        )

    def forward(self, x):
        x = self.mpconv(x)
        return x


class up(nn.Module):
    def __init__(self, in_ch, out_ch, bilinear=True):
        super(up, self).__init__()

        #  would be a nice idea if the upsampling could be learned too,
        #  but my machine do not have enough memory to handle all those weights
        if bilinear:
            self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
        else:
            self.up = nn.ConvTranspose2d(in_ch//2, in_ch//2, 2, stride=2)

        self.conv = double_conv(in_ch, out_ch)

    def forward(self, x1, x2):
        x1 = self.up(x1)
        diffX = x1.size()[2] - x2.size()[2]
        diffY = x1.size()[3] - x2.size()[3]
        x2 = F.pad(x2, (diffX // 2, int(diffX / 2),
                        diffY // 2, int(diffY / 2)))
        x = torch.cat([x2, x1], dim=1)
        x = self.conv(x)
        return x


class outconv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super(outconv, self).__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        x = self.conv(x)
        return x

[3] - U-Net 网络定义 - unet_model.py

# full assembly of the sub-parts to form the complete net

from .unet_parts import *

class UNet(nn.Module):
    def __init__(self, n_channels, n_classes):
        super(UNet, self).__init__()
        self.inc = inconv(n_channels, 64)
        self.down1 = down(64, 128)
        self.down2 = down(128, 256)
        self.down3 = down(256, 512)
        self.down4 = down(512, 512)
        self.up1 = up(1024, 256)
        self.up2 = up(512, 128)
        self.up3 = up(256, 64)
        self.up4 = up(128, 64)
        self.outc = outconv(64, n_classes)

    def forward(self, x):
        x1 = self.inc(x)
        x2 = self.down1(x1)
        x3 = self.down2(x2)
        x4 = self.down3(x3)
        x5 = self.down4(x4)
        x = self.up1(x5, x4)
        x = self.up2(x, x3)
        x = self.up3(x, x2)
        x = self.up4(x, x1)
        x = self.outc(x)
        return x

3.1. 测试 demo

import argparse
import os

import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn.functional as F

from PIL import Image

from unet import UNet
from utils import resize_and_crop, normalize, split_img_into_squares, hwc_to_chw, merge_masks, dense_crf

from torchvision import transforms

def predict_img(net,
                full_img,
                scale_factor=0.5,
                out_threshold=0.5,
                use_dense_crf=True,
                use_gpu=False):
    img_height = full_img.size[1]
    img_width = full_img.size[0]

    img = resize_and_crop(full_img, scale=scale_factor)
    img = normalize(img)

    left_square, right_square = split_img_into_squares(img)

    left_square = hwc_to_chw(left_square)
    right_square = hwc_to_chw(right_square)

    X_left = torch.from_numpy(left_square).unsqueeze(0)
    X_right = torch.from_numpy(right_square).unsqueeze(0)

    if use_gpu:
        X_left = X_left.cuda()
        X_right = X_right.cuda()

    with torch.no_grad():
        output_left = net(X_left)
        output_right = net(X_right)

        left_probs = F.sigmoid(output_left).squeeze(0)
        right_probs = F.sigmoid(output_right).squeeze(0)

        tf = transforms.Compose([transforms.ToPILImage(),
                                 transforms.Resize(img_height),
                                 transforms.ToTensor() ])

        left_probs = tf(left_probs.cpu())
        right_probs = tf(right_probs.cpu())

        left_mask_np = left_probs.squeeze().cpu().numpy()
        right_mask_np = right_probs.squeeze().cpu().numpy()

    full_mask = merge_masks(left_mask_np, right_mask_np, img_width)

    if use_dense_crf:
        full_mask = dense_crf(np.array(full_img).astype(np.uint8), full_mask)

    return full_mask > out_threshold


def mask_to_image(mask):
    return Image.fromarray((mask * 255).astype(np.uint8))


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--model', '-m', default='MODEL.pth',
                        metavar='FILE',
                        help="Specify the file in which is stored the model"
                             " (default : 'MODEL.pth')")
    # parser.add_argument('--input', '-i', metavar='INPUT', nargs='+',
    #                     help='filenames of input images', required=True)

    parser.add_argument('--output', '-o', metavar='INPUT', nargs='+',
                        help='filenames of ouput images')
    parser.add_argument('--cpu', '-c', action='store_true',
                        help="Do not use the cuda version of the net",
                        default=False)
    parser.add_argument('--viz', '-v', action='store_true',
                        help="Visualize the images as they are processed",
                        default=False)
    parser.add_argument('--no-save', '-n', action='store_true',
                        help="Do not save the output masks",
                        default=False)
    parser.add_argument('--no-crf', '-r', action='store_true',
                        help="Do not use dense CRF postprocessing",
                        default=False)
    parser.add_argument('--mask-threshold', '-t', type=float,
                        help="Minimum probability value to consider a mask pixel white",
                        default=0.5)
    parser.add_argument('--scale', '-s', type=float,
                        help="Scale factor for the input images",
                        default=0.5)

    return parser.parse_args()


if name == "__main__":
    args = get_args()

    args.cpu = 0
    args.no_crf = True
    args.model = './MODEL.pth'
    net = UNet(n_channels=3, n_classes=1)
    print("Loading model {}".format(args.model))

    if not args.cpu:
        print("Using CUDA version of the net, prepare your GPU !")
        net.cuda()
        net.load_state_dict(torch.load(args.model))
    else:
        net.cpu()
        net.load_state_dict(torch.load(args.model, map_location='cpu'))
        print("Using CPU version of the net, this may be very slow")

    print("Model loaded !")

    in_files = os.listdir('/path/to/testcars')

    for i, fn in enumerate(in_files):
        fn = os.path.join('/path/to/testcars', fn)
        print("nPredicting image {} ...".format(fn))

        img = Image.open(fn)
        if img.size[0] < img.size[1]:
            print("Error: image height larger than the width")
        else:
            mask = predict_img(net=net,
                               full_img=img,
                               scale_factor=args.scale,
                               out_threshold=args.mask_threshold,
                               use_dense_crf=not args.no_crf,
                               use_gpu=not args.cpu)

            print("Visualizing results for image {}, close to continue ...".format(fn))
            fig = plt.figure()
            a = fig.add_subplot(1, 2, 1)
            a.set_title('Input image')
            plt.imshow(img)

            b = fig.add_subplot(1, 2, 2)
            b.set_title('Output mask')
            plt.imshow(mask)
            plt.show()

3.2. 未进行 dense crf 后处理

未进行 dense crf 后处理：

dense crf 后处理：

最后修改：2020 年 12 月 08 日

如果觉得我的文章对你有用，请随意赞赏

7 条评论

Jindge
July 12th, 2021 at 02:45 pm

作者您好，可以麻烦您发一份 MODEL.pth给我吗，文章里的链接失效了，我的邮箱是1666371476@qq.com，谢谢！

回复
1. AIHGF
  July 12th, 2021 at 07:46 pm
  
  @Jindge
  
  参考：https://github.com/milesial/Pytorch-UNet/#pretrained-model
  
  回复
  1. Jindge
    July 12th, 2021 at 08:00 pm
    
    @AIHGF
    
    感谢谢作者
    
    回复
huangtong
March 25th, 2021 at 03:03 pm

请问这一步后，返回的true和false：return full_mask > out_threshold，在下一步转Image时会报错，请问怎么解决呀

回复
1. AIHGF
  March 26th, 2021 at 11:41 am
  
  @huangtong
  
  报错提示是什么呢
  
  回复
zheng
December 11th, 2020 at 02:47 pm

能发一份MODEL.pth给我吗，谢谢，我的邮箱823451255@qq.com

回复
Top丶M
December 8th, 2020 at 11:54 am

您好，可以麻烦您发一份 MODEL.pth给我吗，文章里的链接失效了，我的邮箱是595644129@qq.com，谢谢！

回复

发表评论取消回复
使用cookie技术保留您的个人信息以便您下次快速评论，继续评论表示您已同意该条款

评论 *

私密评论

名称 *

🎲

邮箱 *

地址

Github 项目 - Kaggle 车辆边界识别之 UNet

AIHGF • 2018 年 08 月 08 日

<blockquote><a class="no-external-link" href="https://www.kaggle.com/c/carvana-image-masking-challenge" target="_blank">Kaggle Carvana Image Masking Challenge</a></blockquote><h2>1. UNet</h2><blockquote><a class="no-external-link" href="https://github.com/milesial/Pytorch-UNet" target="_blank">Github 项目 - Pytorch-UNet</a></blockquote><a class="no-external-link" href="https://github.com/milesial/Pytorch-UNet" target="_blank">Pytorch-UNet</a> - <a class="no-external-link" href="https://arxiv.org/pdf/1505.04597.pdf" target="_blank">U-Net</a> 的 PyTorch 实现，用于二值汽车图像语义分割，包括 dense CRF 后处理.<a class="no-external-link" href="https://github.com/milesial/Pytorch-UNet" target="_blank">Pytorch-UNet</a> 用于 <a class="no-external-link" href="https://www.kaggle.com/c/carvana-image-masking-challenge" target="_blank">Carvana Image Masking Challenge</a> 高分辨率图像的分割. 该项目只输出一个前景目标类，但可以容易地扩展到多前景目标分割任务.<a class="no-external-link" href="https://github.com/milesial/Pytorch-UNet" target="_blank">Pytorch-UNet</a> 提供的训练模型 - <a class="no-external-link" href="https://github.com/milesial/Pytorch-UNet/blob/master/MODEL.pth" target="_blank">MODEL.pth</a>，采用 5000 张图片从头开始训练(未进行数据增强)，在 100k 测试图片上得到的 <a class="no-external-link" href="https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient" target="_blank">dice coefficient</a> 为 0.988423. 虽然结构并不够好，但可以采用更多数据增强，fine-tuning，CRF 后处理，以及对 masks 的边缘添加更多权重等方式，提升分割精度.<img src="https://www.aiuai.cn/uploads/sina/5ce8dfb97dc93.jpg" alt="" title="" style=""><h2>2. pydensecrf 库</h2><blockquote>[[Github - PyDenseCRF]](<a class="no-external-link" href="https://github.com/lucasb-eyer/pydensecrf" target="_blank">https://github.com/lucasb-eyer/pydensecrf</a>)</blockquote>[1] - 安装：<pre><code class="lang-shell">sudo pip install pydensecrf</code></pre>[2] - 使用:<pre><code class="lang-python">import numpy as np
import pydensecrf.densecrf as dcrf
d = dcrf.DenseCRF2D(640, 480, 5) # width, height, nlabels</code></pre><h2>3. Pytorch-UNet</h2><blockquote><a class="no-external-link" href="https://github.com/milesial/Pytorch-UNet" target="_blank">Pytorch-UNet</a></blockquote>[1] - 训练的 PyTorch 模型 - <a class="no-external-link" href="https://github.com/milesial/Pytorch-UNet/blob/master/MODEL.pth" target="_blank">MODEL.pth</a>[2] - U-Net 网络定义 - <a class="no-external-link" href="https://github.com/milesial/Pytorch-UNet/blob/master/unet/unet_parts.py" target="_blank">unet_parts.py</a><pre><code class="lang-python"># sub-parts of the U-Net model
import torch
import torch.nn as nn
import torch.nn.functional as F

class double_conv(nn.Module):
    '''(conv =&gt; BN =&gt; ReLU) * 2'''
    def __init__(self, in_ch, out_ch):
        super(double_conv, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True)
        )

def forward(self, x):
        x = self.conv(x)
        return x

class inconv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super(inconv, self).__init__()
        self.conv = double_conv(in_ch, out_ch)

def forward(self, x):
        x = self.conv(x)
        return x

class down(nn.Module):
    def __init__(self, in_ch, out_ch):
        super(down, self).__init__()
        self.mpconv = nn.Sequential(
            nn.MaxPool2d(2),
            double_conv(in_ch, out_ch)
        )

def forward(self, x):
        x = self.mpconv(x)
        return x

class up(nn.Module):
    def __init__(self, in_ch, out_ch, bilinear=True):
        super(up, self).__init__()

#  would be a nice idea if the upsampling could be learned too,
        #  but my machine do not have enough memory to handle all those weights
        if bilinear:
            self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
        else:
            self.up = nn.ConvTranspose2d(in_ch//2, in_ch//2, 2, stride=2)

self.conv = double_conv(in_ch, out_ch)

def forward(self, x1, x2):
        x1 = self.up(x1)
        diffX = x1.size()[2] - x2.size()[2]
        diffY = x1.size()[3] - x2.size()[3]
        x2 = F.pad(x2, (diffX // 2, int(diffX / 2),
                        diffY // 2, int(diffY / 2)))
        x = torch.cat([x2, x1], dim=1)
        x = self.conv(x)
        return x

class outconv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super(outconv, self).__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 1)

def forward(self, x):
 x = self.conv(x)
 return x</code></pre>[3] - U-Net 网络定义 - <a class="no-external-link" href="https://github.com/milesial/Pytorch-UNet/blob/master/unet/unet_model.py" target="_blank">unet_model.py</a><pre><code class="lang-python"># full assembly of the sub-parts to form the complete net

from .unet_parts import *

class UNet(nn.Module):
    def __init__(self, n_channels, n_classes):
        super(UNet, self).__init__()
        self.inc = inconv(n_channels, 64)
        self.down1 = down(64, 128)
        self.down2 = down(128, 256)
        self.down3 = down(256, 512)
        self.down4 = down(512, 512)
        self.up1 = up(1024, 256)
        self.up2 = up(512, 128)
        self.up3 = up(256, 64)
        self.up4 = up(128, 64)
        self.outc = outconv(64, n_classes)

def forward(self, x):
 x1 = self.inc(x)
 x2 = self.down1(x1)
 x3 = self.down2(x2)
 x4 = self.down3(x3)
 x5 = self.down4(x4)
 x = self.up1(x5, x4)
 x = self.up2(x, x3)
 x = self.up3(x, x2)
 x = self.up4(x, x1)
 x = self.outc(x)
 return x</code></pre><h3>3.1. 测试 demo</h3><pre><code class="lang-python">import argparse
import os

import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn.functional as F

from PIL import Image

from unet import UNet
from utils import resize_and_crop, normalize, split_img_into_squares, hwc_to_chw, merge_masks, dense_crf

from torchvision import transforms

def predict_img(net,
                full_img,
                scale_factor=0.5,
                out_threshold=0.5,
                use_dense_crf=True,
                use_gpu=False):
    img_height = full_img.size[1]
    img_width = full_img.size[0]

img = resize_and_crop(full_img, scale=scale_factor)
    img = normalize(img)

left_square, right_square = split_img_into_squares(img)

left_square = hwc_to_chw(left_square)
    right_square = hwc_to_chw(right_square)

X_left = torch.from_numpy(left_square).unsqueeze(0)
    X_right = torch.from_numpy(right_square).unsqueeze(0)

if use_gpu:
        X_left = X_left.cuda()
        X_right = X_right.cuda()

with torch.no_grad():
        output_left = net(X_left)
        output_right = net(X_right)

left_probs = F.sigmoid(output_left).squeeze(0)
        right_probs = F.sigmoid(output_right).squeeze(0)

tf = transforms.Compose([transforms.ToPILImage(),
                                 transforms.Resize(img_height),
                                 transforms.ToTensor() ])

left_probs = tf(left_probs.cpu())
        right_probs = tf(right_probs.cpu())

left_mask_np = left_probs.squeeze().cpu().numpy()
        right_mask_np = right_probs.squeeze().cpu().numpy()

full_mask = merge_masks(left_mask_np, right_mask_np, img_width)

if use_dense_crf:
        full_mask = dense_crf(np.array(full_img).astype(np.uint8), full_mask)

return full_mask &gt; out_threshold

def mask_to_image(mask):
    return Image.fromarray((mask * 255).astype(np.uint8))

def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--model', '-m', default='MODEL.pth',
                        metavar='FILE',
                        help=&quot;Specify the file in which is stored the model&quot;
                             &quot; (default : 'MODEL.pth')&quot;)
    # parser.add_argument('--input', '-i', metavar='INPUT', nargs='+',
    #                     help='filenames of input images', required=True)

parser.add_argument('--output', '-o', metavar='INPUT', nargs='+',
                        help='filenames of ouput images')
    parser.add_argument('--cpu', '-c', action='store_true',
                        help=&quot;Do not use the cuda version of the net&quot;,
                        default=False)
    parser.add_argument('--viz', '-v', action='store_true',
                        help=&quot;Visualize the images as they are processed&quot;,
                        default=False)
    parser.add_argument('--no-save', '-n', action='store_true',
                        help=&quot;Do not save the output masks&quot;,
                        default=False)
    parser.add_argument('--no-crf', '-r', action='store_true',
                        help=&quot;Do not use dense CRF postprocessing&quot;,
                        default=False)
    parser.add_argument('--mask-threshold', '-t', type=float,
                        help=&quot;Minimum probability value to consider a mask pixel white&quot;,
                        default=0.5)
    parser.add_argument('--scale', '-s', type=float,
                        help=&quot;Scale factor for the input images&quot;,
                        default=0.5)

return parser.parse_args()

if name == &quot;__main__&quot;:
    args = get_args()

args.cpu = 0
    args.no_crf = True
    args.model = './MODEL.pth'
    net = UNet(n_channels=3, n_classes=1)
    print(&quot;Loading model {}&quot;.format(args.model))

if not args.cpu:
        print(&quot;Using CUDA version of the net, prepare your GPU !&quot;)
        net.cuda()
        net.load_state_dict(torch.load(args.model))
    else:
        net.cpu()
        net.load_state_dict(torch.load(args.model, map_location='cpu'))
        print(&quot;Using CPU version of the net, this may be very slow&quot;)

print(&quot;Model loaded !&quot;)

in_files = os.listdir('/path/to/testcars')

for i, fn in enumerate(in_files):
        fn = os.path.join('/path/to/testcars', fn)
        print(&quot;nPredicting image {} ...&quot;.format(fn))

img = Image.open(fn)
        if img.size[0] &lt; img.size[1]:
            print(&quot;Error: image height larger than the width&quot;)
        else:
            mask = predict_img(net=net,
                               full_img=img,
                               scale_factor=args.scale,
                               out_threshold=args.mask_threshold,
                               use_dense_crf=not args.no_crf,
                               use_gpu=not args.cpu)

print(&quot;Visualizing results for image {}, close to continue ...&quot;.format(fn))
            fig = plt.figure()
            a = fig.add_subplot(1, 2, 1)
            a.set_title('Input image')
            plt.imshow(img)

b = fig.add_subplot(1, 2, 2)
            b.set_title('Output mask')
            plt.imshow(mask)
            plt.show()

</code></pre><h3>3.2. 未进行 dense crf 后处理</h3>未进行 dense crf 后处理：<img src="https://www.aiuai.cn/uploads/sina/5ce8dfb9e06f6.jpg" alt="" title="" style="">dense crf 后处理： <img src="https://www.aiuai.cn/uploads/sina/5ce8dfba66119.jpg" alt="" title="" style="">

1. UNet

2. pydensecrf 库

3. Pytorch-UNet

3.1. 测试 demo

3.2. 未进行 dense crf 后处理

7 条评论

发表评论 取消回复 使用cookie技术保留您的个人信息以便您下次快速评论，继续评论表示您已同意该条款

Github 项目 - Kaggle 车辆边界识别之 UNet

发表评论取消回复
使用cookie技术保留您的个人信息以便您下次快速评论，继续评论表示您已同意该条款