Representing Segmentation Annotations in RLE Format

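RLE (Run-Length Encoding) is an encoding method for binary images that stores the lengths of consecutive runs of black and white pixels instead of the pixels themselves. It compresses data by counting how many times a value repeats in a row, which makes it a simple lossless compression scheme that is very fast to encode and decode.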

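To save space, many segmentation datasets, COCO among them, store their annotations in RLE format. Segmentation models, however, are usually trained with annotations stored as PNG images.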

This post briefly summarizes how to convert between RLE and binary masks.

## 1. About RLE

> [pycocotools/mask.py](https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/mask.py)

COCO describes RLE as follows:

```text
RLE is a simple yet efficient format for storing binary masks. 
RLE first divides a vector (or vectorized image) into a series of piecewise constant regions and then for each piece simply stores the length of that piece. 
For example, given M=[0 0 1 1 1 0 1] the RLE counts would be [2 3 1 1], or for M=[1 1 1 1 1 1 0] the counts would be [0 6 1] (note that the odd counts are always the numbers of zeros). 
Instead of storing the counts directly, additional compression is achieved with a variable bitrate representation based on a common scheme called LEB128.

Compression is greatest given large piecewise constant regions. Specifically, the size of the RLE is proportional to the number of boundaries in M (or for an image the number of boundaries in the y direction). 
Assuming fairly simple shapes, the RLE representation is O(sqrt(n)) where n is number of pixels in the object. Hence space usage is substantially lower, especially for large simple objects (large n).

Many common operations on masks can be computed directly using the RLE (without need for decoding). This includes computations such as area, union, intersection, etc. All of these operations are linear in the size of the RLE, in other words they are O(sqrt(n)) where n is the area of the object. Computing these operations on the original mask is O(n).

Thus, using the RLE can result in substantial computational savings.
```

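In other words, RLE is a simple and effective storage format for binary masks.

It first splits a vector (or a vectorized image) into a series of piecewise-constant segments, then stores only the length of each segment.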

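For example, the vector M=[0 0 1 1 1 0 1] is encoded as the counts [2 3 1 1], and M=[1 1 1 1 1 1 0] is encoded as [0 6 1]. (Note: the counts in odd positions, i.e. the 1st, 3rd, and so on, always give the number of zeros, so a vector that starts with 1 gets a leading count of 0.)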
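The two examples above can be reproduced with a few lines of NumPy. This is a minimal sketch of the uncompressed counts only; `rle_counts` is a hypothetical helper name, not part of pycocotools.

```python
import numpy as np

def rle_counts(vector):
    """Return uncompressed run lengths; the first count always describes zeros."""
    v = np.asarray(vector, dtype=np.uint8).ravel()
    if v.size == 0:
        return []
    # Indices at which the value differs from the previous element.
    change = np.flatnonzero(v[1:] != v[:-1]) + 1
    # Run boundaries: start of the vector, every change, end of the vector.
    bounds = np.concatenate([[0], change, [v.size]])
    counts = np.diff(bounds).tolist()
    # If the vector starts with 1, the leading run of zeros has length 0.
    if v[0] == 1:
        counts = [0] + counts
    return counts

print(rle_counts([0, 0, 1, 1, 1, 0, 1]))  # [2, 3, 1, 1]
print(rle_counts([1, 1, 1, 1, 1, 1, 0]))  # [0, 6, 1]
```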

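Rather than storing the counts directly, COCO compresses them further with a variable-bitrate representation based on a common scheme called LEB128.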
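To give a feel for why a variable-length integer encoding helps, here is a sketch of plain unsigned LEB128. It illustrates the generic scheme only: the quoted description says COCO's representation is *based on* LEB128, and the exact layout pycocotools writes may differ from this. The function names are made up for this example.

```python
def leb128_encode(value):
    """Encode a non-negative integer as unsigned LEB128 bytes."""
    if value < 0:
        raise ValueError("unsigned LEB128 only encodes non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)

def leb128_decode(data):
    """Decode a single unsigned LEB128 integer from bytes."""
    result = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * i)
    return result

print(leb128_encode(3).hex())       # 03      (small counts take one byte)
print(leb128_encode(624485).hex())  # e58e26  (large counts take more)
```

Short runs, which are the common case, cost a single byte, while long runs grow by one byte per additional 7 bits.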

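Compression works best when the mask contains large constant regions. Specifically, the size of the RLE is proportional to the number of boundaries in M (for an image, the number of boundaries in the y direction).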
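The dependence on direction comes from the order in which the image is flattened. pycocotools expects masks in column-major (Fortran) order, so runs follow the columns. The sketch below counts runs for the same mask under both orders; `count_runs` is a hypothetical helper.

```python
import numpy as np

def count_runs(mask, order):
    """Number of constant runs after flattening in the given order ('F' or 'C')."""
    flat = np.asarray(mask).ravel(order=order)
    return 1 + int(np.count_nonzero(flat[1:] != flat[:-1]))

mask = np.zeros((6, 6), dtype=np.uint8)
mask[1:5, 2:4] = 1  # a rectangle 4 rows tall and 2 columns wide

print(count_runs(mask, "F"))  # 5: column-major, one run of ones per column
print(count_runs(mask, "C"))  # 9: row-major, one run of ones per row
```

A tall, narrow object is cheap in column-major order because it crosses few columns; the same object would cost more runs if the image were read row by row.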

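For fairly simple shapes, the RLE representation takes O(sqrt(n)) space, where n is the number of pixels in the object. Space usage is therefore much lower, especially for large, simple objects (large n).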

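Many common mask operations, such as area, union, and intersection, can be computed directly on the RLE without decoding it. All of these operations are linear in the size of the RLE, that is, O(sqrt(n)) where n is the area of the object, whereas the same operations on the original mask are O(n).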

Using RLE can therefore save a substantial amount of computation.
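As an illustration of working on the RLE directly, the sketch below computes the area and the intersection area from uncompressed counts (zeros first, then alternating). It assumes both count lists are non-empty and describe vectors of the same length; the function names are hypothetical and this is not the pycocotools implementation.

```python
def rle_area(counts):
    """Number of 1-pixels: the sum of every second count, starting at index 1."""
    return sum(counts[1::2])

def rle_intersection_area(a, b):
    """Count positions where both masks are 1 by walking both run lists in step."""
    i = j = 0                  # index of the current run in a and in b
    rem_a, rem_b = a[0], b[0]  # pixels left in the current runs
    total = 0
    while True:
        # Advance past exhausted (or zero-length) runs; stop when a list ends.
        while rem_a == 0:
            i += 1
            if i == len(a):
                return total
            rem_a = a[i]
        while rem_b == 0:
            j += 1
            if j == len(b):
                return total
            rem_b = b[j]
        step = min(rem_a, rem_b)
        # Runs at odd indices are runs of ones.
        if i % 2 == 1 and j % 2 == 1:
            total += step
        rem_a -= step
        rem_b -= step

a = [2, 3, 1, 1]  # M = [0 0 1 1 1 0 1]
b = [0, 6, 1]     # M = [1 1 1 1 1 1 0]
print(rle_area(a), rle_area(b))     # 4 6
print(rle_intersection_area(a, b))  # 3
```

Each loop iteration consumes at least one run from one of the lists, so the work is proportional to the number of runs rather than the number of pixels.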

## 2. Converting Between RLE and PNG

### 2.1. PNG2RLE

```python
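# -*- coding: utf-8 -*-
import numpy as np

def rle_encode(binary_mask):
    '''
    binary_mask: numpy array, 1 - mask, 0 - background
    Returns the run lengths as a string of "start length" pairs
    (starts are 1-based, pixels are read in row-major order)
    '''
    pixels = binary_mask.flatten()
    # Pad with a zero on both sides so runs touching either end are detected
    pixels = np.concatenate([[0], pixels, [0]])
    # 1-based positions where the value changes: run starts and run ends
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    # Turn each run end into a run length
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)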
```
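A small worked example helps to see what this encoder produces. Note that its output is a list of `start length` pairs with 1-based starts, which is different from the alternating counts described in section 1. The block repeats the function so that it runs on its own; the test masks are made up for illustration.

```python
import numpy as np

def rle_encode(binary_mask):
    # Same logic as the encoder above, repeated so this block is self-contained.
    pixels = np.concatenate([[0], binary_mask.flatten(), [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

mask = np.array([[0, 0, 0, 0],
                 [0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [0, 0, 0, 0]], dtype=np.uint8)
# Row-major pixel numbers (1-based): the ones sit at 7-8 and 11-12.
print(rle_encode(mask))  # 7 2 11 2

# A mask that starts with a 1 still encodes correctly thanks to the padding.
print(rle_encode(np.array([[1, 1], [0, 1]], dtype=np.uint8)))  # 1 2 4 1
```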

### 2.2. RLE2PNG

```python
# -*- coding: utf-8 -*-
import numpy as np

def rle_decode(mask_rle, shape):
    '''
    mask_rle: run lengths as a string of "start length" pairs (1-based starts)
    shape: (height, width) of the array to return
    Returns numpy array, 1 - mask, 0 - background
    '''
    s = mask_rle.split()
    # Even tokens are run starts, odd tokens are run lengths
    starts, lengths = [np.asarray(x, dtype=int) for x in (s[0::2], s[1::2])]
    starts -= 1  # convert to 0-based indices
    ends = starts + lengths
    binary_mask = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for lo, hi in zip(starts, ends):
        binary_mask[lo:hi] = 1
    return binary_mask.reshape(shape)
```
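The Python loop in the decoder is fine for a handful of runs. For masks with many runs, one possible alternative is to mark run starts and ends in a difference array and take a cumulative sum. This is a sketch under the same input assumptions (1-based starts, row-major order); `rle_decode_vectorized` is a hypothetical name and is not taken from the referenced Kaggle kernels.

```python
import numpy as np

def rle_decode_vectorized(mask_rle, shape):
    """Decode a "start length" string without looping over runs in Python."""
    tokens = np.asarray(mask_rle.split(), dtype=int)
    starts = tokens[0::2] - 1          # 0-based run starts
    ends = starts + tokens[1::2]       # exclusive run ends
    size = shape[0] * shape[1]
    # +1 where a run begins, -1 just past where it ends.
    delta = np.zeros(size + 1, dtype=np.int32)
    np.add.at(delta, starts, 1)
    np.add.at(delta, ends, -1)
    # The running sum is positive exactly inside the runs.
    return (np.cumsum(delta[:-1]) > 0).astype(np.uint8).reshape(shape)

print(rle_decode_vectorized('7 2 11 2', (4, 4)))
# [[0 0 0 0]
#  [0 0 1 1]
#  [0 0 1 1]
#  [0 0 0 0]]
```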

### 2.3. Example

```python
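'''
RLE: Run-Length Encode
'''
from PIL import Image
import numpy as np

# Assumes rle_encode and rle_decode from sections 2.1 and 2.2 are defined above.

def main():
    maskfile = '/path/to/test.png'
    # Load as single-channel grayscale so the mask is 2-D
    mask = np.array(Image.open(maskfile).convert('L'))
    # Binarize: values above 127 become 1, everything else 0
    binary_mask = (mask > 127).astype(np.uint8)

    # encode
    rle_mask = rle_encode(binary_mask)

    # decode
    binary_mask_decode = rle_decode(rle_mask, binary_mask.shape[:2])
    assert np.array_equal(binary_mask, binary_mask_decode)

if __name__ == '__main__':
    main()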
```
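The example stops at the decoded array. To actually get a PNG annotation back, the array still has to be written to disk. A minimal sketch with Pillow follows; the helper names and the `mask.png` file name are assumptions made for this example, and mask pixels are stored as 255 so that the same `> 127` threshold recovers them.

```python
import os
import tempfile

import numpy as np
from PIL import Image

def save_mask_png(binary_mask, path):
    """Write a 0/1 mask as an 8-bit grayscale PNG (0 = background, 255 = mask)."""
    pixels = (np.asarray(binary_mask) > 0).astype(np.uint8) * 255
    Image.fromarray(pixels).save(path)

def load_mask_png(path, threshold=127):
    """Read a PNG and binarize it back to a 0/1 uint8 mask."""
    with Image.open(path) as img:
        gray = np.array(img.convert("L"))
    return (gray > threshold).astype(np.uint8)

mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 2:4] = 1

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mask.png")  # hypothetical file name
    save_mask_png(mask, path)
    restored = load_mask_png(path)

print(np.array_equal(mask, restored))
```

PNG is lossless, so the mask should survive the round trip unchanged.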

The complete code:

```python
# -*- coding: utf-8 -*-
'''
RLE: Run-Length Encode
'''
import numpy as np
from PIL import Image


# M1:
class general_rle(object):
    '''
    ref.: https://www.kaggle.com/stainsby/fast-tested-rle
    '''
    def __init__(self):
        pass

    def rle_encode(self, binary_mask):
        pixels = binary_mask.flatten()
        # We avoid issues with '1' at the start or end (at the corners of
        # the original image) by setting those pixels to '0' explicitly.
        # We do not expect these to be non-zero for an accurate mask,
        # so this should not harm the score.
        pixels[0] = 0
        pixels[-1] = 0
        # 1-based positions of run starts and run ends
        runs = np.where(pixels[1:] != pixels[:-1])[0] + 2
        # Turn each run end into a run length
        runs[1::2] = runs[1::2] - runs[:-1:2]
        return runs

    def rle_to_string(self, runs):
        return ' '.join(str(x) for x in runs)

    def check(self):
        test_mask = np.asarray([[0, 0, 0, 0],
                                [0, 0, 1, 1],
                                [0, 0, 1, 1],
                                [0, 0, 0, 0]])
        assert self.rle_to_string(self.rle_encode(test_mask)) == '7 2 11 2'


# M2:
class binary_mask_rle(object):
    '''
    ref.: https://www.kaggle.com/paulorzp/run-length-encode-and-decode
    '''
    def __init__(self):
        pass

    def rle_encode(self, binary_mask):
        '''
        binary_mask: numpy array, 1 - mask, 0 - background
        Returns the run lengths as a string of "start length" pairs
        '''
        pixels = binary_mask.flatten()
        pixels = np.concatenate([[0], pixels, [0]])
        runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
        runs[1::2] -= runs[::2]
        return ' '.join(str(x) for x in runs)

    def rle_decode(self, mask_rle, shape):
        '''
        mask_rle: run lengths as a string of "start length" pairs (1-based starts)
        shape: (height, width) of the array to return
        Returns numpy array, 1 - mask, 0 - background
        '''
        s = mask_rle.split()
        starts, lengths = [np.asarray(x, dtype=int) for x in (s[0::2], s[1::2])]
        starts -= 1
        ends = starts + lengths
        binary_mask = np.zeros(shape[0] * shape[1], dtype=np.uint8)
        for lo, hi in zip(starts, ends):
            binary_mask[lo:hi] = 1
        return binary_mask.reshape(shape)

    def check(self):
        maskfile = '/path/to/test.png'
        # Load as single-channel grayscale so the mask is 2-D
        mask = np.array(Image.open(maskfile).convert('L'))
        binary_mask = (mask > 127).astype(np.uint8)

        # encode
        rle_mask = self.rle_encode(binary_mask)

        # decode
        binary_mask2 = self.rle_decode(rle_mask, binary_mask.shape[:2])

        # the round trip must reproduce the original mask
        assert binary_mask2.shape == binary_mask.shape
        assert np.array_equal(binary_mask2, binary_mask)
```
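The `start length` strings produced by the classes above and the alternating counts from section 1 describe the same runs in two different ways. The sketch below converts one into the other for a vector of known length. It keeps the flattening order of the input string and makes no attempt to reproduce COCO's column-major layout or its compressed string; `pairs_to_counts` is a hypothetical helper.

```python
def pairs_to_counts(mask_rle, size):
    """Convert "start length" pairs (1-based starts) to alternating run counts."""
    tokens = [int(t) for t in mask_rle.split()]
    counts = []
    pos = 0  # 0-based index of the first pixel not yet covered
    for start, length in zip(tokens[0::2], tokens[1::2]):
        start -= 1                  # to 0-based
        counts.append(start - pos)  # zeros before this run (0 if it starts at pos)
        counts.append(length)       # the run of ones itself
        pos = start + length
    if pos < size:
        counts.append(size - pos)   # trailing zeros
    return counts

print(pairs_to_counts('7 2 11 2', 16))  # [6, 2, 2, 2, 4]
print(pairs_to_counts('1 2 4 1', 4))    # [0, 2, 1, 1]
```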

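## 3. References

[1] - [RLE压缩算法详解](http://data.biancheng.net/view/152.html) (a detailed explanation of the RLE compression algorithm)

[2] - [RLE格式标注文件转为PNG格式（Run Length Encode）](https://blog.csdn.net/wangdongwei0/article/details/83820869) (converting RLE-format annotation files to PNG)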

Last modification：November 19, 2020
