Abstract
We propose a novel semantic segmentation algorithm by learning a deep deconvolution network. We learn the network on top of the convolutional layers adopted from VGG 16-layer net. The deconvolution network is composed of deconvolution and unpooling layers, which identify pixel-wise class labels and predict segmentation masks. We apply the trained network to each proposal in an input image, and construct the final semantic segmentation map by combining the results from all proposals in a simple manner. The proposed algorithm mitigates the limitations of the existing methods based on fully convolutional networks by integrating deep deconvolution network and proposal-wise prediction; our segmentation method typically identifies detailed structures and handles objects in multiple scales naturally. Our network demonstrates outstanding performance in PASCAL VOC 2012 dataset, and we achieve the best accuracy (72.5%) among the methods trained without using Microsoft COCO dataset through ensemble with the fully convolutional network.
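To make the encoder-decoder idea concrete, here is a minimal PyTorch sketch of an unpooling + deconvolution decoder that mirrors VGG-style pooling stages. This is an illustration only: the two-stage depth, channel counts, and `TinyDeconvNet` name are assumptions for brevity, not the paper's actual mirrored 16-layer configuration.

```python
# Minimal sketch (not the authors' exact architecture) of unpooling + deconvolution.
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Conv + ReLU followed by max pooling that also returns the pooling indices."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # return_indices=True keeps the argmax locations needed later for unpooling
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)

    def forward(self, x):
        x = self.conv(x)
        x, idx = self.pool(x)
        return x, idx

class DecoderBlock(nn.Module):
    """Unpooling with stored indices, then a 'deconvolution' (transposed convolution)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, idx):
        x = self.unpool(x, idx)   # place activations back at the remembered locations
        return self.deconv(x)     # densify the sparse unpooled map

class TinyDeconvNet(nn.Module):
    """Two encoder/decoder stages plus a per-pixel classifier (illustrative only)."""
    def __init__(self, num_classes=21):  # 21 = 20 PASCAL VOC classes + background
        super().__init__()
        self.enc1 = EncoderBlock(3, 64)
        self.enc2 = EncoderBlock(64, 128)
        self.dec2 = DecoderBlock(128, 64)
        self.dec1 = DecoderBlock(64, 64)
        self.classifier = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x):
        x, idx1 = self.enc1(x)
        x, idx2 = self.enc2(x)
        x = self.dec2(x, idx2)
        x = self.dec1(x, idx1)
        return self.classifier(x)  # per-pixel class scores at input resolution

if __name__ == "__main__":
    scores = TinyDeconvNet()(torch.randn(1, 3, 224, 224))
    print(scores.shape)  # torch.Size([1, 21, 224, 224])
```

Note the division of labor the paper emphasizes: unpooling restores the spatial locations recorded during pooling, while the transposed convolutions fill in dense activations around them.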
We propose a semantic segmentation algorithm that learns a deep deconvolution network.
The network is built on top of the convolutional layers adopted from the VGG-16 network.
The deconvolution network consists of deconvolution and unpooling layers, which identify pixel-wise class labels and predict segmentation masks.
The trained network is applied to each proposal in an input image, and the results from all proposals are combined into the final segmentation map (a toy aggregation sketch follows this summary).
The proposed algorithm mitigates the limitations of FCN-based methods.
Our segmentation method identifies detailed structures and naturally handles objects at multiple scales.
Performance is demonstrated on the PASCAL VOC 2012 dataset: without using the COCO dataset, the ensemble with FCN achieves the best accuracy (72.5%).
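The proposal-wise prediction mentioned above can be pictured with the toy sketch below. The abstract only says the per-proposal results are combined "in a simple manner", so the box format, score-map shapes, and the pixel-wise maximum used here are assumptions for illustration, not the paper's exact procedure.

```python
# Hedged sketch of combining per-proposal score maps into one segmentation map.
import numpy as np

def aggregate_proposals(image_hw, proposals, proposal_scores, num_classes=21):
    """Paste each proposal's class-score map into the full image and keep the
    pixel-wise maximum score per class across all proposals (assumed rule)."""
    H, W = image_hw
    full = np.full((num_classes, H, W), -np.inf, dtype=np.float32)
    for (y0, x0, y1, x1), scores in zip(proposals, proposal_scores):
        # 'scores' is assumed to be a (num_classes, y1-y0, x1-x0) score map
        region = full[:, y0:y1, x0:x1]
        np.maximum(region, scores, out=region)
    full[np.isinf(full)] = 0.0   # pixels covered by no proposal fall back to score 0
    return full.argmax(axis=0)   # final per-pixel class labels

# toy usage with two fake proposals on an 8x8 image
labels = aggregate_proposals(
    image_hw=(8, 8),
    proposals=[(0, 0, 4, 4), (2, 2, 8, 8)],
    proposal_scores=[np.random.rand(21, 4, 4), np.random.rand(21, 6, 6)],
)
print(labels.shape)  # (8, 8)
```

Because each proposal is segmented at its own scale before aggregation, large and small objects are handled naturally, which is the multi-scale benefit the summary points out.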
Reference
Noh, H., Hong, S., & Han, B. (2015). Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE international conference on computer vision (pp. 1520-1528).