Visual semantic segmentation is a task related to, but distinct from, image recognition and visual object detection. Like object detection, and unlike image recognition, a segmentation method indicates where an object occurs in the image. However, compared to object detection, the annotations in segmentation are much finer-grained: they are given at the pixel level. A segmentation method does not delineate an object with a bounding box; instead, it predicts which pixels belong to which class of object.
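The relationship between the two annotation styles can be sketched with a toy example. Here a segmentation mask is simply an array holding one class label per pixel (the array shape, class names, and values are illustrative assumptions, not taken from any dataset); note that the tightest bounding box can always be recovered from the mask, but not the other way around.

```python
import numpy as np

# Hypothetical two-class example (0 = background, 1 = car): a segmentation
# mask assigns a class label to every pixel of a 6 x 8 image.
mask = np.zeros((6, 8), dtype=np.int64)
mask[2:5, 1:4] = 1  # pixels belonging to the "car" class

# Per-pixel labels are strictly richer than a bounding box: the tightest
# box around the object can be derived from the mask.
rows, cols = np.where(mask == 1)
bbox = (int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max()))
print(bbox)  # -> (2, 1, 4, 3), i.e. (y0, x0, y1, x1)
```

The reverse derivation is impossible: a box alone does not say which of its pixels belong to the object, which is exactly the extra information segmentation annotations provide.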

One of the challenges in segmentation is that a segmentation method must, in effect, produce an entire image as its output: a label map with the same spatial resolution as the input. Annotating images for segmentation is also much more labour-intensive than annotating for image recognition or object detection. On the other hand, the number of images required to train a segmentation method is typically significantly lower, precisely because each annotation carries more information.
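The "entire image as output" point can be made concrete. A common convention (assumed here for illustration, not prescribed by the text) is that a segmentation network emits one score per class for every pixel, and the predicted label map is the per-pixel argmax, an image of class indices with the same height and width as the input:

```python
import numpy as np

# Stand-in for the raw output of a segmentation network: one score per
# class at every pixel, shape (H, W, num_classes). Random values here.
rng = np.random.default_rng(0)
scores = rng.standard_normal((4, 5, 3))  # H=4, W=5, 3 classes

# The predicted label map is the argmax over the class axis, so the
# output is itself an "image" of class indices of size H x W.
labels = scores.argmax(axis=-1)
print(labels.shape)  # -> (4, 5)
```

Every one of the H x W entries must be predicted, which is what makes the output space so much larger than the single label of image recognition or the handful of boxes in object detection.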

Visual segmentation: an example [cityscapes].

Literature

  1. [cityscapes] The Cityscapes Dataset: Semantic Understanding of Urban Street Scenes. https://www.cityscapes-dataset.com/examples/