Classification is a supervised learning task: a classifier is fitted to map its inputs (which can be discrete or continuous) to classes. Given an instance, the classifier decides which of a predefined set of classes it belongs to. Such problems range from well-structured ones, such as predicting whether a client is likely to default on a loan, to challenging ones involving unstructured data, such as image or audio recognition, which require classifiers with sophisticated preprocessing capabilities.
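
To make the idea of fitting a classifier concrete, here is a minimal sketch of a nearest-centroid classifier on the loan example. The technique, the two features (debt-to-income ratio and number of late payments), the data values and the function names are all assumptions chosen for illustration; they are not a recommended model for credit scoring.

```python
from collections import defaultdict
from math import dist


def fit_centroids(X, y):
    """Compute one mean vector (centroid) per class from labelled examples."""
    groups = defaultdict(list)
    for features, label in zip(X, y):
        groups[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in groups.items()
    }


def predict(centroids, features):
    """Assign the instance to the class whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical toy data: (debt-to-income ratio, number of late payments)
X_train = [(0.10, 0), (0.20, 1), (0.15, 0), (0.60, 4), (0.70, 5), (0.55, 3)]
y_train = ["repays", "repays", "repays", "defaults", "defaults", "defaults"]

centroids = fit_centroids(X_train, y_train)
print(predict(centroids, (0.12, 0)))  # repays
print(predict(centroids, (0.65, 4)))  # defaults
```

Fitting here amounts to summarising each class by its mean, and prediction picks one class out of the predefined set. The features are left unscaled to keep the sketch short; a real application would normally rescale them first.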

Concerning the number of classes, a classification problem may be:

  • Binary: there are exactly two classes; e.g. a positive and a negative class in a medical test;
  • Multi-class: there are more than two classes.

Some problems instead call for multi-label classification (not to be confused with multi-class classification), where the same instance may belong to more than one class at a time, or for fuzzy classification, where an instance belongs to each class to some degree (which essentially turns the classification problem into a special kind of regression problem).
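
The difference between these settings shows up in how the target of a single instance can be encoded. The sketch below uses a hypothetical three-class problem; the class names and helper functions are assumptions made for illustration only.

```python
# Hypothetical class set used only for this illustration.
CLASSES = ["cat", "dog", "bird"]


def one_hot(label):
    """Multi-class target: exactly one class is active."""
    return [1 if c == label else 0 for c in CLASSES]


def multi_hot(labels):
    """Multi-label target: any subset of the classes may be active."""
    return [1 if c in labels else 0 for c in CLASSES]


def is_valid_fuzzy(memberships):
    """Fuzzy target: one membership degree in [0, 1] per class.

    The degrees are not required to sum to 1 in this sketch.
    """
    return len(memberships) == len(CLASSES) and all(
        0.0 <= m <= 1.0 for m in memberships
    )


multi_class_target = one_hot("dog")              # [0, 1, 0]
multi_label_target = multi_hot({"cat", "dog"})   # [1, 1, 0]
fuzzy_target = [0.7, 0.3, 0.0]                   # real-valued degrees

print(multi_class_target, multi_label_target, is_valid_fuzzy(fuzzy_target))
```

A binary problem is the special case with two classes. Because the fuzzy target is a vector of real numbers rather than a set of discrete labels, predicting it resembles regression.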

Along with regression, classification is one of the two most fundamental supervised learning problems, so most supervised learning methods can perform classification, regression, or both.
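
As one illustration of a method that handles both tasks, the sketch below uses k-nearest neighbours: the same neighbour search supports classification by majority vote and regression by averaging. The choice of method, the toy data and the function names are assumptions made for this example.

```python
from collections import Counter
from math import dist


def k_nearest(X, y, query, k):
    """Return the targets of the k training points closest to the query."""
    ranked = sorted(zip(X, y), key=lambda pair: dist(pair[0], query))
    return [target for _, target in ranked[:k]]


def knn_classify(X, y, query, k=3):
    """Classification: majority vote over the neighbours' class labels."""
    return Counter(k_nearest(X, y, query, k)).most_common(1)[0][0]


def knn_regress(X, y, query, k=3):
    """Regression: mean of the neighbours' numeric targets."""
    neighbours = k_nearest(X, y, query, k)
    return sum(neighbours) / len(neighbours)


# Hypothetical one-dimensional inputs with a class label and a numeric target.
X = [(1.0,), (2.0,), (3.0,), (8.0,), (9.0,), (10.0,)]
labels = ["low", "low", "low", "high", "high", "high"]
values = [1.5, 2.5, 3.5, 8.5, 9.5, 10.5]

print(knn_classify(X, labels, (2.2,)))  # low
print(knn_regress(X, values, (2.2,)))   # 2.5
```

Only the final aggregation step differs between the two functions, which is why a single method can often serve both kinds of problem.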