In the entry concerning deep learning we showed that a deep neural network can be thought of as a combination of a deep feature extractor (the first part of the network, which learns to preprocess the data and extract useful features from it) and a shallow model (the remaining layers, which produce the prediction).

A deep neural network decomposed into a deep feature extractor and a shallow model.

The features produced by the deep extractor are sometimes referred to as an embedding. More broadly, an embedding is a mapping of points from some original space to a different space with properties better suited to some purpose. There are many ways to produce embeddings apart from training a deep model for classification or regression, but what they have in common is that a neural network is trained on some surrogate task that is expected to produce the right kind of features.
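As a concrete illustration, here is a minimal sketch of how such an extractor can be obtained from a pretrained classifier, assuming PyTorch and torchvision are available; the random batch merely stands in for real preprocessed images:

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Load a classifier pretrained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# Everything except the final fully connected layer is the deep feature
# extractor; the dropped fc layer is the shallow model on top of it.
extractor = nn.Sequential(*list(model.children())[:-1])

with torch.no_grad():
    batch = torch.randn(4, 3, 224, 224)       # stand-in for preprocessed images
    embeddings = extractor(batch).flatten(1)  # shape: (4, 512)
```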

To give a more graphical idea of what an embedding looks like and how it can transform a space, consider a dataset with images of several different kinds of fruit; a few samples are shown in the following figure:

Samples from a tiny dataset of fruit images.

If we reduce the dimensionality of this data to 2D (using a method called UMAP) so that we can plot the points, we will see that the different kinds of fruit are heavily intermixed. It would be very difficult to separate them from one another in this space.

Images of fruits reduced from the pixel space to 2D using UMAP.
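The reduction itself can be sketched roughly as follows, assuming the umap-learn package and with random data standing in for the actual flattened fruit images:

```python
import numpy as np
import umap  # provided by the umap-learn package

# Stand-in for the real dataset: one flattened 64x64 RGB image per row.
images = np.random.rand(200, 64 * 64 * 3)

# Project the high-dimensional pixel vectors down to 2D for plotting.
reducer = umap.UMAP(n_components=2, random_state=42)
coords = reducer.fit_transform(images)  # shape: (200, 2)
```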

The next figure shows what an embedding of the points into a different, more convenient space might look like. If we take the deep feature extractor from a deep neural classifier trained on a large dataset of images, apply it to our fruit images, and then plot the results in 2D using UMAP, we get the following embedding:

Images of fruits passed through a deep feature extractor and then reduced to 2D using UMAP.
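A rough sketch of this two-step pipeline, reusing the hypothetical extractor from the earlier snippet (again with random tensors standing in for the real images):

```python
import torch
import umap

# Assumption: `extractor` is the truncated ResNet-18 from the earlier sketch;
# the random batch stands in for 200 preprocessed fruit images.
with torch.no_grad():
    batch = torch.randn(200, 3, 224, 224)
    features = extractor(batch).flatten(1).numpy()  # shape: (200, 512)

# Reduce the 512-dimensional embeddings to 2D for plotting.
coords = umap.UMAP(n_components=2).fit_transform(features)
```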

As we can see, in the new space the same kinds of fruit are clustered together. This makes them much easier to classify (which is why the deep network produces this kind of representation), but the features can also be used in conjunction with other approaches. For example, face embeddings are used in face recognition, where two facial images can be compared by computing their embeddings and then measuring the distance between them.
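A minimal sketch of such a comparison, with hypothetical 128-dimensional embeddings and a purely illustrative decision threshold (real systems tune the threshold on validation data):

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Distance between two embedding vectors; smaller means more similar."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 128-dimensional face embeddings of two photographs.
emb_a = np.random.rand(128)
emb_b = np.random.rand(128)

# The 0.4 threshold is illustrative only.
same_person = cosine_distance(emb_a, emb_b) < 0.4
```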