The task of audio recognition takes an acoustic signal as input and typically involves some kind of classification. Speech recognition is one example. It can be isolated, where each recording is a single speech unit (e.g. a word) that is classified as a whole, or continuous, where the task is to transcribe multiple words or entire sentences. The field is, of course, very broad and even extends to tasks such as emotion recognition from speech or music.
Working with acoustic signals is generally very challenging, as is working with images or any other kind of unstructured data. Traditionally, the task has required very sophisticated preprocessing. Nowadays, audio recognition tasks are typically approached with deep learning techniques, which can learn much of this preprocessing automatically from the data.